E-book
Author Kalita, Jugal Kumar, author.

Title Fundamentals of data science : theory and practice / Jugal K. Kalita, Dhruba K. Bhattacharyya, Swarup Roy
Published London ; San Diego, CA : Academic Press, [2024]

Description 1 online resource
Contents Front Cover -- Fundamentals of Data Science -- Copyright -- Contents -- Preface -- Acknowledgment -- Foreword -- Foreword -- 1 Introduction -- 1.1 Data, information, and knowledge -- 1.2 Data Science: the art of data exploration -- 1.2.1 Brief history -- 1.2.2 General pipeline -- 1.2.2.1 Data collection and integration -- 1.2.2.2 Data preparation -- 1.2.2.3 Learning-model construction -- 1.2.2.4 Knowledge interpretation and presentation -- 1.2.3 Multidisciplinary science -- 1.3 What is not Data Science? -- 1.4 Data Science tasks -- 1.4.1 Predictive Data Science
1.4.2 Descriptive Data Science -- 1.4.3 Diagnostic Data Science -- 1.4.4 Prescriptive Data Science -- 1.5 Data Science objectives -- 1.5.1 Hidden knowledge discovery -- 1.5.2 Prediction of likely outcomes -- 1.5.3 Grouping -- 1.5.4 Actionable information -- 1.6 Applications of Data Science -- 1.7 How to read the book? -- References -- 2 Data, sources, and generation -- 2.1 Introduction -- 2.2 Data attributes -- 2.2.1 Qualitative -- 2.2.1.1 Nominal -- 2.2.1.2 Binary -- 2.2.1.3 Ordinal -- 2.2.2 Quantitative -- 2.2.2.1 Discrete -- 2.2.2.2 Continuous -- 2.2.2.3 Interval -- 2.2.2.4 Ratio
2.3 Data-storage formats -- 2.3.1 Structured data -- 2.3.2 Unstructured data -- 2.3.3 Semistructured data -- 2.4 Data sources -- 2.4.1 Primary sources -- 2.4.2 Secondary sources -- 2.4.3 Popular data sources -- 2.4.4 Homogeneous vs. heterogeneous data sources -- 2.5 Data generation -- 2.5.1 Types of synthetic data -- 2.5.2 Data-generation steps -- 2.5.3 Generation methods -- 2.5.4 Tools for data generation -- 2.5.4.1 Software tools -- 2.5.4.2 Python libraries -- 2.6 Summary -- References -- 3 Data preparation -- 3.1 Introduction -- 3.2 Data cleaning -- 3.2.1 Handling missing values
3.2.1.1 Ignoring and discarding data -- 3.2.1.2 Parameter estimation -- 3.2.1.3 Imputation -- 3.2.2 Duplicate-data detection -- 3.2.2.1 Knowledge-based methods -- 3.2.2.2 ETL method -- 3.3 Data reduction -- 3.3.1 Parametric data reduction -- 3.3.2 Sampling -- 3.3.3 Dimensionality reduction -- 3.4 Data transformation -- 3.4.1 Discretization -- 3.4.1.1 Supervised discretization -- 3.4.1.2 Unsupervised discretization -- 3.5 Data normalization -- 3.5.1 Min-max normalization -- 3.5.2 Z-score normalization -- 3.5.3 Decimal-scaling normalization -- 3.5.4 Quantile normalization
3.5.5 Logarithmic normalization -- 3.6 Data integration -- 3.6.1 Consolidation -- 3.6.2 Federation -- 3.7 Summary -- References -- 4 Machine learning -- 4.1 Introduction -- 4.2 Machine Learning paradigms -- 4.2.1 Supervised learning -- 4.2.2 Unsupervised learning -- 4.2.3 Semisupervised learning -- 4.3 Inductive bias -- 4.4 Evaluating a classifier -- 4.4.1 Evaluation steps -- 4.4.1.1 Validation -- 4.4.1.2 Testing -- 4.4.1.3 K-fold crossvalidation -- 4.4.2 Handling unbalanced classes -- 4.4.3 Model generalization -- 4.4.3.1 Underfitting -- 4.4.3.2 Overfitting -- 4.4.3.3 Accurate fittings
Bibliography Includes bibliographical references and index
Notes Description based on online resource; title from digital title page (viewed on February 20, 2024)
Subject Big data.
Form Electronic book
Author Bhattacharyya, Dhruba K., author.
Roy, Swarup (Computer scientist), author.
ISBN 0323972632
9780323972635