Description
1 online resource
Contents
Front Cover -- Fundamentals of Data Science -- Copyright -- Contents -- Preface -- Acknowledgment -- Foreword -- Foreword -- 1 Introduction -- 1.1 Data, information, and knowledge -- 1.2 Data Science: the art of data exploration -- 1.2.1 Brief history -- 1.2.2 General pipeline -- 1.2.2.1 Data collection and integration -- 1.2.2.2 Data preparation -- 1.2.2.3 Learning-model construction -- 1.2.2.4 Knowledge interpretation and presentation -- 1.2.3 Multidisciplinary science -- 1.3 What is not Data Science? -- 1.4 Data Science tasks -- 1.4.1 Predictive Data Science -- 1.4.2 Descriptive Data Science -- 1.4.3 Diagnostic Data Science -- 1.4.4 Prescriptive Data Science -- 1.5 Data Science objectives -- 1.5.1 Hidden knowledge discovery -- 1.5.2 Prediction of likely outcomes -- 1.5.3 Grouping -- 1.5.4 Actionable information -- 1.6 Applications of Data Science -- 1.7 How to read the book? -- References -- 2 Data, sources, and generation -- 2.1 Introduction -- 2.2 Data attributes -- 2.2.1 Qualitative -- 2.2.1.1 Nominal -- 2.2.1.2 Binary -- 2.2.1.3 Ordinal -- 2.2.2 Quantitative -- 2.2.2.1 Discrete -- 2.2.2.2 Continuous -- 2.2.2.3 Interval -- 2.2.2.4 Ratio -- 2.3 Data-storage formats -- 2.3.1 Structured data -- 2.3.2 Unstructured data -- 2.3.3 Semistructured data -- 2.4 Data sources -- 2.4.1 Primary sources -- 2.4.2 Secondary sources -- 2.4.3 Popular data sources -- 2.4.4 Homogeneous vs. heterogeneous data sources -- 2.5 Data generation -- 2.5.1 Types of synthetic data -- 2.5.2 Data-generation steps -- 2.5.3 Generation methods -- 2.5.4 Tools for data generation -- 2.5.4.1 Software tools -- 2.5.4.2 Python libraries -- 2.6 Summary -- References -- 3 Data preparation -- 3.1 Introduction -- 3.2 Data cleaning -- 3.2.1 Handling missing values -- 3.2.1.1 Ignoring and discarding data -- 3.2.1.2 Parameter estimation -- 3.2.1.3 Imputation -- 3.2.2 Duplicate-data detection -- 3.2.2.1 Knowledge-based methods -- 3.2.2.2 ETL method -- 3.3 Data reduction -- 3.3.1 Parametric data reduction -- 3.3.2 Sampling -- 3.3.3 Dimensionality reduction -- 3.4 Data transformation -- 3.4.1 Discretization -- 3.4.1.1 Supervised discretization -- 3.4.1.2 Unsupervised discretization -- 3.5 Data normalization -- 3.5.1 Min-max normalization -- 3.5.2 Z-score normalization -- 3.5.3 Decimal-scaling normalization -- 3.5.4 Quantile normalization -- 3.5.5 Logarithmic normalization -- 3.6 Data integration -- 3.6.1 Consolidation -- 3.6.2 Federation -- 3.7 Summary -- References -- 4 Machine learning -- 4.1 Introduction -- 4.2 Machine Learning paradigms -- 4.2.1 Supervised learning -- 4.2.2 Unsupervised learning -- 4.2.3 Semisupervised learning -- 4.3 Inductive bias -- 4.4 Evaluating a classifier -- 4.4.1 Evaluation steps -- 4.4.1.1 Validation -- 4.4.1.2 Testing -- 4.4.1.3 K-fold crossvalidation -- 4.4.2 Handling unbalanced classes -- 4.4.3 Model generalization -- 4.4.3.1 Underfitting -- 4.4.3.2 Overfitting -- 4.4.3.3 Accurate fittings
Bibliography
Includes bibliographical references and index
Notes
Description based on online resource; title from digital title page (viewed on February 20, 2024)
Subject
Big data.
|
Form
Electronic book
|
Author
Bhattacharyya, Dhruba K., author.

Roy, Swarup (Computer scientist), author.
|
ISBN
0323972632
9780323972635
|