Description |
1 online resource (398 p.) |
Contents |
Intro -- Preface -- Algorithm Competition Era -- Why Write -- Features of the Book -- Target Readers -- Welcome to Contact with Us -- Acknowledgments -- Contents -- Part I: Half the Work, Twice the Effect -- Chapter 1: Guide to the Competitions -- 1.1 Competition Platforms -- 1.1.1 Kaggle -- 1.1.2 Tianchi -- 1.1.2.1 Registration -- 1.1.2.2 Competition System -- 1.1.2.3 Points -- 1.1.3 DF -- 1.1.4 DC -- 1.1.5 Kesci -- 1.1.6 JDATA -- 1.1.7 Corporate Websites -- 1.2 Competition Procedures -- 1.2.1 Problem Modeling -- 1.2.2 Data Exploration -- 1.2.3 Feature Engineering -- 1.2.4 Model Training |
|
1.2.5 Model Integration -- 1.3 Competition Types -- 1.3.1 Data Types -- 1.3.2 Task Types -- 1.3.3 Application Scenarios -- 1.4 Thinking Exercises -- Chapter 2: Problem Modeling -- 2.1 Understanding the Competition Question -- 2.1.1 Business Background -- 2.1.1.1 Go Deep into the Business -- 2.1.1.2 Be Clear About the Goals -- 2.1.2 Understanding Data -- 2.1.3 Evaluation Indicators -- 2.1.3.1 Classification Indicators -- Error Rate and Accuracy -- Precision and Recall -- F1-score -- ROC Curve -- AUC -- Logarithmic Loss -- 2.1.3.2 Indicators of Regression -- Mean Absolute Error |
|
Mean Squared Error -- Root Mean Squared Error -- Average Absolute Percentage Error -- 2.2 Sample Selection -- 2.2.1 Main Reasons -- 2.2.1.1 Too Large Data Set -- 2.2.1.2 Data Noise -- 2.2.1.3 Data Redundancy -- 2.2.1.4 Uneven Distribution of Positive and Negative Samples -- 2.2.2 Accurate Methods -- 2.2.3 Application Scenarios -- 2.3 Offline Evaluation Strategy -- 2.3.1 Strong Time Sequence Problems -- 2.3.2 Weak Time Sequence Problems -- 2.4 Cases in Practice -- 2.4.1 Understanding the Competition Question -- 2.4.2 Offline Verification -- 2.5 Thinking Exercises -- Chapter 3: Data Exploration |
|
3.1 Preliminary Data Exploration -- 3.1.1 Analytical Thinking -- 3.1.2 Analysis Methods -- 3.1.3 Purpose Clarification -- 3.2 Variable Analysis -- 3.2.1 Univariate Analysis -- 3.2.1.1 Labels -- 3.2.1.2 Continuous Type -- 3.2.1.3 Category Type -- 3.2.2 Multivariate Analysis -- 3.3 Model Analysis -- 3.3.1 Learning Curve -- 3.3.1.1 Underfitting Learning Curve -- 3.3.1.2 Overfitting Learning Curve -- 3.3.2 Feature Importance Analysis -- 3.3.3 Error Analysis -- 3.4 Thinking Exercises -- Chapter 4: Feature Engineering -- 4.1 Data Preprocessing -- 4.1.1 Processing Missing Values |
|
4.1.1.1 Distinguishing Missing Values -- 4.1.1.2 Processing Method -- 4.1.2 Dealing with Outliers -- 4.1.2.1 Looking for Outliers -- 4.1.2.2 Coping with Outliers -- 4.1.3 Optimizing Memory -- 4.2 Feature Transformation -- 4.2.1 Non-dimensionalization Processing of Continuous Variables -- 4.2.2 Data Transformation of Continuous Variables -- 4.2.2.1 log Transformation -- 4.2.2.2 Discretization of Continuous Variables -- 4.2.3 Category Feature Transformation -- 4.2.4 Irregular Feature Transformation -- 4.3 Feature Extraction -- 4.3.1 Statistics Features Related to Categories -- 4.3.1.1 Target Coding -- 4.3.1.2 count, nunique, ratio |
Summary |
This book systematically introduces competitions in the field of algorithms and machine learning. The first author of the book has won 5 championships and 5 runner-up finishes in domestic and international algorithm competitions. First, it uses common competition scenarios as a guide, presenting the main processes of using machine learning to solve real-world problems, namely problem modelling, data exploration, feature engineering, and model training. It then lists the main difficulties and the general ideas, with solutions, across the whole process. Moreover, the book comprehensively covers several common problem areas in machine learning competitions, such as recommendation, temporal prediction, advertising, and text computing. The authors, also known as "competition professionals", explain actual cases in detail and teach various processes, routines, techniques and strategies, making the book a rare treasure for all competition enthusiasts. It is well suited to readers interested in algorithm competitions and deep learning algorithms in practice, and to students in computer-related majors |
Notes |
Online resource; title from PDF title page (SpringerLink, viewed October 24, 2023) |
Subject |
Machine learning -- Competitions
|
|
Computer algorithms -- Competitions
|
Form |
Electronic book
|
Author |
Liu, Peng
|
|
Qian, Qian
|
ISBN |
9789819937233 |
|
981993723X |
|