Author Cichosz, Paweł, author

Title Data mining algorithms : explained using R / Paweł Cichosz
Published Chichester, West Sussex ; Malden, MA : John Wiley & Sons Inc., 2015
Online access available from: Wiley Online Books

Description 1 online resource (xxxi, 683 pages)
Contents Machine generated contents note: pt. I Preliminaries -- 1. Tasks -- 1.1. Introduction -- 1.1.1. Knowledge -- 1.1.2. Inference -- 1.2. Inductive learning tasks -- 1.2.1. Domain -- 1.2.2. Instances -- 1.2.3. Attributes -- 1.2.4. Target attribute -- 1.2.5. Input attributes -- 1.2.6. Training set -- 1.2.7. Model -- 1.2.8. Performance -- 1.2.9. Generalization -- 1.2.10. Overfitting -- 1.2.11. Algorithms -- 1.2.12. Inductive learning as search -- 1.3. Classification -- 1.3.1. Concept -- 1.3.2. Training set -- 1.3.3. Model -- 1.3.4. Performance -- 1.3.5. Generalization -- 1.3.6. Overfitting -- 1.3.7. Algorithms -- 1.4. Regression -- 1.4.1. Target function -- 1.4.2. Training set -- 1.4.3. Model -- 1.4.4. Performance -- 1.4.5. Generalization -- 1.4.6. Overfitting -- 1.4.7. Algorithms -- 1.5. Clustering -- 1.5.1. Motivation -- 1.5.2. Training set -- 1.5.3. Model -- 1.5.4. Crisp vs. soft clustering -- 1.5.5. Hierarchical clustering -- 1.5.6. Performance -- 1.5.7. Generalization -- 1.5.8. Algorithms -- 1.5.9. Descriptive vs. predictive clustering -- 1.6. Practical issues -- 1.6.1. Incomplete data -- 1.6.2. Noisy data -- 1.7. Conclusion -- 1.8. Further readings -- References -- 2. Basic statistics -- 2.1. Introduction -- 2.2. Notational conventions -- 2.3. Basic statistics as modeling -- 2.4. Distribution description -- 2.4.1. Continuous attributes -- 2.4.2. Discrete attributes -- 2.4.3. Confidence intervals -- 2.4.4. m-Estimation -- 2.5. Relationship detection -- 2.5.1. Significance tests -- 2.5.2. Continuous attributes -- 2.5.3. Discrete attributes -- 2.5.4. Mixed attributes -- 2.5.5. Relationship detection caveats -- 2.6. Visualization -- 2.6.1. Boxplot -- 2.6.2. Histogram -- 2.6.3. Barplot -- 2.7. Conclusion -- 2.8. Further readings -- References --
pt. II Classification -- 3. Decision trees -- 3.1. Introduction -- 3.2. Decision tree model -- 3.2.1. Nodes and branches -- 3.2.2. Leaves -- 3.2.3. Split types -- 3.3. Growing -- 3.3.1. Algorithm outline -- 3.3.2. Class distribution calculation -- 3.3.3. Class label assignment -- 3.3.4. Stop criteria -- 3.3.5. Split selection -- 3.3.6. Split application -- 3.3.7. Complete process -- 3.4. Pruning -- 3.4.1. Pruning operators -- 3.4.2. Pruning criterion -- 3.4.3. Pruning control strategy -- 3.4.4. Conversion to rule sets -- 3.5. Prediction -- 3.5.1. Class label prediction -- 3.5.2. Class probability prediction -- 3.6. Weighted instances -- 3.7. Missing value handling -- 3.7.1. Fractional instances -- 3.7.2. Surrogate splits -- 3.8. Conclusion -- 3.9. Further readings -- References -- 4. Naive Bayes classifier -- 4.1. Introduction -- 4.2. Bayes rule -- 4.3. Classification by Bayesian inference -- 4.3.1. Conditional class probability -- 4.3.2. Prior class probability -- 4.3.3. Independence assumption -- 4.3.4. Conditional attribute value probabilities -- 4.3.5. Model construction -- 4.3.6. Prediction -- 4.4. Practical issues -- 4.4.1. Zero and small probabilities -- 4.4.2. Linear classification -- 4.4.3. Continuous attributes -- 4.4.4. Missing attribute values -- 4.4.5. Reducing naivety -- 4.5. Conclusion -- 4.6. Further readings -- References -- 5. Linear classification -- 5.1. Introduction -- 5.2. Linear representation -- 5.2.1. Inner representation function -- 5.2.2. Outer representation function -- 5.2.3. Threshold representation -- 5.2.4. Logit representation -- 5.3. Parameter estimation -- 5.3.1. Delta rule -- 5.3.2. Gradient descent -- 5.3.3. Distance to decision boundary -- 5.3.4. Least squares -- 5.4. Discrete attributes -- 5.5. Conclusion -- 5.6. Further readings -- References -- 6. Misclassification costs -- 6.1. Introduction -- 6.2. Cost representation -- 6.2.1. Cost matrix -- 6.2.2. Per-class cost vector -- 6.2.3. Instance-specific costs -- 6.3. Incorporating misclassification costs -- 6.3.1. Instance weighting -- 6.3.2. Instance resampling -- 6.3.3. Minimum-cost rule -- 6.3.4. Instance relabeling -- 6.4. Effects of cost incorporation -- 6.5. Experimental procedure -- 6.6. Conclusion -- 6.7. Further readings -- References -- 7. Classification model evaluation -- 7.1. Introduction -- 7.1.1. Dataset performance -- 7.1.2. Training performance -- 7.1.3. True performance -- 7.2. Performance measures -- 7.2.1. Misclassification error -- 7.2.2. Weighted misclassification error -- 7.2.3. Mean misclassification cost -- 7.2.4. Confusion matrix -- 7.2.5. ROC analysis -- 7.2.6. Probabilistic performance measures -- 7.3. Evaluation procedures -- 7.3.1. Model evaluation vs. modeling procedure evaluation -- 7.3.2. Evaluation caveats -- 7.3.3. Hold-out -- 7.3.4. Cross-validation -- 7.3.5. Leave-one-out -- 7.3.6. Bootstrapping -- 7.3.7. Choosing the right procedure -- 7.3.8. Evaluation procedures for temporal data -- 7.4. Conclusion -- 7.5. Further readings -- References --
pt. III Regression -- 8. Linear regression -- 8.1. Introduction -- 8.2. Linear representation -- 8.2.1. Parametric representation -- 8.2.2. Linear representation function -- 8.2.3. Nonlinear representation functions -- 8.3. Parameter estimation -- 8.3.1. Mean square error minimization -- 8.3.2. Delta rule -- 8.3.3. Gradient descent -- 8.3.4. Least squares -- 8.4. Discrete attributes -- 8.5. Advantages of linear models -- 8.6. Beyond linearity -- 8.6.1. Generalized linear representation -- 8.6.2. Enhanced representation -- 8.6.3. Polynomial regression -- 8.6.4. Piecewise-linear regression -- 8.7. Conclusion -- 8.8. Further readings -- References -- 9. Regression trees -- 9.1. Introduction -- 9.2. Regression tree model -- 9.2.1. Nodes and branches -- 9.2.2. Leaves -- 9.2.3. Split types -- 9.2.4. Piecewise-constant regression -- 9.3. Growing -- 9.3.1. Algorithm outline -- 9.3.2. Target function summary statistics -- 9.3.3. Target value assignment -- 9.3.4. Stop criteria -- 9.3.5. Split selection -- 9.3.6. Split application -- 9.3.7. Complete process -- 9.4. Pruning -- 9.4.1. Pruning operators -- 9.4.2. Pruning criterion -- 9.4.3. Pruning control strategy -- 9.5. Prediction -- 9.6. Weighted instances -- 9.7. Missing value handling -- 9.7.1. Fractional instances -- 9.7.2. Surrogate splits -- 9.8. Piecewise-linear regression -- 9.8.1. Growing -- 9.8.2. Pruning -- 9.8.3. Prediction -- 9.9. Conclusion -- 9.10. Further readings -- References -- 10. Regression model evaluation -- 10.1. Introduction -- 10.1.1. Dataset performance -- 10.1.2. Training performance -- 10.1.3. True performance -- 10.2. Performance measures -- 10.2.1. Residuals -- 10.2.2. Mean absolute error -- 10.2.3. Mean square error -- 10.2.4. Root mean square error -- 10.2.5. Relative absolute error -- 10.2.6. Coefficient of determination -- 10.2.7. Correlation -- 10.2.8. Weighted performance measures -- 10.2.9. Loss functions -- 10.3. Evaluation procedures -- 10.3.1. Hold-out -- 10.3.2. Cross-validation -- 10.3.3. Leave-one-out -- 10.3.4. Bootstrapping -- 10.3.5. Choosing the right procedure -- 10.4. Conclusion -- 10.5. Further readings -- References --
pt. IV Clustering -- 11. (Dis)similarity measures -- 11.1. Introduction -- 11.2. Measuring dissimilarity and similarity -- 11.3. Difference-based dissimilarity -- 11.3.1. Euclidean distance -- 11.3.2. Minkowski distance -- 11.3.3. Manhattan distance -- 11.3.4. Canberra distance -- 11.3.5. Chebyshev distance -- 11.3.6. Hamming distance -- 11.3.7. Gower's coefficient -- 11.3.8. Attribute weighting -- 11.3.9. Attribute transformation -- 11.4. Correlation-based similarity -- 11.4.1. Discrete attributes -- 11.4.2. Pearson's correlation similarity -- 11.4.3. Spearman's correlation similarity -- 11.4.4. Cosine similarity -- 11.5. Missing attribute values -- 11.6. Conclusion -- 11.7. Further readings -- References -- 12. k-Centers clustering -- 12.1. Introduction -- 12.1.1. Basic principle -- 12.1.2. (Dis)similarity measures -- 12.2. Algorithm scheme -- 12.2.1. Initialization -- 12.2.2. Stop criteria -- 12.2.3. Cluster formation -- 12.2.4. Implicit cluster modeling -- 12.2.5. Instantiations -- 12.3. k-Means -- 12.3.1. Center adjustment -- 12.3.2. Minimizing dissimilarity to centers -- 12.4. Beyond means -- 12.4.1. k-Medians -- 12.4.2. k-Medoids -- 12.5. Beyond (fixed) k -- 12.5.1. Multiple runs -- 12.5.2. Adaptive k-centers -- 12.6. Explicit cluster modeling -- 12.7. Conclusion -- 12.8. Further readings -- References -- 13. Hierarchical clustering -- 13.1. Introduction -- 13.1.1. Basic approaches -- 13.1.2. (Dis)similarity measures -- 13.2. Cluster hierarchies -- 13.2.1. Motivation -- 13.2.2. Model representation -- 13.3. Agglomerative clustering -- 13.3.1. Algorithm scheme -- 13.3.2. Cluster linkage -- 13.4. Divisive clustering -- 13.4.1. Algorithm scheme -- 13.4.2. Wrapping a flat clustering algorithm -- 13.4.3. Stop criteria -- 13.5. Hierarchical clustering visualization -- 13.6. Hierarchical clustering prediction -- 13.6.1. Cutting cluster hierarchies -- 13.6.2. Cluster membership assignment -- 13.7. Conclusion -- 13.8. Further readings -- References -- 14. Clustering model evaluation -- 14.1. Introduction -- 14.1.1. Dataset performance -- 14.1.2. Training performance -- 14.1.3. True performance -- 14.2. Per-cluster quality measures -- 14.2.1. Diameter -- 14.2.2. Separation -- 14.2.3. Isolation -- 14.2.4. Silhouette width -- 14.2.5. Davies-Bouldin index -- 14.3. Overall quality measures -- 14.3.1. Dunn index -- 14.3.2. Average Davies-Bouldin index -- 14.3.3. C index -- 14.3.4. Average silhouette width -- 14.3.5. Loglikelihood -- 14.4. External quality measures -- 14.4.1. Misclassification error -- 14.4.2. Rand index -- 14.4.3. General relationship detection measures -- 14.5. Using quality measures -- 14.6. Conclusion -- 14.7. Further readings -- References --
pt. V Getting Better Models -- 15. Model ensembles -- 15.1. Introduction -- 15.2. Model committees -- 15.3. Base models -- 15.3.1. Different training sets -- 15.3.2. Different algorithms -- 15.3.3. Different parameter setups -- 15.3.4. Algorithm randomization -- 15.3.5. Base model diversity -- 15.4. Model aggregation -- 15.4.1. Voting/Averaging -- 15.4.2. Probability averaging -- 15.4.3. Weighted voting/averaging -- 15.4.4. Using as attributes -- 15.5. Specific ensemble modeling algorithms -- 15.5.1. Bagging -- 15.5.2. Stacking -- 15.5.3. Boosting -- 15.5.4. Random forest -- 15.5.5. Random Naive Bayes -- 15.6. Quality of ensemble predictions -- 15.7. Conclusion -- 15.8. Further readings -- References -- 16. Kernel methods -- 16.1. Introduction -- 16.2. Support vector machines -- 16.2.1. Classification margin -- 16.2.2. Maximum-margin hyperplane -- 16.2.3. Primal form -- 16.2.4. Dual form -- 16.2.5. Soft margin -- 16.3. Support vector regression -- 16.3.1. Regression tube -- 16.3.2. Primal form -- 16.3.3. Dual form -- 16.4. Kernel trick -- 16.5. Kernel functions -- 16.5.1. Linear kernel -- 16.5.2. Polynomial kernel -- 16.5.3. Radial kernel -- 16.5.4. Sigmoid kernel -- 16.6. Kernel prediction -- 16.7. Kernel-based algorithms -- 16.7.1. Kernel-based SVM -- 16.7.2. Kernel-based SVR -- 16.8. Conclusion -- 16.9. Further readings -- References -- 17. Attribute transformation -- 17.1. Introduction -- 17.2. Attribute transformation task -- 17.2.1. Target task -- 17.2.2. Target attribute -- 17.2.3. Transformed attribute -- 17.2.4. Training set -- 17.2.5. Modeling transformations -- 17.2.6. Nonmodeling transformations -- 17.3. Simple transformations -- 17.3.1. Standardization -- 17.3.2. Normalization -- 17.3.3. Aggregation -- 17.3.4. Imputation -- 17.3.5. Binary encoding -- 17.4. Multiclass encoding -- 17.4.1. Encoding and decoding functions -- 17.4.2. 1-of-k encoding -- 17.4.3. Error-correcting encoding -- 17.4.4. Effects of multiclass encoding -- 17.5. Conclusion -- 17.6. Further readings -- References -- 18. Discretization -- 18.1. Introduction -- 18.2. Discretization task -- 18.2.1. Motivation -- 18.2.2. Task definition -- 18.2.3. Discretization as modeling -- 18.2.4. Discretization quality -- 18.3. Unsupervised discretization -- 18.3.1. Equal-width intervals -- 18.3.2. Equal-frequency intervals -- 18.3.3. Nonmodeling discretization -- 18.4. Supervised discretization -- 18.4.1. Pure-class discretization -- 18.4.2. Bottom-up discretization -- 18.4.3. Top-down discretization -- 18.5. Effects of discretization -- 18.6. Conclusion -- 18.7. Further readings -- References -- 19. Attribute selection -- 19.1. Introduction -- 19.2. Attribute selection task -- 19.2.1. Motivation -- 19.2.2. Task definition -- 19.2.3. Algorithms -- 19.3. Attribute subset search -- 19.3.1. Search task -- 19.3.2. Initial state -- 19.3.3. Search operators -- 19.3.4. State selection -- 19.3.5. Stop criteria -- 19.4. Attribute selection filters -- 19.4.1. Simple statistical filters -- 19.4.2. Correlation-based filters -- 19.4.3. Consistency-based filters -- 19.4.4. Relief -- 19.4.5. Random forest -- 19.4.6. Cutoff criteria -- 19.4.7. Filter-driven search -- 19.5. Attribute selection wrappers -- 19.5.1. Subset evaluation -- 19.5.2. Wrapper attribute selection -- 19.6. Effects of attribute selection -- 19.7. Conclusion -- 19.8. Further readings -- References -- 20. Case studies -- 20.1. Introduction -- 20.1.1. Datasets -- 20.1.2. Packages -- 20.1.3. Auxiliary functions -- 20.2. Census income -- 20.2.1. Data loading and preprocessing -- 20.2.2. Default model -- 20.2.3. Incorporating misclassification costs -- 20.2.4. Pruning -- 20.2.5. Attribute selection -- 20.2.6. Final models -- 20.3. Communities and crime -- 20.3.1. Data loading -- 20.3.2. Data quality -- 20.3.3. Regression trees -- 20.3.4. Linear models -- 20.3.5. Attribute selection -- 20.3.6. Piecewise-linear models -- 20.4. Cover type -- 20.4.1. Data loading and preprocessing -- 20.4.2. Class imbalance -- 20.4.3. Decision trees -- 20.4.4. Class rebalancing -- 20.4.5. Multiclass encoding -- 20.4.6. Final classification models -- 20.4.7. Clustering -- 20.5. Conclusion -- 20.6. Further readings -- References -- Closing -- A. Notation -- A.1. Attribute values -- A.2. Data subsets -- A.3. Probabilities -- B. R packages -- B.1. CRAN packages -- B.2. DMR packages -- B.3. Installing packages -- References -- C. Datasets
Summary "This book narrows down the scope of data mining by adopting a heavily modeling-oriented perspective"-- Provided by publisher
Bibliography Includes bibliographical references and index
Notes Print version record and CIP data provided by publisher
Subject Computer algorithms.
Data mining.
R (Computer program language)
Form Electronic book
LC no. 2014037576
ISBN 1118950801 (electronic bk.)
1118950844 (electronic bk.)
1322317461
9781118950807 (electronic bk.)
9781118950845 (electronic bk.)
9781322317465
(hardback)