E-book
Author Tiwary, Chandramani, author

Title Learning Apache Mahout : acquire practical skills in Big Data Analytics and explore data science with Apache Mahout / Chandramani Tiwary
Published Birmingham, UK : Packt Publishing, 2015

Description 1 online resource (1 volume) : illustrations
Series Community experience distilled
Contents Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Introduction to Mahout; Why Mahout; Simple techniques and more data is better; Sampling is difficult; Community and license; When Mahout; Data too large for single machine; Data already on Hadoop; Algorithms implemented in Mahout; How Mahout; Setting up the development environment; Configuring Maven; Configuring Mahout; Configuring Eclipse with the Maven plugin and Mahout; Mahout command line; Clustering example; A classification example
Mahout API -- Java program example; The dataset; Parallel versus in-memory execution mode; Summary; Chapter 2: Core Concepts in Machine Learning; Supervised learning; Determine the objective; Decide the training data; Create and clean the training set; Feature extraction; Train the models; Bagging; Boosting; Validation; Holdout-set validation; K-fold cross validation; Evaluation; Bias-variance trade-off; Function complexity and amount of training data; Dimensionality of the input space; Noise in data; Unsupervised learning; Cluster analysis; Objective; Feature representation
Algorithm for clustering; A stopping criteria; Frequent pattern mining; Measures for identifying interesting rules; Things to consider; Recommender system; Collaborative filtering; Cold start; Scalability; Sparsity; Content-based filtering; Model efficacy; Classification; Confusion matrix; ROC curve and AUC; Regression; Mean absolute error; Root mean squared error; R-square; Adjusted R-square; Recommendation system; Score difference; Precision and recall; Clustering; The internal evaluation; External evaluation; Summary; Chapter 3: Feature Engineering; Feature engineering; Feature construction
Categorical features; Continuous features; Feature extraction; Feature selection; Filter-based feature selection; Wrapper-based feature selection; Embedded feature selection; Dimensionality reduction; Summary; Chapter 4: Classification with Mahout; Classification; White box models; Black box models; Logistic regression; Mahout logistic regression command line; Getting the data; Model building via command line; Train the model command line option; Testing the model; Prediction; Adaptive regression model; Code example with logistic regression; Train the model
The LogisticRegressionParameter and CsvRecordFactory class; Code example without the parameter class; Testing the online regression model; Getting predictions from OnlineLogisticRegression; CrossFoldLearner example; Random forest; Bagging; Random subsets of features; Out-of-bag error estimate; Random forest using the command line; Predictions from random forest; Naïve Bayes classifier; Numeric features with naïve Bayes; Command line; Summary; Chapter 5: Frequent Pattern Mining and Topic Modeling; Frequent pattern mining; Building FP Tree; Constructing the tree
Summary If you are a Java developer and want to use Mahout and machine learning to solve Big Data Analytics use cases, then this book is for you. Familiarity with shell scripts is assumed, but no prior experience is required.
Notes Includes index
Online resource; title from cover (Safari, viewed April 16, 2015)
SUBJECT Mahout (Electronic resource) http://id.loc.gov/authorities/names/no2011176330
Mahout (Electronic resource) fast
Subject Machine learning.
Web site development.
COMPUTERS -- Programming -- Algorithms.
COMPUTERS -- Desktop Applications -- Databases.
Form Electronic book
ISBN 9781783555222
178355522X
1783555211
9781783555215
Other Titles Acquire practical skills in Big Data Analytics and explore data science with Apache Mahout