E-book
Author Walkowiak, Simon

Title Big Data Analytics with R
Edition 1
Published Packt Publishing, 2016

Description 1 online resource
Contents Cover; Copyright; Credits; About the Author; Acknowledgement; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: The Era of Big Data; Big Data -- The monster re-defined; Big Data toolbox -- dealing with the giant; Hadoop -- the elephant in the room; Databases; Hadoop Spark-ed up; R -- The unsung Big Data hero; Summary; Chapter 2: Introduction to R Programming Language and Statistical Environment; Learning R; Revisiting R basics; Getting R and RStudio ready; Setting the URLs to R repositories; R data structures; Vectors; Scalars; Matrices; Arrays; Data frames; Lists
Exporting R data objects; Applied data science with R; Importing data from different formats; Exploratory Data Analysis; Data aggregations and contingency tables; Hypothesis testing and statistical inference; Tests of differences; Independent t-test example (with power and effect size estimates); ANOVA example; Tests of relationships; An example of Pearson's r correlations; Multiple regression example; Data visualization packages; Summary; Chapter 3: Unleashing the Power of R from Within; Traditional limitations of R; Out-of-memory data; Processing speed; To the memory limits and beyond
Data transformations and aggregations with the ff and ffbase packages; Generalized linear models with the ff and ffbase packages; Logistic regression example with ffbase and biglm; Expanding memory with the bigmemory package; Parallel R; From bigmemory to faster computations; An apply() example with the big.matrix object; A for() loop example with the ffdf object; Using apply() and for() loop examples on a data.frame; A parallel package example; A foreach package example; The future of parallel processing in R; Utilizing Graphics Processing Units with R
Multi-threading with Microsoft R Open distribution; Parallel machine learning with H2O and R; Boosting R performance with the data.table package and other tools; Fast data import and manipulation with the data.table package; Data import with data.table; Lightning-fast subsets and aggregations on data.table; Chaining, more complex aggregations, and pivot tables with data.table; Writing better R code; Summary; Chapter 4: Hadoop and MapReduce Framework for R; Hadoop architecture; Hadoop Distributed File System; MapReduce framework; A simple MapReduce word count example; Other Hadoop native tools
Learning Hadoop; A single-node Hadoop in Cloud; Deploying Hortonworks Sandbox on Azure; A word count example in Hadoop using Java; A word count example in Hadoop using the R language; RStudio Server on a Linux RedHat/CentOS virtual machine; Installing and configuring RHadoop packages; HDFS management and MapReduce in R -- a word count example; HDInsight -- a multi-node Hadoop cluster on Azure; Creating your first HDInsight cluster; Creating a new Resource Group; Deploying a Virtual Network; Creating a Network Security Group; Setting up and configuring an HDInsight cluster
Summary Utilize R to uncover hidden patterns in your Big Data.
About This Book: Perform computational analyses on Big Data to generate meaningful results. Get practical knowledge of the R programming language while working on Big Data platforms like Hadoop, Spark, H2O and SQL/NoSQL databases. Explore fast, streaming, and scalable data analysis with the most cutting-edge technologies on the market.
Who This Book Is For: This book is intended for data analysts, scientists, data engineers, statisticians, and researchers who want to integrate R with their current or future Big Data workflows. It is assumed that readers have some experience in data analysis and an understanding of data management and algorithmic processing of large quantities of data; however, they may lack specific skills related to R.
What You Will Learn: Learn about the current state of Big Data processing using the R programming language and its powerful statistical capabilities. Deploy Big Data analytics platforms with selected Big Data tools supported by R in a cost-effective and time-saving manner. Apply the R language to real-world Big Data problems on a multi-node Hadoop cluster, e.g. electricity consumption across various socio-demographic indicators and bike share scheme usage. Explore the compatibility of R with Hadoop, Spark, SQL and NoSQL databases, and the H2O platform.
In Detail: Big Data analytics is the process of examining large and complex data sets that often exceed available computational capabilities. R is a leading programming language of data science, consisting of powerful functions to tackle all problems related to Big Data processing. The book begins with a brief introduction to the Big Data world and its current industry standards, followed by an introduction to the R language that presents its development, structure, real-world applications, and shortcomings. It then progresses to a review of major R functions for data management and transformations. Readers are introduced to cloud-based Big Data solutions (e.g. Amazon EC2 instances and Amazon RDS, Microsoft Azure and its HDInsight clusters) and given guidance on R connectivity with relational and non-relational databases such as MongoDB and HBase. The book then expands to Big Data tools such as the Apache Hadoop ecosystem, HDFS and MapReduce frameworks, as well as other R-compatible tools such as Apache Spark, its machine learning library Spark MLlib, and H2O.
Style and approach: This book serves as a practical guide to tackling Big Data problems using the R programming language and its statistical environment. Each section presents concise and easy-to-follow steps on how to process, transform and analyse large data sets.
Notes Print version record
Subject Data mining -- Software
Big data -- Data processing -- Software
Information visualization -- Software
R (Computer program language)
Data mining
Information visualization
Genre/Form Software
Form Electronic book
ISBN 1786463725
9781786463722
9781786466457
1786466457