E-book

Author

Wiktorski, Tomasz, author

Title Data-intensive systems : principles and fundamentals using Hadoop and Spark / Tomasz Wiktorski

Published Cham, Switzerland : Springer, [2019]

Springer Computer Science eBooks

Springer eBooks

Description 1 online resource

Series Advanced information and knowledge processing

Contents Intro; Contents; List of Figures; List of Listings; 1 Preface; 1.1 Conventions Used in this Book; 1.2 Listed Code; 1.3 Terminology; 1.4 Examples and Exercises; 2 Introduction; 2.1 Growing Datasets; 2.2 Hardware Trends; 2.3 The V's of Big Data; 2.4 NOSQL; 2.5 Data as the Fourth Paradigm of Science; 2.6 Example Applications; 2.6.1 Data Hub; 2.6.2 Search and Recommendations; 2.6.3 Retail Optimization; 2.6.4 Healthcare; 2.6.5 Internet of Things; 2.7 Main Tools; 2.7.1 Hadoop; 2.7.2 Spark; 2.8 Exercises; References; 3 Hadoop 101 and Reference Scenario; 3.1 Reference Scenario; 3.2 Hadoop Setup

3.3 Analyzing Unstructured Data; 3.4 Analyzing Structured Data; 3.5 Exercises; 4 Functional Abstraction; 4.1 Functional Programming Overview; 4.2 Functional Abstraction for Data Processing; 4.3 Functional Abstraction and Parallelism; 4.4 Lambda Architecture; 4.5 Exercises; Reference; 5 Introduction to MapReduce; 5.1 Reference Code; 5.2 Map Phase; 5.3 Combine Phase; 5.4 Shuffle Phase; 5.5 Reduce Phase; 5.6 Embarrassingly Parallel Problems; 5.7 Running MapReduce Programs; 5.8 Exercises; 6 Hadoop Architecture; 6.1 Architecture Overview; 6.2 Data Handling; 6.2.1 HDFS Architecture; 6.2.2 Read Flow

6.2.3 Write Flow; 6.2.4 HDFS Failovers; 6.3 Job Handling; 6.3.1 Job Flow; 6.3.2 Data Locality; 6.3.3 Job and Task Failures; 6.4 Exercises; 7 MapReduce Algorithms and Patterns; 7.1 Counting, Summing, and Averaging; 7.2 Search Assist; 7.3 Random Sampling; 7.4 Multiline Input; 7.5 Inverted Index; 7.6 Exercises; References; 8 NOSQL Databases; 8.1 NOSQL Overview and Examples; 8.1.1 CAP and PACELC Theorem; 8.2 HBase Overview; 8.3 Data Model; 8.4 Architecture; 8.4.1 Regions; 8.4.2 HFile, HLog, and Memstore; 8.4.3 Region Server Failover; 8.5 MapReduce and HBase; 8.5.1 Loading Data

8.5.2 Running Queries; 8.6 Exercises; References; 9 Spark; 9.1 Motivation; 9.2 Data Model; 9.2.1 Resilient Distributed Datasets and DataFrames; 9.2.2 Other Data Structures; 9.3 Programming Model; 9.3.1 Data Ingestion; 9.3.2 Basic Actions-Count, Take, and Collect; 9.3.3 Basic Transformations-Filter, Map, and reduceByKey; 9.3.4 Other Operations-flatMap and Reduce; 9.4 Architecture; 9.5 SparkSQL; 9.6 Exercises

Summary Data-intensive systems are a technological building block supporting Big Data and Data Science applications. This book familiarizes readers with core concepts that they should be aware of before continuing with independent work and the more advanced technical reference literature that dominates the current landscape. The material in the book is structured following a problem-based approach. This means that the content in the chapters is focused on developing solutions to simplified, but still realistic, problems using data-intensive technologies and approaches. The reader follows one reference scenario, which uses an open Apache dataset, through the whole book. The origins of this volume are in lectures from a master's course in Data-intensive Systems, given at the University of Stavanger. Some chapters were also the basis for guest lectures at Purdue University and Lodz University of Technology.

Bibliography Includes bibliographical references

Notes Online resource; title from digital title page (viewed on February 14, 2019)

SUBJECT Apache Hadoop. http://id.loc.gov/authorities/names/n2013024279

Spark (Electronic resource : Apache Software Foundation) http://id.loc.gov/authorities/names/no2015027445

Apache Hadoop fast

Spark (Electronic resource : Apache Software Foundation) fast

Subject Databases.

Big data.

Form Electronic book

ISBN 3030046036

9783030046040

3030046044

9783030046033
