E-book
Author Jurney, Russell, author.

Title Agile data science 2.0 : building full-stack data analytics applications with Spark / Russell Jurney
Published Boston, MA : O'Reilly Media, 2017
©2017

Description 1 online resource
Contents Copyright; Table of Contents; Preface; Agile Data Science Mailing List; Data Syndrome, Product Analytics Consultancy; Live Training; Who This Book Is For; How This Book Is Organized; Conventions Used in This Book; Using Code Examples; O'Reilly Safari; How to Contact Us; Part I. Setup; Chapter 1. Theory; Introduction; Definition; Methodology as Tweet; Agile Data Science Manifesto; The Problem with the Waterfall; Research Versus Application Development; The Problem with Agile Software; Eventual Quality: Financing Technical Debt; The Pull of the Waterfall; The Data Science Process
Setting Expectations; Data Science Team Roles; Recognizing the Opportunity and the Problem; Adapting to Change; Notes on Process; Code Review and Pair Programming; Agile Environments: Engineering Productivity; Realizing Ideas with Large-Format Printing; Chapter 2. Agile Tools; Scalability = Simplicity; Agile Data Science Data Processing; Local Environment Setup; System Requirements; Setting Up Vagrant; Downloading the Data; EC2 Environment Setup; Downloading the Data; Getting and Running the Code; Getting the Code; Running the Code; Jupyter Notebooks; Touring the Toolset
Agile Stack Requirements; Python 3; Serializing Events with JSON Lines and Parquet; Collecting Data; Data Processing with Spark; Publishing Data with MongoDB; Searching Data with Elasticsearch; Distributed Streams with Apache Kafka; Processing Streams with PySpark Streaming; Machine Learning with scikit-learn and Spark MLlib; Scheduling with Apache Airflow (Incubating); Reflecting on Our Workflow; Lightweight Web Applications; Presenting Our Data; Conclusion; Chapter 3. Data; Air Travel Data; Flight On-Time Performance Data; OpenFlights Database; Weather Data
Data Processing in Agile Data Science; Structured Versus Semistructured Data; SQL Versus NoSQL; SQL; NoSQL and Dataflow Programming; Spark: SQL + NoSQL; Schemas in NoSQL; Data Serialization; Extracting and Exposing Features in Evolving Schemas; Conclusion; Part II. Climbing the Pyramid; Chapter 4. Collecting and Displaying Records; Putting It All Together; Collecting and Serializing Flight Data; Processing and Publishing Flight Records; Publishing Flight Records to MongoDB; Presenting Flight Records in a Browser; Serving Flights with Flask and pymongo; Rendering HTML5 with Jinja2
Agile Checkpoint; Listing Flights; Listing Flights with MongoDB; Paginating Data; Searching for Flights; Creating Our Index; Publishing Flights to Elasticsearch; Searching Flights on the Web; Conclusion; Chapter 5. Visualizing Data with Charts and Tables; Chart Quality: Iteration Is Essential; Scaling a Database in the Publish/Decorate Model; First Order Form; Second Order Form; Third Order Form; Choosing a Form; Exploring Seasonality; Querying and Presenting Flight Volume; Extracting Metal (Airplanes [Entities]); Extracting Tail Numbers; Assessing Our Airplanes; Data Enrichment
Notes Online resource; title from PDF title page (EBSCO, viewed June 13, 2017)
Subject Data mining.
Agile software development.
Data Mining
COMPUTERS -- General.
Agile software development
Data mining
Form Electronic book
ISBN 9781491960080
1491960086
9781491960066
149196006X