E-book

Title Principles of data wrangling : practical techniques for data preparation / Tye Rattenbury, Joseph M. Hellerstein, Jeffrey Heer, Sean Kandel and Connor Carreras
Published Sebastopol : O'Reilly, 2017

Description 1 online resource (viii, 82 pages) : illustrations
Contents Introduction: Magic Thresholds, PYMK, and User Growth at Facebook -- A Data Workflow Framework; How Data Flows During and Across Projects; Connecting Analytic Actions to Data Movement: A Holistic Workflow Framework for Data Projects; Raw Data Stage Actions: Ingest Data and Create Metadata; Ingesting Known and Unknown Data; Creating Metadata; Refined Data Stage Actions: Create Canonical Data and Conduct Ad Hoc Analyses; Designing Refined Data; Refined Stage Analytical Actions
Production Data Stage Actions: Create Production Data and Build Automated Systems; Creating Optimized Data; Designing Regular Reports and Automated Products/Services; Data Wrangling within the Workflow Framework; Chapter 3. The Dynamics of Data Wrangling; Data Wrangling Dynamics; Additional Aspects: Subsetting and Sampling; Core Transformation and Profiling Actions; Data Wrangling in the Workflow Framework; Ingesting Data; Describing Data; Assessing Data Utility; Designing and Building Refined Data; Ad Hoc Reporting; Exploratory Modeling and Forecasting; Building an Optimized Dataset
Regular Reporting and Building Data-Driven Products and Services; Chapter 4. Profiling; Overview of Profiling; Individual Value Profiling: Syntactic Profiling; Individual Value Profiling: Semantic Profiling; Set-Based Profiling; Profiling Individual Values in the Candidate Master File; Syntactic Profiling in the Candidate Master File; Set-Based Profiling in the Candidate Master File; Chapter 5. Transformation: Structuring; Overview of Structuring; Intrarecord Structuring: Extracting Values; Positional Extraction; Pattern Extraction; Complex Structure Extraction
Intrarecord Structuring: Combining Multiple Record Fields; Interrecord Structuring: Filtering Records and Fields; Interrecord Structuring: Aggregations and Pivots; Simple Aggregations; Column-to-Row Pivots; Row-to-Column Pivots; Chapter 6. Transformation: Enriching; Unions; Joins; Inserting Metadata; Derivation of Values; Generic; Proprietary; Chapter 7. Using Transformation to Clean Data; Addressing Missing/NULL Values; Addressing Invalid Values; Chapter 8. Roles and Responsibilities; Skills and Responsibilities; Data Engineer; Data Architect; Data Scientist; Analyst
Roles Across the Data Workflow Framework; Organizational Best Practices; Chapter 9. Data Wrangling Tools; Data Size and Infrastructure; Data Structures; Excel; SQL; Trifacta Wrangler; Transformation Paradigms; Excel; SQL; Trifacta Wrangler; Choosing a Data Wrangling Tool; About the Authors; Colophon
Summary "A key task that any aspiring data-driven organization needs to learn is data wrangling, the process of converting raw data into something truly useful. This practical guide provides business analysts with an overview of various data wrangling techniques and tools, and puts the practice of data wrangling into context by asking, 'What are you trying to do and why?'"--Back cover
Subject Data mining.
Electronic data processing -- Data preparation.
REFERENCE -- Questions & Answers.
Form Electronic book
Author Rattenbury, Tye
Hellerstein, Joseph M., 1968-
Heer, Jeffrey Michael.
Kandel, Sean
Carreras, Connor
ISBN 1491938897
9781491938898
9781491938874
1491938870