E-book
Author Akidau, Tyler, author.

Title Streaming systems : the what, where, when, and how of large-scale data processing / Tyler Akidau, Slava Chernyak, Reuven Lax
Published Sebastopol, CA : O'Reilly Media, Inc., [2018]
©2018

Description 1 online resource
Contents The beam model. Streaming 101 -- The what, where, when, and how of data processing -- Watermarks -- Advanced windowing -- Exactly-once and side effects -- Streams and tables. The practicalities of persistent state -- Streaming SQL -- Streaming joins -- The evolution of large-scale data processing
Intro; Copyright; Table of Contents; Preface Or: What Are You Getting Yourself Into Here?; Navigating This Book; Takeaways; Conventions Used in This Book; Online Resources; Figures; Code Snippets; O'Reilly Safari; How to Contact Us; Acknowledgments; Part I. The Beam Model; Chapter 1. Streaming 101; Terminology: What Is Streaming?; On the Greatly Exaggerated Limitations of Streaming; Event Time Versus Processing Time; Data Processing Patterns; Bounded Data; Unbounded Data: Batch; Unbounded Data: Streaming; Summary; Chapter 2. The What, Where, When, and How of Data Processing; Roadmap
Batch Foundations: What and Where; What: Transformations; Where: Windowing; Going Streaming: When and How; When: The Wonderful Thing About Triggers Is Triggers Are Wonderful Things!; When: Watermarks; When: Early/On-Time/Late Triggers FTW!; When: Allowed Lateness (i.e., Garbage Collection); How: Accumulation; Summary; Chapter 3. Watermarks; Definition; Source Watermark Creation; Perfect Watermark Creation; Heuristic Watermark Creation; Watermark Propagation; Understanding Watermark Propagation; Watermark Propagation and Output Timestamps; The Tricky Case of Overlapping Windows
Percentile Watermarks; Processing-Time Watermarks; Case Studies; Case Study: Watermarks in Google Cloud Dataflow; Case Study: Watermarks in Apache Flink; Case Study: Source Watermarks for Google Cloud Pub/Sub; Summary; Chapter 4. Advanced Windowing; When/Where: Processing-Time Windows; Event-Time Windowing; Processing-Time Windowing via Triggers; Processing-Time Windowing via Ingress Time; Where: Session Windows; Where: Custom Windowing; Variations on Fixed Windows; Variations on Session Windows; One Size Does Not Fit All; Summary; Chapter 5. Exactly-Once and Side Effects
Why Exactly Once Matters; Accuracy Versus Completeness; Side Effects; Problem Definition; Ensuring Exactly Once in Shuffle; Addressing Determinism; Performance; Graph Optimization; Bloom Filters; Garbage Collection; Exactly Once in Sources; Exactly Once in Sinks; Use Cases; Example Source: Cloud Pub/Sub; Example Sink: Files; Example Sink: Google BigQuery; Other Systems; Apache Spark Streaming; Apache Flink; Summary; Part II. Streams and Tables; Chapter 6. Streams and Tables; Stream-and-Table Basics Or: a Special Theory of Stream and Table Relativity
Toward a General Theory of Stream and Table Relativity; Batch Processing Versus Streams and Tables; A Streams and Tables Analysis of MapReduce; Reconciling with Batch Processing; What, Where, When, and How in a Streams and Tables World; What: Transformations; Where: Windowing; When: Triggers; How: Accumulation; A Holistic View of Streams and Tables in the Beam Model; A General Theory of Stream and Table Relativity; Summary; Chapter 7. The Practicalities of Persistent State; Motivation; The Inevitability of Failure; Correctness and Efficiency; Implicit State; Raw Grouping; Incremental Combining
Summary Streaming data is a big deal in big data these days. As more and more businesses seek to tame the massive unbounded data sets that pervade our world, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to work with streaming data in a conceptual and platform-agnostic way. Expanded from Tyler Akidau's popular blog posts "Streaming 101" and "Streaming 102", this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams. You'll also dive deep into watermarks and exactly-once processing with co-authors Slava Chernyak and Reuven Lax.
Notes Includes index
Bibliography Includes bibliographical references and index
Notes Online resource; title from PDF title page (EBSCO, viewed July 23, 2018)
Subject Streaming technology (Telecommunications)
Electronic data processing -- Distributed processing.
Big data.
Webcasts as Topic
COMPUTERS -- General.
Big data
Electronic data processing -- Distributed processing
Streaming technology (Telecommunications)
Form Electronic book
Author Chernyak, Slava, author.
Lax, Reuven, author.
LC no. 2018277258
ISBN 9781491983843
1491983841
9781491983829
1491983825