Description |
1 online resource |
Contents |
The Beam model. Streaming 101 -- The what, where, when, and how of data processing -- Watermarks -- Advanced windowing -- Exactly-once and side effects -- Streams and tables. The practicalities of persistent state -- Streaming SQL -- Streaming joins -- The evolution of large-scale data processing |
|
Intro; Copyright; Table of Contents; Preface Or: What Are You Getting Yourself Into Here?; Navigating This Book; Takeaways; Conventions Used in This Book; Online Resources; Figures; Code Snippets; O'Reilly Safari; How to Contact Us; Acknowledgments; Part I. The Beam Model; Chapter 1. Streaming 101; Terminology: What Is Streaming?; On the Greatly Exaggerated Limitations of Streaming; Event Time Versus Processing Time; Data Processing Patterns; Bounded Data; Unbounded Data: Batch; Unbounded Data: Streaming; Summary; Chapter 2. The What, Where, When, and How of Data Processing; Roadmap |
|
Batch Foundations: What and Where; What: Transformations; Where: Windowing; Going Streaming: When and How; When: The Wonderful Thing About Triggers Is Triggers Are Wonderful Things!; When: Watermarks; When: Early/On-Time/Late Triggers FTW!; When: Allowed Lateness (i.e., Garbage Collection); How: Accumulation; Summary; Chapter 3. Watermarks; Definition; Source Watermark Creation; Perfect Watermark Creation; Heuristic Watermark Creation; Watermark Propagation; Understanding Watermark Propagation; Watermark Propagation and Output Timestamps; The Tricky Case of Overlapping Windows |
|
Percentile Watermarks; Processing-Time Watermarks; Case Studies; Case Study: Watermarks in Google Cloud Dataflow; Case Study: Watermarks in Apache Flink; Case Study: Source Watermarks for Google Cloud Pub/Sub; Summary; Chapter 4. Advanced Windowing; When/Where: Processing-Time Windows; Event-Time Windowing; Processing-Time Windowing via Triggers; Processing-Time Windowing via Ingress Time; Where: Session Windows; Where: Custom Windowing; Variations on Fixed Windows; Variations on Session Windows; One Size Does Not Fit All; Summary; Chapter 5. Exactly-Once and Side Effects |
|
Why Exactly Once Matters; Accuracy Versus Completeness; Side Effects; Problem Definition; Ensuring Exactly Once in Shuffle; Addressing Determinism; Performance; Graph Optimization; Bloom Filters; Garbage Collection; Exactly Once in Sources; Exactly Once in Sinks; Use Cases; Example Source: Cloud Pub/Sub; Example Sink: Files; Example Sink: Google BigQuery; Other Systems; Apache Spark Streaming; Apache Flink; Summary; Part II. Streams and Tables; Chapter 6. Streams and Tables; Stream-and-Table Basics Or: a Special Theory of Stream and Table Relativity |
|
Toward a General Theory of Stream and Table Relativity; Batch Processing Versus Streams and Tables; A Streams and Tables Analysis of MapReduce; Reconciling with Batch Processing; What, Where, When, and How in a Streams and Tables World; What: Transformations; Where: Windowing; When: Triggers; How: Accumulation; A Holistic View of Streams and Tables in the Beam Model; A General Theory of Stream and Table Relativity; Summary; Chapter 7. The Practicalities of Persistent State; Motivation; The Inevitability of Failure; Correctness and Efficiency; Implicit State; Raw Grouping; Incremental Combining |
Summary |
Streaming data is a big deal in big data these days. As more and more businesses seek to tame the massive unbounded data sets that pervade our world, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to work with streaming data in a conceptual and platform-agnostic way. Expanded from Tyler Akidau's popular blog posts "Streaming 101" and "Streaming 102", this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams. You'll also dive deep into watermarks and exactly-once processing with co-authors Slava Chernyak and Reuven Lax. |
Notes |
Includes index |
Bibliography |
Includes bibliographical references and index |
Notes |
Online resource; title from PDF title page (EBSCO, viewed July 23, 2018) |
Subject |
Streaming technology (Telecommunications)
|
|
Electronic data processing -- Distributed processing.
|
|
Big data.
|
|
Webcasts as Topic
|
|
COMPUTERS -- General.
|
|
Big data
|
|
Electronic data processing -- Distributed processing
|
|
Streaming technology (Telecommunications)
|
Form |
Electronic book
|
Author |
Chernyak, Slava, author.
|
|
Lax, Reuven, author.
|
LC no. |
2018277258 |
ISBN |
9781491983843 |
|
1491983841 |
|
9781491983829 |
|
1491983825 |
|