Description

Book Synopsis
This book explains how to scale Apache Spark 3 to handle massive amounts of data, via either batch or stream processing. It covers how to use Spark's structured APIs to perform the complex data transformations and analyses needed to implement end-to-end analytics workflows. The book also covers Spark 3's new features, theoretical foundations, and application architecture. The first section introduces the Apache Spark ecosystem as a unified engine for large-scale data analytics and shows you how to run and fine-tune your first Spark application. The second section centers on batch processing suited to end-of-cycle workloads, and on data ingestion from files and databases. It explains the Spark DataFrame API and how to work with structured and unstructured data in Apache Spark. The last section deals with scalable, high-throughput, fault-tolerant stream processing workloads for real-time data. Here you'll learn about Apache Spark Streaming's execution model, the architecture of Spark S…

Table of Contents
Part I. Apache Spark Batch Data Processing

Chapter 1: Introduction to Apache Spark for Large-Scale Data Analytics
1.1. What is Apache Spark?
1.2. Spark Unified Analytics
1.3. Batch vs Streaming Data
1.4. Spark Ecosystem

Chapter 2: Getting Started with Apache Spark
2.2. Scala and PySpark Interfaces
2.3. Spark Application Concepts
2.4. Transformations and Actions in Apache Spark
2.5. Lazy Evaluation in Apache Spark
2.6. First Application in Spark
2.7. Apache Spark Web UI

Chapter 3: Spark Dataframe API

Chapter 4: Spark Dataset API

Chapter 5: Structured and Unstructured Data with Apache Spark
5.1. Data Sources
5.2. Generic Load/Save Functions
5.3. Generic File Source Options
5.4. Parquet Files
5.5. ORC Files
5.6. JSON Files
5.7. CSV Files
5.8. Text Files
5.9. Hive Tables
5.10. JDBC To Other Databases

Chapter 6: Spark Machine Learning with MLlib

Part II. Spark Data Streaming

Chapter 7: Introduction to Apache Spark Streaming
7.1. Apache Spark Streaming’s Execution Model
7.2. Stream Processing Architectures
7.3. Architecture of Spark Streaming: Discretized Streams
7.4. Benefits of Discretized Stream Processing
7.4.1. Dynamic Load Balancing
7.4.2. Fast Failure and Straggler Recovery

Chapter 8: Structured Streaming
8.1. Streaming Analytics
8.2. Connecting to a Stream
8.3. Preparing the Data in a Stream
8.4. Operations on a Streaming Dataset

Chapter 9: Structured Streaming Sources
9.1. File Sources
9.2. Apache Kafka Source
9.3. A Rate Source

Chapter 10: Structured Streaming Sinks
10.1. Output Modes
10.2. Output Sinks
10.3. File Sink
10.4. The Kafka Sink
10.5. The Memory Sink
10.6. Streaming Table APIs
10.7. Triggers
10.8. Managing Streaming Queries
10.9. Monitoring Streaming Queries
10.9.1. Reading Metrics Interactively
10.9.2. Reporting Metrics programmatically using Asynchronous APIs
10.9.3. Reporting Metrics using Dropwizard
10.9.4. Recovering from Failures with Checkpointing
10.9.5. Recovery Semantics after Changes in a Streaming Query

Chapter 11: Future Directions for Spark Streaming
11.1. Backpressure
11.2. Dynamic Scaling
11.3. Event time and out-of-order data
11.4. UI enhancements
11.5. Continuous Processing

Chapter 12: Watermarks. A deep survey of temporal progress metrics

Hands-on Guide to Apache Spark 3

Paperback by Alfonso Antolinez Garcia

Publisher: Apress
Publication Date: 1/6/2023
ISBN13: 9781484293799 (978-1484293799)
ISBN10: 1484293797
