Description

Book Synopsis
This book explains how to scale Apache Spark 3 to handle massive amounts of data through either batch or stream processing. It covers how to use Spark's structured APIs to perform complex data transformations and analyses that you can use to implement end-to-end analytics workflows. The book also covers Spark 3's new features, theoretical foundations, and application architecture. The first section introduces the Apache Spark ecosystem as a unified engine for large-scale data analytics and shows you how to run and fine-tune your first application in Spark. The second section centers on batch processing suited to end-of-cycle processing, and on data ingestion through files and databases. It explains the Spark DataFrame API as well as how to work with structured and unstructured data in Apache Spark. The last section deals with scalable, high-throughput, fault-tolerant stream-processing workloads for real-time data. Here you'll learn about Apache Spark Streaming's execution model, the architecture of Spark S

Table of Contents
Part I. Apache Spark Batch Data Processing
Chapter 1: Introduction to Apache Spark for Large-Scale Data Analytics
  1.1. What is Apache Spark?
  1.2. Spark Unified Analytics
  1.3. Batch vs Streaming Data
  1.4. Spark Ecosystem
Chapter 2: Getting Started with Apache Spark
  2.2. Scala and PySpark Interfaces
  2.3. Spark Application Concepts
  2.4. Transformations and Actions in Apache Spark
  2.5. Lazy Evaluation in Apache Spark
  2.6. First Application in Spark
  2.7. Apache Spark Web UI
Chapter 3: Spark Dataframe API
Chapter 4: Spark Dataset API
Chapter 5: Structured and Unstructured Data with Apache Spark
  5.1. Data Sources
  5.2. Generic Load/Save Functions
  5.3. Generic File Source Options
  5.4. Parquet Files
  5.5. ORC Files
  5.6. JSON Files
  5.7. CSV Files
  5.8. Text Files
  5.9. Hive Tables
  5.10. JDBC To Other Databases
Chapter 6: Spark Machine Learning with MLlib

Part II. Spark Data Streaming
Chapter 7: Introduction to Apache Spark Streaming
  7.1. Apache Spark Streaming’s Execution Model
  7.2. Stream Processing Architectures
  7.3. Architecture of Spark Streaming: Discretized Streams
  7.4. Benefits of Discretized Stream Processing
    7.4.1. Dynamic Load Balancing
    7.4.2. Fast Failure and Straggler Recovery
Chapter 8: Structured Streaming
  8.1. Streaming Analytics
  8.2. Connecting to a Stream
  8.3. Preparing the Data in a Stream
  8.4. Operations on a Streaming Dataset
Chapter 9: Structured Streaming Sources
  9.1. File Sources
  9.2. Apache Kafka Source
  9.3. A Rate Source
Chapter 10: Structured Streaming Sinks
  10.1. Output Modes
  10.2. Output Sinks
  10.3. File Sink
  10.4. The Kafka Sink
  10.5. The Memory Sink
  10.6. Streaming Table APIs
  10.7. Triggers
  10.8. Managing Streaming Queries
  10.9. Monitoring Streaming Queries
    10.9.1. Reading Metrics Interactively
    10.9.2. Reporting Metrics programmatically using Asynchronous APIs
    10.9.3. Reporting Metrics using Dropwizard
    10.9.4. Recovering from Failures with Checkpointing
    10.9.5. Recovery Semantics after Changes in a Streaming Query
Chapter 11: Future Directions for Spark Streaming
  11.1. Backpressure
  11.2. Dynamic Scaling
  11.3. Event time and out-of-order data
  11.4. UI enhancements
  11.5. Continuous Processing
Chapter 12: Watermarks. A deep survey of temporal progress metrics

Hands-on Guide to Apache Spark 3

£46.74

Includes FREE delivery

RRP £54.99 – you save £8.25 (15%)

Order before 4pm today for delivery by Mon 26 Jan 2026.

A Paperback by Alfonso Antolinez Garcia

1 in stock


    Publisher: Apress
    Publication Date: 1/6/2023
    ISBN13: 9781484293799, 978-1484293799
    ISBN10: 1484293797

    © 2026 Book Curl