Description
Book Synopsis
Leverage Apache Spark within a modern data engineering ecosystem.
This hands-on guide will teach you how to write fully functional applications, follow industry best practices, and learn the rationale behind these decisions. With Apache Spark as the foundation, you will follow a step-by-step journey beginning with the basics of data ingestion, processing, and transformation, and ending up with an entire local data platform running Apache Spark, Apache Zeppelin, Apache Kafka, Redis, MySQL, Minio (S3), and Apache Airflow.
Apache Spark applications solve a wide range of data problems from traditional data loading and processing to rich SQL-based analysis as well as complex machine learning workloads and even near real-time processing of streaming data. Spark fits well as a central foundation for any data engineering workload. This book will teach you to write interactive Spark applications using Apache Zeppelin notebooks, write and compile reusable applications and modules, and fully test
Table of Contents
Part I. The Fundamentals of Data Engineering with Spark
1. Introduction to Modern Data Engineering
2. Getting Started with Apache Spark
3. Working with Data
4. Transforming Data with Spark SQL and the DataFrame API
5. Bridging Spark SQL with JDBC
6. Data Discovery and the Spark SQL Catalog
7. Data Pipelines & Structured Spark Applications
Part II. The Streaming Pipeline Ecosystem
8. Workflow Orchestration with Apache Airflow
9. A Gentle Introduction to Stream Processing
10. Patterns for Writing Structured Streaming Applications
11. Apache Kafka & Spark Structured Streaming
12. Analytical Processing & Insights
Part III. Advanced Techniques
13. Advanced Analytics with Spark Stateful Structured Streaming
14. Deploying Mission Critical Spark Applications on Spark Standalone
15. Deploying Mission Critical Spark Applications on Kubernetes