Description
Book Synopsis
Design and implement a modern data lakehouse on the Azure Data Platform using Delta Lake, Apache Spark, Azure Databricks, Azure Synapse Analytics, and Snowflake. This book teaches you the intricate details of the data lakehouse paradigm and shows you how to efficiently design a cloud-based data lakehouse with highly performant, cutting-edge Apache Spark capabilities in Azure Databricks, Azure Synapse Analytics, and Snowflake. You will learn to write efficient PySpark code for batch and streaming ELT jobs on Azure, and you will follow along with practical, scenario-based examples showing how to apply the capabilities of Delta Lake and Apache Spark to optimize performance and to secure, share, and manage high-volume, high-velocity, high-variety data in your lakehouse with ease. The patterns of success you acquire from reading this book will help you hone your skills to build high-performing, scalable, ACID-compliant lakehouses using flexible, cost-efficient decoupled storage and compute.
Table of Contents
Introduction
Part I. Getting Started
1. The Lakehouse Paradigm
2. Mount Lakes to Databricks
Part II. Lakehouse Platforms
3. Snowflake Data Warehouse
4. Synapse Analytics Serverless Pools
5. Databricks SQL Analytics
Part III. Apache Spark
6. PySpark
7. Extract, Load, Transform Jobs
Part IV. Delta Lake
8. Delta Schema Evolution
9. Delta Change Feed
10. Delta Clones
11. Delta Live Tables
12. Delta Sharing
Part V. Optimizing Performance
13. Dynamic Partition Pruning for Querying Star Schemas
14. Z-Ordering and Data Skipping
15. Adaptive Query Execution
16. Bloom Filter Index
17. Hyperspace
Part VI. Lakehouse Capabilities
18. Auto Loader Resource Management
19. Advanced Schema Evolution with Auto Loader
20. Python Wheels
21. Security and Controls
22. Unity Catalog