Description

Book Synopsis


Table of Contents

Introduction xxi

Assessment Test xxx

Chapter 1 History of Analytics and Big Data 1

Evolution of Analytics Architecture Over the Years 3

The New World Order 5

Analytics Pipeline 6

Data Sources 7

Collection 8

Storage 8

Processing and Analysis 9

Visualization, Predictive and Prescriptive Analytics 9

The Big Data Reference Architecture 10

Data Characteristics: Hot, Warm, and Cold 11

Collection/Ingest 12

Storage 13

Process/Analyze 14

Consumption 15

Data Lakes and Their Relevance in Analytics 16

What is a Data Lake? 16

Building a Data Lake on AWS 19

Step 1: Choosing the Right Storage – Amazon S3 is the Base 19

Step 2: Data Ingestion – Moving the Data into the Data Lake 21

Step 3: Cleanse, Prep, and Catalog the Data 22

Step 4: Secure the Data and Metadata 23

Step 5: Make Data Available for Analytics 23

Using Lake Formation to Build a Data Lake on AWS 23

Exam Objectives 24

Objective Map 25

Assessment Test 27

References 29

Chapter 2 Data Collection 31

Exam Objectives 32

AWS IoT 33

Common Use Cases for AWS IoT 35

How AWS IoT Works 36

Amazon Kinesis 38

Amazon Kinesis Introduction 40

Amazon Kinesis Data Streams 40

Amazon Kinesis Data Analytics 54

Amazon Kinesis Video Streams 61

AWS Glue 64

Glue Data Catalog 66

Glue Crawlers 68

Authoring ETL Jobs 69

Executing ETL Jobs 71

Change Data Capture with Glue Bookmarks 71

Use Cases for AWS Glue 72

Amazon SQS 72

Amazon Data Migration Service 74

What is AWS DMS Anyway? 74

What Does AWS DMS Support? 75

AWS Data Pipeline 77

Pipeline Definition 77

Pipeline Schedules 78

Task Runner 79

Large-Scale Data Transfer Solutions 81

AWS Snowcone 81

AWS Snowball 82

AWS Snowmobile 85

AWS Direct Connect 86

Summary 87

Review Questions 88

References 90

Exercises & Workshops 91

Chapter 3 Data Storage 93

Introduction 94

Amazon S3 95

Amazon S3 Data Consistency Model 96

Data Lake and S3 97

Data Replication in Amazon S3 100

Server Access Logging in Amazon S3 101

Partitioning, Compression, and File Formats on S3 101

Amazon S3 Glacier 103

Vault 103

Archive 104

Amazon DynamoDB 104

Amazon DynamoDB Data Types 105

Amazon DynamoDB Core Concepts 108

Read/Write Capacity Mode in DynamoDB 108

DynamoDB Auto Scaling and Reserved Capacity 111

Read Consistency and Global Tables 111

Amazon DynamoDB: Indexing and Partitioning 113

Amazon DynamoDB Accelerator 114

Amazon DynamoDB Streams 115

Amazon DynamoDB Streams – Kinesis Adapter 116

Amazon DocumentDB 117

Why a Document Database? 117

Amazon DocumentDB Overview 119

Amazon Document DB Architecture 120

Amazon DocumentDB Interfaces 120

Graph Databases and Amazon Neptune 121

Amazon Neptune Overview 122

Amazon Neptune Use Cases 123

Storage Gateway 123

Hybrid Storage Requirements 123

AWS Storage Gateway 125

Amazon EFS 127

Amazon EFS Use Cases 130

Interacting with Amazon EFS 132

Amazon EFS Security Model 132

Backing Up Amazon EFS 132

Amazon FSx for Lustre 133

Key Benefits of Amazon FSx for Lustre 134

Use Cases for Lustre 135

AWS Transfer for SFTP 135

Summary 136

Exercises 137

Review Questions 140

Further Reading 142

References 142

Chapter 4 Data Processing and Analysis 143

Introduction 144

Types of Analytical Workloads 144

Amazon Athena 146

Apache Presto 147

Apache Hive 148

Amazon Athena Use Cases and Workloads 149

Amazon Athena DDL, DML, and DCL 150

Amazon Athena Workgroups 151

Amazon Athena Federated Query 153

Amazon Athena Custom UDFs 154

Using Machine Learning with Amazon Athena 154

Amazon EMR 155

Apache Hadoop Overview 156

Amazon EMR Overview 157

Apache Hadoop on Amazon EMR 158

EMRFS 166

Bootstrap Actions and Custom AMI 167

Security on EMR 167

EMR Notebooks 168

Apache Hive and Apache Pig on Amazon EMR 169

Apache Spark on Amazon EMR 174

Apache HBase on Amazon EMR 182

Apache Flink, Apache Mahout, and Apache MXNet 184

Choosing the Right Analytics Tool 186

Amazon Elasticsearch Service 188

When to Use Elasticsearch 188

Elasticsearch Core Concepts (the ELK Stack) 189

Amazon Elasticsearch Service 191

Amazon Redshift 192

What is Data Warehousing? 192

What is Redshift? 193

Redshift Architecture 195

Redshift AQUA 198

Redshift Scalability 199

Data Modeling in Redshift 205

Data Loading and Unloading 213

Query Optimization in Redshift 217

Security in Redshift 221

Kinesis Data Analytics 225

How Does It Work? 226

What is Kinesis Data Analytics for Java? 228

Comparing Batch Processing Services 229

Comparing Orchestration Options on AWS 230

AWS Step Functions 230

Comparing Different ETL Orchestration Options 230

Summary 231

Exam Essentials 232

Exercises 232

Review Questions 235

References 237

Recommended Workshops 237

Amazon Athena Blogs 238

Amazon Redshift Blogs 240

Amazon EMR Blogs 241

Amazon Elasticsearch Blog 241

Amazon Redshift References and Further Reading 242

Chapter 5 Data Visualization 243

Introduction 244

Data Consumers 245

Data Visualization Options 246

Amazon QuickSight 247

Getting Started 248

Working with Data 250

Data Preparation 255

Data Analysis 256

Data Visualization 258

Machine Learning Insights 261

Building Dashboards 262

Embedding QuickSight Objects into Other Applications 264

Administration 265

Security 266

Other Visualization Options 267

Predictive Analytics 270

What is Predictive Analytics? 270

The AWS ML Stack 271

Summary 273

Exam Essentials 273

Exercises 274

Review Questions 275

References 276

Additional Reading Material 276

Chapter 6 Data Security 279

Introduction 280

Shared Responsibility Model 280

Security Services on AWS 282

AWS IAM Overview 285

IAM User 285

IAM Groups 286

IAM Roles 287

Amazon EMR Security 289

Public Subnet 290

Private Subnet 291

Security Configurations 293

Block Public Access 298

VPC Subnets 298

Security Options during Cluster Creation 299

EMR Security Summary 300

Amazon S3 Security 301

Managing Access to Data in Amazon S3 301

Data Protection in Amazon S3 305

Logging and Monitoring with Amazon S3 306

Best Practices for Security on Amazon S3 308

Amazon Athena Security 308

Managing Access to Amazon Athena 309

Data Protection in Amazon Athena 310

Data Encryption in Amazon Athena 311

Amazon Athena and AWS Lake Formation 312

Amazon Redshift Security 312

Levels of Security within Amazon Redshift 313

Data Protection in Amazon Redshift 315

Redshift Auditing 316

Redshift Logging 317

Amazon Elasticsearch Security 317

Elasticsearch Network Configuration 318

VPC Access 318

Accessing Amazon Elasticsearch and Kibana 319

Data Protection in Amazon Elasticsearch 322

Amazon Kinesis Security 325

Managing Access to Amazon Kinesis 325

Data Protection in Amazon Kinesis 326

Amazon Kinesis Best Practices 326

Amazon QuickSight Security 327

Managing Data Access with Amazon QuickSight 327

Data Protection 328

Logging and Monitoring 329

Security Best Practices 329

Amazon DynamoDB Security 329

Access Management in DynamoDB 329

IAM Policy with Fine-Grained Access Control 330

Identity Federation 331

How to Access Amazon DynamoDB 332

Data Protection with DynamoDB 332

Monitoring and Logging with DynamoDB 333

Summary 334

Exam Essentials 334

Exercises/Workshops 334

Review Questions 336

References and Further Reading 337

Appendix Answers to Review Questions 339

Chapter 1: History of Analytics and Big Data 340

Chapter 2: Data Collection 342

Chapter 3: Data Storage 343

Chapter 4: Data Processing and Analysis 344

Chapter 5: Data Visualization 346

Chapter 6: Data Security 346

Index 349

AWS Certified Data Analytics Study Guide with

    Product form

    £92.00

    Includes FREE delivery

    RRP £115.00 – you save £23.00 (20%)

    A Paperback / softback by Asif Abbasi

      Publisher: John Wiley & Sons Inc
      Publication Date: 24/06/2021
      ISBN13: 978-1119819455
      ISBN10: 1119819458
