Description

Book Synopsis

A comprehensive and accessible roadmap to performing data analytics in the AWS cloud

In Data Analytics in the AWS Cloud: Building a Data Platform for BI and Predictive Analytics on AWS, accomplished software engineer and data architect Joe Minichino delivers an expert blueprint for storing, processing, and analyzing data on the Amazon Web Services cloud platform. In the book, you'll explore every relevant aspect of data analytics, from data engineering to analysis, business intelligence, DevOps, and MLOps, as you discover how to integrate machine learning predictions with analytics engines and visualization tools.

You'll also find:

  • Real-world use cases of AWS architectures that demystify the applications of data analytics
  • Accessible introductions to data acquisition, importation, storage, visualization, and reporting
  • Expert insights into serverless data engineering and how to use it to reduce overhead and costs, improve stability, and simplify m

    Table of Contents

    Introduction

    Chapter 1 AWS Data Lakes and Analytics Technology Overview

    Why AWS?

    What Does a Data Lake Look Like in AWS?

    Analytics on AWS

    Skills Required to Build and Maintain an AWS Analytics Pipeline

    Chapter 2 The Path to Analytics: Setting Up a Data and Analytics Team

    The Data Vision

    Support

    DA Team Roles

    Early Stage Roles

    Team Lead

    Data Architect

    Data Engineer

    Data Analyst

    Maturity Stage Roles

    Data Scientist

    Cloud Engineer

    Business Intelligence (BI) Developer

    Machine Learning Engineer

    Business Analyst

    Niche Roles

    Analytics Flow at a Process Level

    Workflow Methodology

    The DA Team Mantra: “Automate Everything”

    Analytics Models in the Wild: Centralized, Distributed, Center of Excellence

    Centralized

    Distributed

    Center of Excellence

    Summary

    Chapter 3 Working on AWS

    Accessing AWS

    Everything Is a Resource

    S3: An Important Exception

    IAM: Policies, Roles, and Users

    Policies

    Identity-Based Policies

    Resource-Based Policies

    Roles

    Users and User Groups

    Summarizing IAM

    Working with the Web Console

    The AWS Command-Line Interface

    Installing AWS cli

    Linux Installation

    macOS Installation

    Windows

    Configuring AWS cli

    A Note on Region

    Setting Individual Parameters

    Using Profiles and Configuration Files

    Final Notes on Configuration

    Using the AWS cli

    Using Skeletons and File Inputs

    Cleaning Up!

    Infrastructure-as-Code: CloudFormation and Terraform

    CloudFormation

    CloudFormation Stacks

    CloudFormation Template Anatomy

    CloudFormation Changesets

    Getting Stack Information

    Cleaning Up Again

    CloudFormation Conclusions

    Terraform

    Coding Style

    Modularity

    Limitations

    Terraform vs. CloudFormation

    Infrastructure-as-Code: CDK, Pulumi, Cloudcraft, and Other Solutions

    AWS CDK

    Pulumi

    Cloudcraft

    Infrastructure Management Conclusions

    Chapter 4 Serverless Computing and Data Engineering

    Serverless vs. Fully Managed

    AWS Serverless Technologies

    AWS Lambda

    Pricing Model

    Laser Focus on Code

    The Lambda Paradigm Shift

    Virtually Infinite Scalability

    Geographical Distribution

    A Lambda Hello World

    Lambda Configuration

    Runtime

    Container-Based Lambdas

    Architectures

    Memory

    Networking

    Execution Role

    Environment Variables

    AWS EventBridge

    AWS Fargate

    AWS DynamoDB

    AWS SNS

    Amazon SQS

    AWS CloudWatch

    Amazon QuickSight

    AWS Step Functions

    Amazon API Gateway

    Amazon Cognito

    AWS Serverless Application Model (SAM)

    Ephemeral Infrastructure

    AWS SAM Installation

    Configuration

    Creating Your First AWS SAM Project

    Application Structure

    SAM Resource Types

    SAM Lambda Template

    !! Recursive Lambda Invocation !!

    Function Metadata

    Outputs

    Implicitly Generated Resources

    Other Template Sections

    Lambda Code

    Building Your First SAM Application

    Testing the AWS SAM Application Locally

    Deployment

    Cleaning Up

    Summary

    Chapter 5 Data Ingestion

    AWS Data Lake Architecture

    Serverless Data Lake Architecture Structure

    Ingestion

    Storage and Processing

    Cataloging, Governance, and Search

    Security and Monitoring

    Consumption

    Sample Processing Architecture: Cataloging Images into DynamoDB

    Use Case Description

    SAM Application Creation

    S3-Triggered Lambda

    Adding DynamoDB

    Lambda Execution Context

    Inserting into DynamoDB

    Cleaning Up

    Serverless Ingestion

    AWS Fargate

    AWS Lambda

    Example Architecture: Fargate-Based Periodic Batch Import

    The Basic Importer

    ECS CLI

    AWS Copilot cli

    Clean Up

    AWS Kinesis Ingestion

    Example Architecture: Two-Pronged Delivery

    Fully Managed Ingestion with AppFlow

    Operational Data Ingestion with Database Migration Service

    DMS Concepts

    DMS Instance

    DMS Endpoints

    DMS Tasks

    Summary of the Workflow

    Common Use of DMS

    Example Architecture: DMS to S3

    DMS Instance

    DMS Endpoints

    DMS Task

    Summary

    Chapter 6 Processing Data

    Phases of Data Preparation

    What Is ETL? Why Should I Care?

    ETL Job vs. Streaming Job

    Overview of ETL in AWS

    ETL with AWS Glue

    ETL with Lambda Functions

    ETL with Hadoop/EMR

    Other Ways to Perform ETL

    ETL Job Design Concepts

    Source Identification

    Destination Identification

    Mappings

    Validation

    Filter

    Join, Denormalization, Relationalization

    AWS Glue for ETL

    Really, It’s Just Spark

    Visual

    Spark Script Editor

    Python Shell Script Editor

    Jupyter Notebook

    Connectors

    Creating Connections

    Creating Connections with the Web Console

    Creating Connections with the AWS cli

    Creating ETL Jobs with AWS Glue Visual Editor

    ETL Example: Format Switch from Raw (JSON) to Cleaned (Parquet)

    Job Bookmarks

    Transformations

    Apply Mapping

    Filter

    Other Available Transforms

    Run the Edited Job

    Visual Editor with Source and Target Conclusions

    Creating ETL Jobs with AWS Glue Visual Editor (without Source and Target)

    Creating ETL Jobs with the Spark Script Editor

    Developing ETL Jobs with AWS Glue Notebooks

    What Is a Notebook?

    Notebook Structure

    Step 1: Load Code into a DynamicFrame

    Step 2: Apply Field Mapping

    Step 3: Apply the Filter

    Step 4: Write to S3 in Parquet Format

    Example: Joining and Denormalizing Data from Two S3 Locations

    Conclusions for Manually Authored Jobs with Notebooks

    Creating ETL Jobs with AWS Glue Interactive Sessions

    It’s Magic

    Development Workflow

    Streaming Jobs

    Differences with a Standard ETL Job

    Streaming Sources

    Example: Process Kinesis Streams with a Streaming Job

    Streaming ETL Jobs Conclusions

    Summary

    Chapter 7 Cataloging, Governance, and Search

    Cataloging with AWS Glue

    AWS Glue and the AWS Glue Data Catalog

    Glue Databases and Tables

    Databases

    The Idea of Schema-on-Read

    Tables

    Create Table Manually

    Creating a Table from an Existing Schema

    Creating a Table with a Crawler

    Summary on Databases and Tables

    Crawlers

    Updating or Not Updating?

    Running the Crawler

    Creating a Crawler from the AWS CLI

    Retrieving Table Information from the CLI

    Classifiers

    Classifier Example

    Crawlers and Classifiers Summary

    Search with Amazon Athena: The Heart of Analytics in AWS

    A Bit of History

    Interface Overview

    Creating Tables Manually

    Athena Data Types

    Complex Types

    Running a Query

    Connecting with JDBC and ODBC

    Query Stats

    Recent Queries and Saved Queries

    The Power of Partitions

    Athena Pricing Model

    Automatic Naming

    Athena Query Output

    Athena Peculiarities (SQL and Not)

    Computed Fields Gotcha and WITH Statement Workaround

    Lowercase!

    Query Explain

    Deduplicating Records

    Working with JSON, Flattening, and Unnesting

    Athena Views

    Create Table as Select (CTAS)

    Saving Queries and Reusing Saved Queries

    Running Parameterized Queries

    Athena Federated Queries

    Athena Lambda Connectors

    Note on Connection Errors

    Performing Federated Queries

    Creating a View from a Federated Query

    Governing: Athena Workgroups, Lake Formation, and More

    Athena Workgroups

    Fine-Grained Athena Access with IAM

    Recap of Athena-Based Governance

    AWS Lake Formation

    Registering a Location in Lake Formation

    Creating a Database in Lake Formation

    Assigning Permissions in Lake Formation

    LF-Tags and Permissions in Lake Formation

    Data Filters

    Governance Conclusions

    Summary

    Chapter 8 Data Consumption: BI, Visualization, and Reporting

    QuickSight

    Signing Up for QuickSight

    Standard Plan

    Enterprise Plan

    Users and User Groups

    Managing Users and Groups

    Managing QuickSight

    Users and Groups

    Your Subscriptions

    SPICE Capacity

    Account Settings

    Security and Permissions

    VPC Connections

    Mobile Settings

    Domains and Embedding

    Single Sign-On

    Data Sources and Datasets

    Creating an Athena Data Source

    Creating Other Data Sources

    Creating a Data Source from the AWS cli

    Creating a Dataset from a Table

    Creating a Dataset from a SQL Query

    Duplicating Datasets

    Note on Creating Datasets

    QuickSight Favorites, Recent, and Folders

    SPICE

    Manage SPICE Capacity

    Refresh Schedule

    QuickSight Data Editor

    QuickSight Data Types

    Change Data Types

    Calculated Fields

    Joining Data

    Excluding Fields

    Filtering Data

    Removing Data

    Geospatial Hierarchies and Adding Fields to Hierarchies

    Unsupported Format Dates

    Visualizing Data: QuickSight Analysis

    Adding a Title and a Description to Your Analysis

    Renaming the Sheet

    Your First Visual with AutoGraph

    Field Wells

    Visuals Types

    Saving and Autosaving

    A First Example: Pie Chart

    Renaming a Visual

    Filtering Data

    Adding Drill-Downs

    Parameters

    Actions

    Insights

    ML-Powered Insights

    Sharing an Analysis

    Dashboards

    Dashboard Layouts and Themes

    Publishing a Dashboard

    Embedding Visuals and Dashboards

    Data Consumption: Not Only Dashboards

    Summary

    Chapter 9 Machine Learning at Scale

    Machine Learning and Artificial Intelligence

    What Are ML/AI Use Cases?

    Types of ML Models

    Overview of ML/AI AWS Solutions

    Amazon SageMaker

    SageMaker Domains

    Adding a User to the Domain

    SageMaker Studio

    SageMaker Example Notebook

    Step 1: Prerequisites and Preprocessing

    Step 2: Data Ingestion

    Step 3: Data Inspection

    Step 4: Data Conversion

    Step 5: Upload Training Data

    Step 6: Train the Model

    Step 7: Set Up Hosting and Deploy the Model

    Step 8: Validate the Model

    Step 9: Use the Model

    Inference

    Real Time

    Asynchronous

    Serverless

    Batch Transform

    Data Wrangler

    SageMaker Canvas

    Summary

    Appendix Example Data Architectures in AWS

    Modern Data Lake Architecture

    ETL in a Lake House

    Consuming Data in the Lake House

    The Modern Data Lake Architecture

    Batch Processing

    Stream Processing

    Architecture Design Recommendations

    Automate Everything

    Build on Events

    Performance = Cost Savings

    AWS Glue Catalog and Athena-Centric Workflow

    Design Flexible

    Pick Your Battles

    Parquet

    Summary

    Index

Data Analytics in the AWS Cloud

    Product form

    £40.38

    Includes FREE delivery

    RRP £47.50 – you save £7.12 (14%)

    A Paperback / softback by Joe Minichino

      Publisher: John Wiley & Sons Inc
      Publication Date: 18/05/2023
      ISBN13: 978-1119909248
      ISBN10: 1119909244
      Also in:
      Data mining

