Description

Book Synopsis
Jump-start your career as a data scientistlearn to develop datasets for exploration, analysis, and machine learning SQL for Data Scientists: A Beginner's Guide for Building Datasets for Analysis is a resource that's dedicated to the Structured Query Language (SQL) and dataset design skills that data scientists use most. Aspiring data scientists will learn how to how to construct datasets for exploration, analysis, and machine learning. You can also discover how to approach query design and develop SQL code to extract data insights while avoiding common pitfalls. You may be one of many people who are entering the field of Data Science from a range of professions and educational backgrounds, such as business analytics, social science, physics, economics, and computer science. Like many of them, you may have conducted analyses using spreadsheets as data sources, but never retrieved and engineered datasets from a relational database using SQL, which is a programming language designed for managing databases and extracting data. This guide for data scientists differs from other instructional guides on the subject. It doesn't cover SQL broadly. Instead, you'll learn the subset of SQL skills that data analysts and data scientists use frequently. You'll also gain practical advice and direction on how to think about constructing your dataset. Gain an understanding of relational database structure, query design, and SQL syntaxDevelop queries to construct datasets for use in applications like interactive reports and machine learning algorithmsReview strategies and approaches so you can design analytical datasetsPractice your techniques with the provided database and SQL code In this book, author Renee Teate shares knowledge gained during a 15-year career working with data, in roles ranging from database developer to data analyst to data scientist. She guides you through SQL code and dataset design concepts from an industry practitioner's perspective, moving your data scientist career forward!

Table of Contents

Introduction xix

Chapter 1 Data Sources 1

Data Sources 1

Tools for Connecting to Data Sources and Editing SQL 2

Relational Databases 3

Dimensional Data Warehouses 7

Asking Questions About the Data Source 9

Introduction to the Farmer’s Market Database 11

A Note on Machine Learning Dataset Terminology 12

Exercises 13

Chapter 2 The SELECT Statement 15

The SELECT Statement 15

The Fundamental Syntax Structure of a SELECT Query 16

Selecting Columns and Limiting the Number of Rows Returned 16

The ORDER BY Clause: Sorting Results 18

Introduction to Simple Inline Calculations 20

More Inline Calculation Examples: Rounding 22

More Inline Calculation Examples: Concatenating Strings 24

Evaluating Query Output 26

SELECT Statement Summary 29

Exercises Using the Included Database 30

Chapter 3 The WHERE Clause 31

The WHERE Clause 31

Filtering SELECT Statement Results 32

Filtering on Multiple Conditions 34

Multi-Column Conditional Filtering 40

More Ways to Filter 41

BETWEEN 41

IN 42

LIKE 43

IS NULL 44

A Warning About Null Comparisons 44

Filtering Using Subqueries 46

Exercises Using the Included Database 47

Chapter 4 CASE Statements 49

CASE Statement Syntax 50

Creating Binary Flags Using CASE 52

Grouping or Binning Continuous Values Using CASE 53

Categorical Encoding Using CASE 56

CASE Statement Summary 59

Exercises Using the Included Database 60

Chapter 5 SQL JOINs 61

Database Relationships and SQL JOINs 61

A Common Pitfall when Filtering Joined Data 71

JOINs with More than Two Tables 74

Exercises Using the Included Database 76

Chapter 6 Aggregating Results for Analysis 79

GROUP BY Syntax 79

Displaying Group Summaries 80

Performing Calculations Inside Aggregate Functions 84

MIN and MAX 88

COUNT and COUNT DISTINCT 90

Average 91

Filtering with HAVING 93

CASE Statements Inside Aggregate Functions 94

Exercises Using the Included Database 96

Chapter 7 Window Functions and Subqueries 97

ROW NUMBER 98

RANK and DENSE RANK 101

NTILE 102

Aggregate Window Functions 103

LAG and LEAD 108

Exercises Using the Included Database 111

Chapter 8 Date and Time Functions 113

Setting datetime Field Values 114

EXTRACT and DATE_PART 115

DATE_ADD and DATE_SUB 116

DATEDIFF 118

TIMESTAMPDIFF 119

Date Functions in Aggregate Summaries and Window Functions 119

Exercises 126

Chapter 9 Exploratory Data Analysis with SQL 127

Demonstrating Exploratory Data Analysis with SQL 128

Exploring the Products Table 128

Exploring Possible Column Values 131

Exploring Changes Over Time 134

Exploring Multiple Tables Simultaneously 135

Exploring Inventory vs. Sales 138

Exercises 142

Chapter 10 Building SQL Datasets for Analytical Reporting 143

Thinking Through Analytical Dataset Requirements 144

Using Custom Analytical Datasets in SQL:

CTEs and Views 149

Taking SQL Reporting Further 153

Exercises 157

Chapter 11 More Advanced Query Structures 159

UNIONs 159

Self-Join to Determine To-Date Maximum 163

Counting New vs. Returning Customers by Week 167

Summary 171

Exercises 171

Chapter 12 Creating Machine Learning Datasets Using SQL 173

Datasets for Time Series Models 174

Datasets for Binary Classification 176

Creating the Dataset 178

Expanding the Feature Set 181

Feature Engineering 185

Taking Things to the Next Level 189

Exercises 189

Chapter 13 Analytical Dataset Development Examples 191

What Factors Correlate with Fresh Produce Sales? 191

How Do Sales Vary by Customer Zip Code,

Market Distance, and Demographic Data? 211

How Does Product Price Distribution Affect

Market Sales? 217

Chapter 14 Storing and Modifying Data 229

Storing SQL Datasets as Tables and Views 229

Adding a Timestamp Column 232

Inserting Rows and Updating Values in Database Tables 233

Using SQL Inside Scripts 236

In Closing 237

Exercises 238

Appendix Answers to Exercises 239

Index 255

SQL for Data Scientists

    Product form

    £30.39

    Includes FREE delivery

    RRP £37.99 – you save £7.60 (20%)

    Order before 4pm today for delivery by Mon 8 Jun 2026.

    A Paperback / softback by Renee M. P. Teate

    3 in stock


      View other formats and editions of SQL for Data Scientists by Renee M. P. Teate

      Publisher: John Wiley & Sons Inc
      Publication Date: 15/11/2021
      ISBN13: 9781119669364, 978-1119669364
      ISBN10: 1119669367

      Description

      Book Synopsis
      Jump-start your career as a data scientistlearn to develop datasets for exploration, analysis, and machine learning SQL for Data Scientists: A Beginner's Guide for Building Datasets for Analysis is a resource that's dedicated to the Structured Query Language (SQL) and dataset design skills that data scientists use most. Aspiring data scientists will learn how to how to construct datasets for exploration, analysis, and machine learning. You can also discover how to approach query design and develop SQL code to extract data insights while avoiding common pitfalls. You may be one of many people who are entering the field of Data Science from a range of professions and educational backgrounds, such as business analytics, social science, physics, economics, and computer science. Like many of them, you may have conducted analyses using spreadsheets as data sources, but never retrieved and engineered datasets from a relational database using SQL, which is a programming language designed for managing databases and extracting data. This guide for data scientists differs from other instructional guides on the subject. It doesn't cover SQL broadly. Instead, you'll learn the subset of SQL skills that data analysts and data scientists use frequently. You'll also gain practical advice and direction on how to think about constructing your dataset. Gain an understanding of relational database structure, query design, and SQL syntaxDevelop queries to construct datasets for use in applications like interactive reports and machine learning algorithmsReview strategies and approaches so you can design analytical datasetsPractice your techniques with the provided database and SQL code In this book, author Renee Teate shares knowledge gained during a 15-year career working with data, in roles ranging from database developer to data analyst to data scientist. She guides you through SQL code and dataset design concepts from an industry practitioner's perspective, moving your data scientist career forward!

      Table of Contents

      Introduction xix

      Chapter 1 Data Sources 1

      Data Sources 1

      Tools for Connecting to Data Sources and Editing SQL 2

      Relational Databases 3

      Dimensional Data Warehouses 7

      Asking Questions About the Data Source 9

      Introduction to the Farmer’s Market Database 11

      A Note on Machine Learning Dataset Terminology 12

      Exercises 13

      Chapter 2 The SELECT Statement 15

      The SELECT Statement 15

      The Fundamental Syntax Structure of a SELECT Query 16

      Selecting Columns and Limiting the Number of Rows Returned 16

      The ORDER BY Clause: Sorting Results 18

      Introduction to Simple Inline Calculations 20

      More Inline Calculation Examples: Rounding 22

      More Inline Calculation Examples: Concatenating Strings 24

      Evaluating Query Output 26

      SELECT Statement Summary 29

      Exercises Using the Included Database 30

      Chapter 3 The WHERE Clause 31

      The WHERE Clause 31

      Filtering SELECT Statement Results 32

      Filtering on Multiple Conditions 34

      Multi-Column Conditional Filtering 40

      More Ways to Filter 41

      BETWEEN 41

      IN 42

      LIKE 43

      IS NULL 44

      A Warning About Null Comparisons 44

      Filtering Using Subqueries 46

      Exercises Using the Included Database 47

      Chapter 4 CASE Statements 49

      CASE Statement Syntax 50

      Creating Binary Flags Using CASE 52

      Grouping or Binning Continuous Values Using CASE 53

      Categorical Encoding Using CASE 56

      CASE Statement Summary 59

      Exercises Using the Included Database 60

      Chapter 5 SQL JOINs 61

      Database Relationships and SQL JOINs 61

      A Common Pitfall when Filtering Joined Data 71

      JOINs with More than Two Tables 74

      Exercises Using the Included Database 76

      Chapter 6 Aggregating Results for Analysis 79

      GROUP BY Syntax 79

      Displaying Group Summaries 80

      Performing Calculations Inside Aggregate Functions 84

      MIN and MAX 88

      COUNT and COUNT DISTINCT 90

      Average 91

      Filtering with HAVING 93

      CASE Statements Inside Aggregate Functions 94

      Exercises Using the Included Database 96

      Chapter 7 Window Functions and Subqueries 97

      ROW NUMBER 98

      RANK and DENSE RANK 101

      NTILE 102

      Aggregate Window Functions 103

      LAG and LEAD 108

      Exercises Using the Included Database 111

      Chapter 8 Date and Time Functions 113

      Setting datetime Field Values 114

      EXTRACT and DATE_PART 115

      DATE_ADD and DATE_SUB 116

      DATEDIFF 118

      TIMESTAMPDIFF 119

      Date Functions in Aggregate Summaries and Window Functions 119

      Exercises 126

      Chapter 9 Exploratory Data Analysis with SQL 127

      Demonstrating Exploratory Data Analysis with SQL 128

      Exploring the Products Table 128

      Exploring Possible Column Values 131

      Exploring Changes Over Time 134

      Exploring Multiple Tables Simultaneously 135

      Exploring Inventory vs. Sales 138

      Exercises 142

      Chapter 10 Building SQL Datasets for Analytical Reporting 143

      Thinking Through Analytical Dataset Requirements 144

      Using Custom Analytical Datasets in SQL:

      CTEs and Views 149

      Taking SQL Reporting Further 153

      Exercises 157

      Chapter 11 More Advanced Query Structures 159

      UNIONs 159

      Self-Join to Determine To-Date Maximum 163

      Counting New vs. Returning Customers by Week 167

      Summary 171

      Exercises 171

      Chapter 12 Creating Machine Learning Datasets Using SQL 173

      Datasets for Time Series Models 174

      Datasets for Binary Classification 176

      Creating the Dataset 178

      Expanding the Feature Set 181

      Feature Engineering 185

      Taking Things to the Next Level 189

      Exercises 189

      Chapter 13 Analytical Dataset Development Examples 191

      What Factors Correlate with Fresh Produce Sales? 191

      How Do Sales Vary by Customer Zip Code,

      Market Distance, and Demographic Data? 211

      How Does Product Price Distribution Affect

      Market Sales? 217

      Chapter 14 Storing and Modifying Data 229

      Storing SQL Datasets as Tables and Views 229

      Adding a Timestamp Column 232

      Inserting Rows and Updating Values in Database Tables 233

      Using SQL Inside Scripts 236

      In Closing 237

      Exercises 238

      Appendix Answers to Exercises 239

      Index 255

      Recently viewed products

      © 2026 Book Curl

        • American Express
        • Apple Pay
        • Diners Club
        • Discover
        • Google Pay
        • Maestro
        • Mastercard
        • PayPal
        • Shop Pay
        • Union Pay
        • Visa

        Login

        Forgot your password?

        Don't have an account yet?
        Create account