Description

Table of Contents

Acknowledgments xiii

Foreword xxiii

Introduction xxvii

Part One Thinking Like a Data Head

Chapter 1 What Is the Problem? 3

Questions a Data Head Should Ask 4

Why Is This Problem Important? 4

Who Does This Problem Affect? 6

What If We Don’t Have the Right Data? 6

When Is the Project Over? 7

What If We Don’t Like the Results? 7

Understanding Why Data Projects Fail 8

Customer Perception 8

Discussion 10

Working on Problems That Matter 11

Chapter Summary 11

Chapter 2 What Is Data? 13

Data vs. Information 13

An Example Dataset 14

Data Types 15

How Data Is Collected and Structured 16

Observational vs. Experimental Data 16

Structured vs. Unstructured Data 17

Basic Summary Statistics 18

Chapter Summary 19

Chapter 3 Prepare to Think Statistically 21

Ask Questions 22

There Is Variation in All Things 23

Scenario: Customer Perception (The Sequel) 24

Case Study: Kidney-Cancer Rates 26

Probabilities and Statistics 28

Probability vs. Intuition 29

Discovery with Statistics 31

Chapter Summary 33

Part Two Speaking Like a Data Head

Chapter 4 Argue with the Data 37

What Would You Do? 38

Missing Data Disaster 39

Tell Me the Data Origin Story 43

Who Collected the Data? 44

How Was the Data Collected? 44

Is the Data Representative? 45

Is There Sampling Bias? 46

What Did You Do with Outliers? 46

What Data Am I Not Seeing? 47

How Did You Deal with Missing Values? 47

Can the Data Measure What You Want It to Measure? 48

Argue with Data of All Sizes 48

Chapter Summary 49

Chapter 5 Explore the Data 51

Exploratory Data Analysis and You 52

Embracing the Exploratory Mindset 52

Questions to Guide You 53

The Setup 53

Can the Data Answer the Question? 54

Set Expectations and Use Common Sense 54

Do the Values Make Intuitive Sense? 54

Watch Out: Outliers and Missing Values 58

Did You Discover Any Relationships? 59

Understanding Correlation 59

Watch Out: Misinterpreting Correlation 60

Watch Out: Correlation Does Not Imply Causation 62

Did You Find New Opportunities in the Data? 63

Chapter Summary 63

Chapter 6 Examine the Probabilities 65

Take a Guess 66

The Rules of the Game 66

Notation 67

Conditional Probability and Independent Events 69

The Probability of Multiple Events 69

Two Things That Happen Together 69

One Thing or the Other 70

Probability Thought Exercise 72

Next Steps 73

Be Careful Assuming Independence 74

Don’t Fall for the Gambler’s Fallacy 74

All Probabilities Are Conditional 75

Don’t Swap Dependencies 76

Bayes’ Theorem 76

Ensure the Probabilities Have Meaning 79

Calibration 80

Rare Events Can, and Do, Happen 80

Chapter Summary 81

Chapter 7 Challenge the Statistics 83

Quick Lessons on Inference 83

Give Yourself Some Wiggle Room 84

More Data, More Evidence 84

Challenge the Status Quo 85

Evidence to the Contrary 86

Balance Decision Errors 88

The Process of Statistical Inference 89

The Questions You Should Ask to Challenge the Statistics 90

What Is the Context for These Statistics? 90

What Is the Sample Size? 91

What Are You Testing? 92

What Is the Null Hypothesis? 92

Assuming Equivalence 93

What Is the Significance Level? 93

How Many Tests Are You Doing? 94

Can I See the Confidence Intervals? 95

Is This Practically Significant? 96

Are You Assuming Causality? 96

Chapter Summary 97

Part Three Understanding the Data Scientist’s Toolbox

Chapter 8 Search for Hidden Groups 101

Unsupervised Learning 102

Dimensionality Reduction 102

Creating Composite Features 103

Principal Component Analysis 105

Principal Components in Athletic Ability 105

PCA Summary 108

Potential Traps 109

Clustering 110

k-Means Clustering 111

Clustering Retail Locations 111

Potential Traps 113

Chapter Summary 114

Chapter 9 Understand the Regression Model 117

Supervised Learning 117

Linear Regression: What It Does 119

Least Squares Regression: Not Just a Clever Name 120

Linear Regression: What It Gives You 123

Extending to Many Features 124

Linear Regression: What Confusion It Causes 125

Omitted Variables 125

Multicollinearity 126

Data Leakage 127

Extrapolation Failures 128

Many Relationships Aren’t Linear 128

Are You Explaining or Predicting? 128

Regression Performance 130

Other Regression Models 131

Chapter Summary 131

Chapter 10 Understand the Classification Model 133

Introduction to Classification 133

What You’ll Learn 134

Classification Problem Setup 135

Logistic Regression 135

Logistic Regression: So What? 138

Decision Trees 139

Ensemble Methods 142

Random Forests 143

Gradient Boosted Trees 143

Interpretability of Ensemble Models 145

Watch Out for Pitfalls 145

Misapplication of the Problem 146

Data Leakage 146

Not Splitting Your Data 146

Choosing the Right Decision Threshold 147

Misunderstanding Accuracy 147

Confusion Matrices 148

Chapter Summary 150

Chapter 11 Understand Text Analytics 151

Expectations of Text Analytics 151

How Text Becomes Numbers 153

A Big Bag of Words 153

N-Grams 157

Word Embeddings 158

Topic Modeling 160

Text Classification 163

Naïve Bayes 164

Sentiment Analysis 166

Practical Considerations When Working with Text 167

Big Tech Has the Upper Hand 168

Chapter Summary 169

Chapter 12 Conceptualize Deep Learning 171

Neural Networks 172

How Are Neural Networks Like the Brain? 172

A Simple Neural Network 173

How a Neural Network Learns 174

A Slightly More Complex Neural Network 175

Applications of Deep Learning 178

The Benefits of Deep Learning 179

How Computers “See” Images 180

Convolutional Neural Networks 182

Deep Learning on Language and Sequences 183

Deep Learning in Practice 185

Do You Have Data? 185

Is Your Data Structured? 186

What Will the Network Look Like? 186

Artificial Intelligence and You 187

Big Tech Has the Upper Hand 188

Ethics in Deep Learning 189

Chapter Summary 190

Part Four Ensuring Success

Chapter 13 Watch Out for Pitfalls 193

Biases and Weird Phenomena in Data 194

Survivorship Bias 194

Regression to the Mean 195

Simpson’s Paradox 195

Confirmation Bias 197

Effort Bias (aka the “Sunk Cost Fallacy”) 197

Algorithmic Bias 198

Uncategorized Bias 198

The Big List of Pitfalls 199

Statistical and Machine Learning Pitfalls 199

Project Pitfalls 200

Chapter Summary 202

Chapter 14 Know the People and Personalities 203

Seven Scenes of Communication Breakdowns 204

The Postmortem 204

Storytime 205

The Telephone Game 206

Into the Weeds 206

The Reality Check 207

The Takeover 207

The Blowhard 208

Data Personalities 208

Data Enthusiasts 209

Data Cynics 209

Data Heads 209

Chapter Summary 210

Chapter 15 What’s Next? 211

Index 215

Becoming a Data Head

    A Paperback / softback by Alex J. Gutman, Jordan Goldmeier

      Publisher: John Wiley & Sons Inc
      Publication Date: 24/06/2021
      ISBN13: 9781119741749
      ISBN10: 1119741742
