Description

Book Synopsis
Explore the most serious and prevalent ethical issues in data science with this insightful new resource. The increasing popularity of data science has resulted in numerous well-publicized cases of bias, injustice, and discrimination. The widespread deployment of black-box algorithms that are difficult or impossible to understand and explain, even for their developers, is a primary source of these unanticipated harms, making modern techniques and methods for manipulating large data sets seem sinister, even dangerous. When put in the hands of authoritarian governments, these algorithms have enabled suppression of political dissent and persecution of minorities. To prevent these harms, data scientists everywhere must come to understand how the algorithms that they build and deploy may harm certain groups or be unfair. Responsible Data Science delivers a comprehensive, practical treatment of how to implement data science solutions in an even-handed and ethical manner that minimizes the risk

Table of Contents

Introduction xix

Part I Motivation for Ethical Data Science and Background Knowledge 1

Chapter 1 Responsible Data Science 3

The Optum Disaster 4

Jekyll and Hyde 5

Eugenics 7

Galton, Pearson, and Fisher 7

Ties between Eugenics and Statistics 7

Ethical Problems in Data Science Today 9

Predictive Models 10

From Explaining to Predicting 10

Predictive Modeling 11

Setting the Stage for Ethical Issues to Arise 12

Classic Statistical Models 12

Black-Box Methods 14

Important Concepts in Predictive Modeling 19

Feature Selection 19

Model-Centric vs. Data-Centric Models 20

Holdout Sample and Cross-Validation 20

Overfitting 21

Unsupervised Learning 22

The Ethical Challenge of Black Boxes 23

Two Opposing Forces 24

Pressure for More Powerful AI 24

Public Resistance and Anxiety 24

Summary 25

Chapter 2 Background: Modeling and the Black-Box Algorithm 27

Assessing Model Performance 27

Predicting Class Membership 28

The Rare Class Problem 28

Lift and Gains 28

Area Under the Curve 29

AUC vs. Lift (Gains) 31

Predicting Numeric Values 32

Goodness-of-Fit 32

Holdout Sets and Cross-Validation 33

Optimization and Loss Functions 34

Intrinsically Interpretable Models vs. Black-Box Models 35

Ethical Challenges with Interpretable Models 38

Black-Box Models 39

Ensembles 39

Nearest Neighbors 41

Clustering 41

Association Rules 42

Collaborative Filters 42

Artificial Neural Nets and Deep Neural Nets 43

Problems with Black-Box Predictive Models 45

Problems with Unsupervised Algorithms 47

Summary 48

Chapter 3 The Ways AI Goes Wrong, and the Legal Implications 49

AI and Intentional Consequences by Design 50

Deepfakes 50

Supporting State Surveillance and Suppression 51

Behavioral Manipulation 52

Automated Testing to Fine-Tune Targeting 53

AI and Unintended Consequences 55

Healthcare 56

Finance 57

Law Enforcement 58

Technology 60

The Legal and Regulatory Landscape around AI 61

Ignorance Is No Defense: AI in the Context of Existing Law and Policy 63

A Finger in the Dam: Data Rights, Data Privacy, and Consumer Protection Regulations 64

Trends in Emerging Law and Policy Related to AI 66

Summary 69

Part II The Ethical Data Science Process 71

Chapter 4 The Responsible Data Science Framework 73

Why We Keep Building Harmful AI 74

Misguided Need for Cutting-Edge Models 74

Excessive Focus on Predictive Performance 74

Ease of Access and the Curse of Simplicity 76

The Common Cause 76

The Face Thieves 78

An Anatomy of Modeling Harms 79

The World: Context Matters for Modeling 80

The Data: Representation Is Everything 83

The Model: Garbage In, Danger Out 85

Model Interpretability: Human Understanding for Superhuman Models 86

Efforts Toward a More Responsible Data Science 89

Principles Are the Focus 90

Nonmaleficence 90

Fairness 90

Transparency 91

Accountability 91

Privacy 92

Bridging the Gap Between Principles and Practice with the Responsible Data Science (RDS) Framework 92

Justification 94

Compilation 94

Preparation 95

Modeling 96

Auditing 96

Summary 97

Chapter 5 Model Interpretability: The What and the Why 99

The Sexist Résumé Screener 99

The Necessity of Model Interpretability 101

Connections Between Predictive Performance and Interpretability 103

Uniting (High) Model Performance and Model Interpretability 105

Categories of Interpretability Methods 107

Global Methods 107

Local Methods 113

Real-World Successes of Interpretability Methods 113

Facilitating Debugging and Audit 114

Leveraging the Improved Performance of Black-Box Models 116

Acquiring New Knowledge 116

Addressing Critiques of Interpretability Methods 117

Explanations Generated by Interpretability Methods Are Not Robust 118

Explanations Generated by Interpretability Methods Are Low Fidelity 120

The Forking Paths of Model Interpretability 121

The Four-Measure Baseline 122

Building Our Own Credit Scoring Model 124

Using Train-Test Splits 125

Feature Selection and Feature Engineering 125

Baseline Models 127

The Importance of Making Your Code Work for Everyone 129

Execution Variability 129

Addressing Execution Variability with Functionalized Code 130

Stochastic Variability 130

Addressing Stochastic Variability via Resampling 130

Summary 133

Part III EDS in Practice 135

Chapter 6 Beginning a Responsible Data Science Project 137

How the Responsible Data Science Framework Addresses the Common Cause 138

Datasets Used 140

Regression Datasets—Communities and Crime 140

Classification Datasets—COMPAS 140

Common Elements Across Our Analyses 141

Project Structure and Documentation 141

Project Structure for the Responsible Data Science Framework: Everything in Its Place 142

Documentation: The Responsible Thing to Do 145

Beginning a Responsible Data Science Project 151

Communities and Crime (Regression) 151

Justification 151

Compilation 154

Identifying Protected Classes 157

Preparation—Data Splitting and Feature Engineering 159

Datasheets 161

COMPAS (Classification) 164

Justification 164

Compilation 166

Identifying Protected Classes 168

Preparation 169

Summary 172

Chapter 7 Auditing a Responsible Data Science Project 173

Fairness and Data Science in Practice 175

The Many Different Conceptions of Fairness 175

Different Forms of Fairness Are Trade-Offs with Each Other 177

Quantifying Predictive Fairness Within a Data Science Project 179

Mitigating Bias to Improve Fairness 185

Preprocessing 185

In-processing 186

Postprocessing 186

Classification Example: COMPAS 187

Prework: Code Practices, Modeling, and Auditing 187

Justification, Compilation, and Preparation Review 189

Modeling 191

Auditing 200

Per-Group Metrics: Overall 200

Per-Group Metrics: Error 202

Fairness Metrics 204

Interpreting Our Models: Why Are They Unfair? 207

Analysis for Different Groups 209

Bias Mitigation 214

Preprocessing: Oversampling 214

Postprocessing: Optimizing Thresholds Automatically 218

Postprocessing: Optimizing Thresholds Manually 219

Summary 223

Chapter 8 Auditing for Neural Networks 225

Why Neural Networks Merit Their Own Chapter 227

Neural Networks Vary Greatly in Structure 227

Neural Networks Treat Features Differently 229

Neural Networks Repeat Themselves 231

A More Impenetrable Black Box 232

Baseline Methods 233

Representation Methods 233

Distillation Methods 234

Intrinsic Methods 235

Beginning a Responsible Neural Network Project 236

Justification 236

Moving Forward 239

Compilation 239

Tracking Experiments 241

Preparation 244

Modeling 245

Auditing 247

Per-Group Metrics: Overall 247

Per-Group Metrics: Unusual Definitions of “False Positive” 248

Fairness Metrics 249

Interpreting Our Models: Why Are They Unfair? 252

Bias Mitigation 253

Wrap-Up 255

Auditing Neural Networks for Natural Language Processing 258

Identifying and Addressing Sources of Bias in NLP 258

The Real World 259

Data 260

Models 261

Model Interpretability 262

Summary 262

Chapter 9 Conclusion 265

How Can We Do Better? 267

The Responsible Data Science Framework 267

Doing Better As Managers 269

Doing Better As Practitioners 270

A Better Future If We Can Keep It 271

Index 273

Responsible Data Science

    Product form

    £24.79

    Includes FREE delivery

    RRP £30.99 – you save £6.20 (20%)


    A Paperback / softback by Grant Fleming, Peter C. Bruce


      Publisher: John Wiley & Sons Inc
      Publication Date: 24/06/2021
      ISBN13: 9781119741756
      ISBN10: 1119741750
      Also in:
      Data mining

