Description

Book Synopsis


Table of Contents

Introduction xix

1 Everything You Ever Needed to Know About Spreadsheets but Were Too Afraid to Ask 1

Some Sample Data 2

Accessing Quick Descriptive Statistics 3

Excel Tables 4

Filtering and Sorting 5

Table Formatting 7

Structured References 7

Adding Table Columns 10

Lookup Formulas 11

VLOOKUP 11

INDEX/MATCH 13

XLOOKUP 15

PivotTables 16

Using Array Formulas 19

Solving Stuff with Solver 20

2 Set It and Forget It: An Introduction to Power Query 27

What Is Power Query? 27

Sample Data 28

Starting Power Query 29

Filtering Rows 32

Removing Columns 33

Find & Replace 34

Close & Load to Table 35

3 Naïve Bayes and the Incredible Lightness of Being an Idiot 39

The World's Fastest Intro to Probability Theory 39

Totaling Conditional Probabilities 40

Joint Probability, the Chain Rule, and Independence 40

What Happens in a Dependent Situation? 41

Bayes Rule 42

Separating the Signal and the Noise 43

Using the Bayes Rule to Create an AI Model 44

High-Level Class Probabilities Are Often Assumed to Be Equal 45

A Couple More Odds and Ends 46

Let's Get This Excel Party Started 47

Cleaning the Data with Power Query 48

Splitting on Spaces: Giving Each Word Its Due 50

Counting Tokens and Calculating Probabilities 55

We Have a Model! Let's Use It 58

4 Cluster Analysis Part 1: Using K-Means to Segment Your Customer Base 65

Dances at Summer Camp 65

Getting Real: K-Means Clustering Subscribers in Email Marketing 70

The Initial Dataset 71

Determining What to Measure 72

Start with Four Clusters 75

Euclidean Distance: Measuring Distances as the Crow Flies 76

Solving for the Cluster Centers 80

Making Sense of the Results 82

Getting the Top Deals by Cluster 83

The Silhouette: A Good Way to Let Different K Values Duke It Out 86

How About Five Clusters? 95

Solving for Five Clusters 96

Getting the Top Deals for All Five Clusters 96

Computing the Silhouette for 5-Means Clustering 99

K-Medians Clustering and Asymmetric Distance Measurements 100

Using K-Medians Clustering 100

Getting a More Appropriate Distance Metric 100

Putting It All in Excel 102

The Top Deals for the 5-Medians Clusters 104

5 Cluster Analysis Part II: Network Graphs and Community Detection 109

What Is a Network Graph? 110

Visualizing a Simple Graph 110

Beyond GiGraph and Adjacency Lists 115

Building a Graph from the Wholesale Wine Data 117

Creating a Cosine Similarity Matrix 118

Producing an R-Neighborhood Graph 121

Introduction to Gephi 123

Creating a Static Adjacency Matrix 124

Bringing in Your R-Neighborhood Adjacency Matrix into Gephi 124

Node Degree 128

Touching the Graph Data 130

How Much Is an Edge Worth? Points and Penalties in Graph Modularity 132

What's a Point, and What's a Penalty? 133

Setting Up the Score Sheet 136

Let's Get Clustering! 138

Split Number 1 138

Split 2: Electric Boogaloo 143

And. . .Split3: Split with a Vengeance 145

Encoding and Analyzing the Communities 146

There and Back Again: A Gephi Tale 151

6 Regression: The Granddaddy of Supervised Artificial Intelligence 157

Predicting Pregnant Customers at RetailMart Using Linear Regression 158

The Feature Set 159

Assembling the Training Data 161

Creating Dummy Variables 163

Let's Bake Our Own Linear Regression 165

Linear Regression Statistics: R-Squared, F-Tests, t-Tests 173

Making Predictions on Some New Data and Measuring Performance 182

Predicting Pregnant Customers at RetailMart Using Logistic Regression 192

First You Need a Link Function 192

Hooking Up the Logistic Function and Reoptimizing 193

Baking an Actual Logistic Regression 196

7 Ensemble Models: A Whole Lot of Bad Pizza 203

Getting Started Using the Data from Chapter 6 203

Bagging: Randomize, Train, Repeat 204

Decision Stump is Another Name for a Weak Learner 204

Doesn't Seem So Weak to Me! 204

You Need More Power! 207

Let's Train It 208

Evaluating the Bagged Model 220

Boosting: If You Get It Wrong, Just Boost and Try Again 223

Training the Model—Every Feature Gets a Shot 224

Evaluating the Boosted Model 231

8 Forecasting: Breathe Easy: You Can't Win 235

The Sword Trade Is Hopping 236

Getting Acquainted with Time-Series Data 236

Starting Slow with Simple Exponential Smoothing 238

Setting Up the Simple Exponential Smoothing Forecast 240

You Might Have a Trend 249

Holt's Trend-Corrected Exponential Smoothing 250

Setting Up Holt's Trend-Corrected Smoothing in a Spreadsheet 252

So Are You Done? Looking at Autocorrelations 258

Multiplicative Holt-Winters Exponential Smoothing 266

Setting the Initial Values for Level, Trend, and Seasonality 268

Getting Rolling on the Forecast 274

And. . .Optimize! 280

Putting a Prediction Interval Around the Forecast 283

Creating a Fan Chart for Effect 287

Forecast Sheets in Excel 289

9 Optimization Modeling: Because That "Fresh-Squeezed" Orange Juice Ain't Gonna Blend Itself 293

Wait Is This Data Science? 294

Starting with a Simple Trade-Off 295

Representing the Problem as a Polytope 296

Solving by Sliding the Level Set 297

The Simplex Method: Rooting Around the Corners 298

Working in Excel 300

Fresh from the Grove to Your Glass with a Pit Stop Through a Blending Model 305

Let's Start with Some Specs 307

Coming Back to Consistency 308

Putting the Data into Excel 309

Setting Up the Problem in Solver 311

Lowering Your Standards 314

Dead Squirrel Removal: the Minimax Formulation 317

If-Then and the "Big M" Constraint 320

Multiplying Variables: Cranking Up the Volume to 11,000 324

Modeling Risk 330

Normally Distributed Data 331

10 Outlier Detection: Just Because They're Odd Doesn't Mean They're Unimportant 339

Outliers Are (Bad?) People, Too 340

The Fascinating Case of Hadlum v Hadlum 340

Tukey's Fences 341

Applying Tukey's Fences in a Spreadsheet 342

The Limitations of This Simple Approach 345

Terrible at Nothing, Bad at Everything 346

Preparing Data for Graphing 347

Creating a Graph 350

Getting the k-Nearest Neighbors 351

Graph Outlier Detection Method 1: Just Use the Indegree 352

Graph Outlier Detection Method 2: Getting Nuanced with k-Distance 355

Graph Outlier Detection Method 3: Local Outlier Factors Are Where It's At 358

11 Moving on From Spreadsheets 363

Getting Up and Running with R 364

A Crash Course in R-ing 366

Show Me the Numbers! Vector Math and Factoring 367

The Best Data Type of Them All: the Dataframe 370

How to Ask for Help in R 371

It Gets Even Better Beyond Base R 372

Doing Some Actual Data Science 374

Reading Data into R 374

Spherical K-Means on Wine Data in Just a Few Lines 375

Building AI Models on the Pregnancy Data 381

Forecasting in R 389

Looking at Outlier Detection 393

12 Conclusion 397

Where Am I? What Just Happened? 397

Before You Go-Go 397

Get to Know the Problem 398

We Need More Translators 398

Beware the Three-Headed Geek-Monster: Tools, Performance, and Mathematical Perfection 399

You Are Not the Most Important Function of Your Organization 401

Get Creative and Keep in Touch! 402

Index 403

Data Smart

    Product form

    £30.39

    Includes FREE delivery

    RRP £37.99 – you save £7.60 (20%)

    Order before 4pm today for delivery by Fri 3 Jul 2026.

    A Paperback / softback by Jordan Goldmeier

      Trusted by thousands of customers. See 2,385+ Customer Reviews

      View other formats and editions of Data Smart by Jordan Goldmeier

      Publisher: John Wiley & Sons Inc
      Publication Date: 07/11/2023
      ISBN13: 9781119931386, 978-1119931386
      ISBN10: 111993138X
      Also in:
      Data warehousing

      Description

      Book Synopsis


      Table of Contents

      Introduction xix

      1 Everything You Ever Needed to Know About Spreadsheets but Were Too Afraid to Ask 1

      Some Sample Data 2

      Accessing Quick Descriptive Statistics 3

      Excel Tables 4

      Filtering and Sorting 5

      Table Formatting 7

      Structured References 7

      Adding Table Columns 10

      Lookup Formulas 11

      VLOOKUP 11

      INDEX/MATCH 13

      XLOOKUP 15

      PivotTables 16

      Using Array Formulas 19

      Solving Stuff with Solver 20

      2 Set It and Forget It: An Introduction to Power Query 27

      What Is Power Query? 27

      Sample Data 28

      Starting Power Query 29

      Filtering Rows 32

      Removing Columns 33

      Find & Replace 34

      Close & Load to Table 35

      3 Naïve Bayes and the Incredible Lightness of Being an Idiot 39

      The World's Fastest Intro to Probability Theory 39

      Totaling Conditional Probabilities 40

      Joint Probability, the Chain Rule, and Independence 40

      What Happens in a Dependent Situation? 41

      Bayes Rule 42

      Separating the Signal and the Noise 43

      Using the Bayes Rule to Create an AI Model 44

      High-Level Class Probabilities Are Often Assumed to Be Equal 45

      A Couple More Odds and Ends 46

      Let's Get This Excel Party Started 47

      Cleaning the Data with Power Query 48

      Splitting on Spaces: Giving Each Word Its Due 50

      Counting Tokens and Calculating Probabilities 55

      We Have a Model! Let's Use It 58

      4 Cluster Analysis Part 1: Using K-Means to Segment Your Customer Base 65

      Dances at Summer Camp 65

      Getting Real: K-Means Clustering Subscribers in Email Marketing 70

      The Initial Dataset 71

      Determining What to Measure 72

      Start with Four Clusters 75

      Euclidean Distance: Measuring Distances as the Crow Flies 76

      Solving for the Cluster Centers 80

      Making Sense of the Results 82

      Getting the Top Deals by Cluster 83

      The Silhouette: A Good Way to Let Different K Values Duke It Out 86

      How About Five Clusters? 95

      Solving for Five Clusters 96

      Getting the Top Deals for All Five Clusters 96

      Computing the Silhouette for 5-Means Clustering 99

      K-Medians Clustering and Asymmetric Distance Measurements 100

      Using K-Medians Clustering 100

      Getting a More Appropriate Distance Metric 100

      Putting It All in Excel 102

      The Top Deals for the 5-Medians Clusters 104

      5 Cluster Analysis Part II: Network Graphs and Community Detection 109

      What Is a Network Graph? 110

      Visualizing a Simple Graph 110

      Beyond GiGraph and Adjacency Lists 115

      Building a Graph from the Wholesale Wine Data 117

      Creating a Cosine Similarity Matrix 118

      Producing an R-Neighborhood Graph 121

      Introduction to Gephi 123

      Creating a Static Adjacency Matrix 124

      Bringing in Your R-Neighborhood Adjacency Matrix into Gephi 124

      Node Degree 128

      Touching the Graph Data 130

      How Much Is an Edge Worth? Points and Penalties in Graph Modularity 132

      What's a Point, and What's a Penalty? 133

      Setting Up the Score Sheet 136

      Let's Get Clustering! 138

      Split Number 1 138

      Split 2: Electric Boogaloo 143

      And. . .Split3: Split with a Vengeance 145

      Encoding and Analyzing the Communities 146

      There and Back Again: A Gephi Tale 151

      6 Regression: The Granddaddy of Supervised Artificial Intelligence 157

      Predicting Pregnant Customers at RetailMart Using Linear Regression 158

      The Feature Set 159

      Assembling the Training Data 161

      Creating Dummy Variables 163

      Let's Bake Our Own Linear Regression 165

      Linear Regression Statistics: R-Squared, F-Tests, t-Tests 173

      Making Predictions on Some New Data and Measuring Performance 182

      Predicting Pregnant Customers at RetailMart Using Logistic Regression 192

      First You Need a Link Function 192

      Hooking Up the Logistic Function and Reoptimizing 193

      Baking an Actual Logistic Regression 196

      7 Ensemble Models: A Whole Lot of Bad Pizza 203

      Getting Started Using the Data from Chapter 6 203

      Bagging: Randomize, Train, Repeat 204

      Decision Stump is Another Name for a Weak Learner 204

      Doesn't Seem So Weak to Me! 204

      You Need More Power! 207

      Let's Train It 208

      Evaluating the Bagged Model 220

      Boosting: If You Get It Wrong, Just Boost and Try Again 223

      Training the Model—Every Feature Gets a Shot 224

      Evaluating the Boosted Model 231

      8 Forecasting: Breathe Easy: You Can't Win 235

      The Sword Trade Is Hopping 236

      Getting Acquainted with Time-Series Data 236

      Starting Slow with Simple Exponential Smoothing 238

      Setting Up the Simple Exponential Smoothing Forecast 240

      You Might Have a Trend 249

      Holt's Trend-Corrected Exponential Smoothing 250

      Setting Up Holt's Trend-Corrected Smoothing in a Spreadsheet 252

      So Are You Done? Looking at Autocorrelations 258

      Multiplicative Holt-Winters Exponential Smoothing 266

      Setting the Initial Values for Level, Trend, and Seasonality 268

      Getting Rolling on the Forecast 274

      And. . .Optimize! 280

      Putting a Prediction Interval Around the Forecast 283

      Creating a Fan Chart for Effect 287

      Forecast Sheets in Excel 289

      9 Optimization Modeling: Because That "Fresh-Squeezed" Orange Juice Ain't Gonna Blend Itself 293

      Wait Is This Data Science? 294

      Starting with a Simple Trade-Off 295

      Representing the Problem as a Polytope 296

      Solving by Sliding the Level Set 297

      The Simplex Method: Rooting Around the Corners 298

      Working in Excel 300

      Fresh from the Grove to Your Glass with a Pit Stop Through a Blending Model 305

      Let's Start with Some Specs 307

      Coming Back to Consistency 308

      Putting the Data into Excel 309

      Setting Up the Problem in Solver 311

      Lowering Your Standards 314

      Dead Squirrel Removal: the Minimax Formulation 317

      If-Then and the "Big M" Constraint 320

      Multiplying Variables: Cranking Up the Volume to 11,000 324

      Modeling Risk 330

      Normally Distributed Data 331

      10 Outlier Detection: Just Because They're Odd Doesn't Mean They're Unimportant 339

      Outliers Are (Bad?) People, Too 340

      The Fascinating Case of Hadlum v Hadlum 340

      Tukey's Fences 341

      Applying Tukey's Fences in a Spreadsheet 342

      The Limitations of This Simple Approach 345

      Terrible at Nothing, Bad at Everything 346

      Preparing Data for Graphing 347

      Creating a Graph 350

      Getting the k-Nearest Neighbors 351

      Graph Outlier Detection Method 1: Just Use the Indegree 352

      Graph Outlier Detection Method 2: Getting Nuanced with k-Distance 355

      Graph Outlier Detection Method 3: Local Outlier Factors Are Where It's At 358

      11 Moving on From Spreadsheets 363

      Getting Up and Running with R 364

      A Crash Course in R-ing 366

      Show Me the Numbers! Vector Math and Factoring 367

      The Best Data Type of Them All: the Dataframe 370

      How to Ask for Help in R 371

      It Gets Even Better Beyond Base R 372

      Doing Some Actual Data Science 374

      Reading Data into R 374

      Spherical K-Means on Wine Data in Just a Few Lines 375

      Building AI Models on the Pregnancy Data 381

      Forecasting in R 389

      Looking at Outlier Detection 393

      12 Conclusion 397

      Where Am I? What Just Happened? 397

      Before You Go-Go 397

      Get to Know the Problem 398

      We Need More Translators 398

      Beware the Three-Headed Geek-Monster: Tools, Performance, and Mathematical Perfection 399

      You Are Not the Most Important Function of Your Organization 401

      Get Creative and Keep in Touch! 402

      Index 403

      Recently viewed products

      © 2026 Book Curl

        • American Express
        • Apple Pay
        • Diners Club
        • Discover
        • Google Pay
        • Maestro
        • Mastercard
        • PayPal
        • Shop Pay
        • Union Pay
        • Visa

        Login

        Forgot your password?

        Don't have an account yet?
        Create account