Description

Book Synopsis

This textbook is aimed at computer science undergraduates late in sophomore or early in junior year, supplying a comprehensive background in qualitative and quantitative data analysis, probability, random variables, and statistical methods, including machine learning.

With careful treatment of topics that fill the curricular needs for the course, Probability and Statistics for Computer Science features:

• A treatment of random variables and expectations dealing primarily with the discrete case.

• A practical treatment of simulation, showing how many interesting probabilities and expectations can be extracted, with particular emphasis on Markov chains.

• A clear but crisp account of simple point inference strategies (maximum likelihood; Bayesian inference) in simple contexts. This is extended to cover some confidence intervals, samples and populations for random sampling with replacement, and the simplest hypothesis testing.

• A chapter dealing with classification, explaining why it’s useful; how to train SVM classifiers with stochastic gradient descent; and how to use implementations of more advanced methods such as random forests and nearest neighbors.

• A chapter dealing with regression, explaining how to set up, use and understand linear regression and nearest neighbors regression in practical problems.

• A chapter dealing with principal components analysis, developing intuition carefully, and including numerous practical examples. There is a brief description of multivariate scaling via principal coordinate analysis.

• A chapter dealing with clustering via agglomerative methods and k-means, showing how to build vector quantized features for complex signals.

Illustrated throughout, each main chapter includes many worked examples and other pedagogical elements such as boxed Procedures, Definitions, Useful Facts, and Remember This (short tips). Problems and Programming Exercises are at the end of each chapter, with a summary of what the reader should know.

Instructor resources include a full set of model solutions for all problems, and an Instructor's Manual with accompanying presentation slides.



Table of Contents
1 Notation and conventions 9

1.0.1 Background Information........................................................................ 10

1.1 Acknowledgements................................................................................................. 11

I Describing Datasets 12

2 First Tools for Looking at Data 13

2.1 Datasets........................................ 13

2.2 What’s Happening? - Plotting Data................................................................. 15

2.2.1 Bar Charts........................................ 16

2.2.2 Histograms................................................................................................... 16

2.2.3 How to Make Histograms...................................................................... 17

2.2.4 Conditional Histograms.......................................................................... 19

2.3 Summarizing 1D Data........................................ 19

2.3.1 The Mean...................................................................................................... 20

2.3.2 Standard Deviation................................................................................... 22

2.3.3 Computing Mean and Standard Deviation Online...................... 26

2.3.4 Variance......................................................................................................... 26

2.3.5 The Median.................................................................................................. 27

2.3.6 Interquartile Range........................................ 29

2.3.7 Using Summaries Sensibly.................................................................... 30

2.4 Plots and Summaries............................................................................................. 31

2.4.1 Some Properties of Histograms.......................................................... 31

2.4.2 Standard Coordinates and Normal Data......................................... 34

2.4.3 Box Plots....................................................................................................... 38

2.5 Whose is bigger? Investigating Australian Pizzas........................................ 39

2.6 You should.................................................................................................................. 43

2.6.1 remember these definitions:................................................................. 43

2.6.2 remember these terms............................................................................ 43

2.6.3 remember these facts:............................................................................. 43

2.6.4 be able to...................................................................................................... 43

3 Looking at Relationships 47

3.1 Plotting 2D Data...................................................................................................... 47

3.1.1

3.1.2 Series........................................ 51

3.1.3 Scatter Plots for Spatial Data.............................................................. 53

3.1.4 Exposing Relationships with Scatter Plots..................................... 54

3.2 Correlation.................................................................................................................. 57

3.2.1 The Correlation Coefficient................................................................... 60

3.2.2 Using Correlation to Predict................................................................ 64

3.2.3 Confusion caused by correlation........................................ 68

3.3 Sterile Males in Wild Horse Herds........................................ 68

3.4 You should.................................................................................................................. 72

3.4.1 remember these definitions:................................................................. 72

3.4.2 remember these terms............................................................................ 72

3.4.3 remember these facts:........................................ 72

3.4.4 use these procedures:........................................ 72

3.4.5 be able to:........................................ 72

II Probability 78

4 Basic ideas in probability 79

4.1 Experiments, Outcomes and Probability....................................................... 79

4.1.1 Outcomes and Probability...................................................................... 79

4.2 Events........................................ 81

4.2.1 Computing Event Probabilities by Counting Outcomes............. 83

4.2.2 The Probability of Events...................................................................... 87

4.2.3 Computing Probabilities by Reasoning about Sets...................... 89

4.3 Independence............................................................................................................ 92

4.3.1 Example: Airline Overbooking............................................................ 96

4.4 Conditional ........................................ 99

4.4.1 Evaluating Conditional Probabilities.............................................. 100

4.4.2 Detecting Rare Events is Hard......................................................... 104

4.4.3 Conditional Probability and Various Forms of Independence........................................ 106

4.4.4 The Prosecutor’s Fallacy........................................ 108

4.4.5 Example: The Monty Hall Problem................................................ 110

4.5 Extra Worked Examples.................................................................................... 112

4.5.1 Outcomes and Probability........................................ 112

4.5.2 Events.......................................................................................................... 114

4.5.3 Independence........................................................................................... 115

4.5.4 Conditional Probability......................................................................... 117

4.6 You should............................................................................................................... 121

4.6.1 remember these definitions:.............................................................. 121

4.6.2 remember these terms........................................ 121

4.6.3 remember and use these facts.......................................................... 121

4.6.4 remember these points:....................................................................... 121

4.6.5 be able to.................................................................................................... 121

5 Random Variables and Expectations 128

5.1 Random Variables................................................................................................. 128

5.1.1 Joint and Conditional Probability for Random Variables . . . 131

5.1.2 Just a Little Continuous Probability............................................... 134

5.2 Expectations and Expected Values................................................................ 137

5.2.1 Expected Values...................................................................................... 138

5.2.2 Mean, Variance and Covariance....................................................... 141

5.2.3 Expectations and Statistics........................................ 145

5.3 The Weak Law of Large Numbers................................................................ 145


5.3.1 IID Samples........................................ 145

5.3.2 Two Inequalities........................................ 146

5.3.3 Proving the Inequalities........................................ 147

5.3.4 The Weak Law of Large Numbers.................................................. 149

5.4 Using the Weak Law of Large Numbers 151

5.4.1 Should you accept a bet?..................................................................... 151

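5.4.2 Odds, Expectations and Bookmaking - a Cultural Diversion........................................ 152

5.4.3 Ending a Game Early........................................ 154

5.4.4 Making a Decision with Decision Trees and Expectations........................................ 154

5.4.5 Utility........................................ 156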

5.5 You should................................................................................... 159

5.5.1 remember these definitions:.............................................................. 159

5.5.2 remember these terms......................................................................... 159

5.5.3 use and remember these facts.......................................................... 159

5.5.4 be able to.................................................................................................... 160

6 Useful Probability Distributions 167

6.1 Discrete Distributions 167

6.1.1 The Discrete Uniform Distribution................................................. 167

6.1.2 Bernoulli Random Variables........................................ 168

6.1.3 The Geometric Distribution................................................................ 168

6.1.4 The Binomial Probability Distribution........................................... 169

6.1.5 Multinomial probabilities..................................................................... 171

6.1.6 The Poisson Distribution..................................................................... 172

6.2 Continuous Distributions 174

6.2.1 The Continuous Uniform Distribution........................................... 174

6.2.2 The Beta Distribution........................................................................... 174

6.2.3 The Gamma Distribution..................................................................... 176

6.2.4 The Exponential Distribution............................................................ 176

6.3 The Normal Distribution 178

6.3.1 The Standard Normal Distribution................................................. 178

6.3.2 The Normal Distribution..................................................................... 179

6.3.3 Properties of The Normal Distribution......................................... 180

6.4 Approximating Binomials with Large N 182

6.4.1 Large N........................................ 183

6.4.2 Getting Normal........................................ 185

6.4.3 Using a Normal Approximation to the Binomial Distribution 187

6.5 You should........................................ 188

6.5.1 remember these definitions:........................................ 188

6.5.2 remember these terms:........................................ 188

6.5.3 remember these facts:........................................ 188

6.5.4 remember these points:........................................ 188


III Inference 196

7 Samples and Populations 197

7.1 The Sample Mean................................................................................................. 197

7.1.1 The Sample Mean is an Estimate of the Population Mean . . 197

7.1.2 The Variance of the Sample Mean........................................ 198

7.1.3 When The Urn Model Works............................................................ 201

7.1.4 Distributions are Like Populations................................................. 202

7.2 Confidence Intervals............................................................................................ 203

7.2.1 Constructing Confidence Intervals.................................................. 203

7.2.2 Estimating the Variance of the Sample Mean............................ 204

7.2.3 The Probability Distribution of the Sample Mean........................................ 206

7.2.4 Confidence Intervals for Population Means................................. 208

7.2.5 Standard Error Estimates from Simulation................................. 212

7.3 You should............................................................................................................... 216

7.3.1 remember these definitions:.............................................................. 216

7.3.2 remember these terms......................................................................... 216

7.3.3 remember these facts:........................................................................... 216

7.3.4 use these procedures............................................................................. 216

7.3.5 be able to.................................................................................................... 216

8 The Significance of Evidence 221

8.1 Significance........................................ 222

8.1.1 Evaluating Significance......................................................................... 223

8.1.2 P-values....................................................................................................... 225

8.2 Comparing the Mean of Two Populations.................................................. 230

8.2.1 Assuming Known Population Standard Deviations................... 231

8.2.2 Assuming Same, Unknown Population Standard Deviation . 233

8.2.3 Assuming Different, Unknown Population Standard Deviation........................................ 235

8.3 Other Useful Tests of Significance................................................................. 237

8.3.1 F-tests and Standard Deviations...................................................... 237

8.3.2 χ² Tests of Model Fit........................................ 239

8.4 Dangerous Behavior............................................................................................. 244

8.5 You should............................................................................................................... 246

8.5.1 remember these definitions:........................................ 246

8.5.2 remember

8.5.3 remember these facts:........................................ 246

8.5.4 use these procedures:........................................ 246

8.5.5 be able to:........................................ 246

9 Experiments 251

9.1 A Simple Experiment: The Effect of a Treatment.................................. 251

9.1.1 Randomized Balanced Experiments............................................... 252

9.1.2 Decomposing Error in Predictions.................................................. 253

9.1.3 Estimating the Noise Variance......................................................... 253

9.1.4 The ANOVA Table.................................................................................. 255

9.1.5 Unbalanced Experiments.................................................................... 257


9.1.6 Significant Differences.......................................................................... 259

9.2 Two Factor Experiments.................................................................................... 261

9.2.1 Decomposing the Error........................................ 264

9.2.2 Interaction Between Effects................................................................ 265

9.2.3 The Effects of a Treatment................................................................. 266

9.2.4 Setting up an ANOVA Table.............................................................. 267

9.3 You should............................................................................................................... 272

9.3.1 remember these definitions:.............................................................. 272

9.3.2 remember these terms........................................ 272

9.3.3 remember these facts:........................................................................... 272

9.3.4 use these procedures............................................................................. 272

9.3.5 be able to.................................................................................................... 272

9.3.6 Two-Way Experiments.......................................................................... 274

10 Inferring Probability Models from Data 275

10.1 Estimating Model Parameters with Maximum Likelihood.................. 275

10.1.1 The Maximum Likelihood Principle............................................... 277

10.1.2 Binomial, Geometric and Multinomial Distributions................ 278

10.1.3 Poisson and Normal Distributions................................................... 281

10.1.4 Confidence Intervals for Model Parameters................................ 286

10.1.5 Cautions about Maximum Likelihood............................................ 288

10.2 Incorporating Priors with Bayesian Inference........................................ 289

10.2.1 Conjugacy................................................................................................... 292

10.2.2 MAP Inference......................................................................................... 294

10.2.3 Cautions about Bayesian Inference................................................. 296

10.3 Bayesian Inference for Normal Distributions............................................ 296

10.3.1 Example: Measuring Depth of a Borehole................................... 296

10.3.2 Normal Prior and Normal Likelihood Yield Normal Posterior 297

10.3.3 Filtering........................................ 300

10.4 You should............................................................................................................... 303

10.4.1 remember these definitions:.............................................................. 303

10.4.2 remember these terms......................................................................... 303

10.4.3 remember these facts:........................................................................... 304

10.4.4 use these procedures............................................................................. 304

10.4.5 be able to.................................................................................................... 304

IV Tools 312

11 Extracting Important Relationships in High Dimensions 313

11.1 Summaries and Simple Plots........................................................................... 313

11.1.1 The Mean........................................ 314

11.1.2 Stem Plots and Scatterplot Matrices.............................................. 315

11.1.3 Covariance.................................................................................................. 317

11.1.4 The Covariance Matrix......................................................................... 319

11.2 Using Mean and Covariance to Understand High Dimensional Data . 321

11.2.1 Mean and Covariance under Affine Transformations............... 322


11.2.2 Eigenvectors and Diagonalization........................................ 324

11.2.3 Diagonalizing Covariance by Rotating Blobs........................................ 325

11.2.4 Approximating Blobs........................................ 326

11.2.5 Example: Transforming the Height-Weight Blob........................................ 327

11.3 Principal Components Analysis........................................ 329

11.3.1 Example: Representing Colors with Principal Components........................................ 332

11.3.2 Example: Representing Faces with Principal Components........................................ 334

11.4 Multi-Dimensional Scaling........................................ 335

11.4.1 Choosing Low D Points using High D Distances........................................ 335

11.4.2 Factoring a Dot-Product Matrix........................................ 338

11.4.3 Example: Mapping with Multidimensional Scaling........................................ 339

11.5 Example: Understanding Height and Weight........................................ 341

11.6 You should........................................ 345

11.6.1 remember these definitions:........................................ 345

11.6.2 remember these terms:........................................ 345

11.6.3 remember these facts:........................................ 345

11.6.4 use these procedures:........................................ 345

11.6.5 be able to:........................................ 345

12 Learning to Classify 349

12.1 Classification: The Big Ideas........................................ 349

12.1.1 The Error Rate........................................ 350

12.1.2 Overfitting........................................ 350

12.1.3 Cross-Validation........................................ 351

12.1.4 Is the Classifier Working Well?........................................ 351

12.2 Classifying with Nearest Neighbors........................................ 353

12.3 Classifying with Naive Bayes........................................ 355

12.3.1 Missing Data........................................ 357

12.4 The Support ........................................ 358

12.4.1 Choosing a Classifier with the Hinge Loss........................................ 359

12.4.2 Finding a Minimum: General Points........................................ 360

12.4.3 Finding a Minimum: Stochastic Gradient Descent........................................ 361

12.4.4 Example: Training an SVM with Stochastic Gradient Descent 363

12.4.5 Multi-Class Classification with SVMs.............................................. 366

12.5 Classifying with Random Forests................................................................... 367

12.5.1 Building a Decision Tree..................................................................... 367

12.5.2 Choosing a Split with Information Gain........................................ 370

12.5.3 Forests......................................................................................................... 373

12.5.4 Building and Evaluating a Decision Forest.................................. 374

12.5.5 Classifying Data Items with a Decision Forest........................... 375

12.6 You should........................................ 378

12.6.1 remember these definitions:.............................................................. 378

12.6.2 remember these terms......................................................................... 378

12.6.3 remember these facts:........................................................................... 379

12.6.4 use these procedures............................................................................. 379

12.6.5 be able to.................................................................................................... 379


13.1 The Curse of Dimension........................................ 384

13.1.1 The Curse: Data isn’t Where You Think it is............................. 384

13.1.2 Minor Banes of Dimension.................................................................. 386

13.2 The Multivariate Normal Distribution......................................................... 387

13.2.1 Affine Transformations and Gaussians.......................................... 387

13.2.2 Plotting a 2D Gaussian: Covariance Ellipses.............................. 388

13.3 Agglomerative and Divisive Clustering........................................................ 389

13.3.1 Clustering and Distance....................................................................... 391

13.4 The K-Means Algorithm and Variants........................................ 392

13.4.1 How to choose K...................................................................................... 395

13.4.2 Soft Assignment....................................................................................... 397

13.4.3 General Comments on K-Means....................................................... 400

13.4.4 K-Medoids........................................ 400

13.5 Application Example: Clustering Documents........................................... 401

13.5.1 A Topic Model........................................ 402

13.6 Describing Repetition with Vector Quantization...................................... 403

13.6.1 Vector Quantization............................................................................... 404

13.6.2 Example: Groceries in Portugal....................................................... 406

13.6.3 Efficient Clustering and Hierarchical K Means.......................... 409

13.6.4 Example: Activity from Accelerometer Data............................... 409

13.7 You should............................................................................................................... 413

13.7.1 remember these definitions:.............................................................. 413

13.7.2 remember these terms......................................................................... 413

13.7.3 remember these facts:........................................................................... 413

13.7.4 use these procedures............................................................................. 413

14 Regression 417

14.1.1 Regression to Make Predictions....................................................... 417

14.1.2 Regression to Spot Trends.................................................................. 419

14.1 Linear Regression and Least Squares.......................................................... 421

14.1.1 Linear Regression................................................................................... 421

14.1.2 Choosing β.................................................................................................. 422

14.1.3 Solving the Least Squares Problem................................................ 423

14.1.4 Residuals........................................ 424

14.1.5 R-squared.................................................................................................... 424

14.2 Producing Good Linear Regressions............................................................. 427

14.2.1 Transforming Variables........................................................................ 428

14.2.2 Problem Data Points have Significant Impact............................ 431

14.2.3 Functions of One Explanatory Variable........................................ 433

14.2.4 Regularizing Linear Regressions...................................................... 435

14.3 Exploiting Your Neighbors

14.3.1 Using your Neighbors to Predict More than a Number............ 441

14.3.2 Example: Filling Large Holes with Whole Images.................... 441

14.4 You should........................................ 444

14.4.1 remember these definitions:........................................ 444

14.4.2 remember these terms:........................................ 444


14.4.3 remember these facts:........................................................................... 444

14.4.4 remember these procedures:............................................................. 444

15 Markov Chains and Hidden Markov Models 454

15.1 Markov Chains........................................................................................................ 454

15.1.1 Transition Probability Matrices........................................................ 457

15.1.2 Stationary Distributions....................................................................... 459

15.1.3 Example: Markov Chain Models of Text...................................... 462

15.2 Estimating Properties of Markov Chains.................................................... 465

15.2.1 Simulation........................................ 465

15.2.2 Simulation Results as Random Variables..................................... 467

15.2.3 Simulating Markov Chains.................................................................. 469

15.3 Example: Ranking the Web by Simulating a Markov Chain................ 472

15.4 Hidden Markov Models and Dynamic Programming............................. 473

15.4.1 Hidden Markov Models........................................................................ 474

15.4.2 Picturing Inference with a Trellis.................................................... 474

15.4.3 Dynamic Programming for HMM’s: Formalities....................... 478

15.4.4 Example: Simple Communication Errors........................................ 478

15.5 You should............................................................................................................... 481

15.5.1 remember these definitions:.............................................................. 481

15.5.2 remember these terms......................................................................... 481

15.5.3 remember these facts:........................................................................... 481

15.5.4 be able to.................................................................................................... 481

V Some Mathematical Background

16 Resources

16.1 Useful Material about Matrices

16.1.1 The Singular Value Decomposition

16.1.2 Approximating A Symmetric Matrix

16.2 Some Special Functions

16.3 Finding Nearest Neighbors

16.4 Entropy and Information Gain

Probability and Statistics for Computer Science

Product form

£40.49

Includes FREE delivery

RRP £44.99 – you save £4.50 (10%)


A Paperback / softback by David Forsyth

Out of stock



    Publisher: Springer International Publishing AG
    Publication Date: 04/06/2019
    ISBN13: 9783319877884, 978-3319877884
    ISBN10: 3319877887


    Instructor resources include a full set of model solutions for all problems, and an Instructor's Manual with accompanying presentation slides.



    Table of Contents
    1 Notation and conventions 9

    1.0.1 Background Information........................................................................ 10

    1.1 Acknowledgements................................................................................................. 11

    I Describing Datasets

    ; 12

    2 First Tools for Looking at Data 13

    2.1 Datasets....................................................................................

    ................................... 13

    2.2 What’s Happening? - Plotting Data................................................................. 15

    2.2.1 Bar< Charts.................................................................................................... 16

    2.2.2 Histograms................................................................................................... 16

    2.2.3 How to Make Histograms...................................................................... 17

    2.2.4 Conditional Histograms.......................................................................... 19

    2.3 Summarizing 1D Data.................................

    ........................................................... 19

    2.3.1 The Mean...................................................................................................... 20

    2.3.2 Standard Deviation................................................................................... 22

    2.3.3 Computing Mean and Standard Deviation Online...................... 26

    2.3.4 Variance......................................................................................................... 26

    2.3.5 The Median.................................................................................................. 27

    2.3.6 Interqu

    artile Range.................................................................................. 29

    2.3.7 Using Summaries Sensibly.................................................................... 30

    2.4 Plots and Summaries............................................................................................. 31

    2.4.1 Some Properties of Histograms.......................................................... 31

    2.4.2 Standard Coordinates and Normal Data......................................... 34

    2.4.3 Box Plots....................................................................................................... 38

    2.5 Whose is bigger? Inves

    tigating Australian Pizzas...................................... 39

    2.6 You should.................................................................................................................. 43

    2.6.1 remember these definitions:................................................................. 43

    2.6.2 remember these terms............................................................................ 43

    2.6.3 remember these facts:............................................................................. 43

    2.6.4 be able to...................................................................................................... 43

    3 Looking at Relationships

    47

    3.1 Plotting 2D Data...................................................................................................... 47

    3.1.1

    3.1.2 Series...............................

    ............................................................................... 51

    3.1.3 Scatter Plots for Spatial Data.............................................................. 53

    3.1.4 Exposing Relationships with Scatter Plots..................................... 54

    3.2 Correlation.................................................................................................................. 57

    3.2.1 The Correlation Coefficient................................................................... 60

    3.2.2 Using Correlation to Predict................................................................ 64

    3.2.3 Confusion caused by co

    rrelation......................................................... 68

    1


    <3.3 Sterile Males in Wild Horse Herds.................................................................. 68

    3.4 You should.................................................................................................................. 72

    3.4.1 remember these definitions:................................................................. 72

    3.4.2 remember these terms............................................................................ 72

    3.4.3

    remember these facts: . .

    . . .

    3.4.4

    use these procedures: . . .

    . . .

    3.4.5

    be able to: . . . . . . . . .

    . . .

    . . . . . . . . . . . . . . . . . 72

    . . . . . . . . . . . . . . . . . 72

    . . . . . . . . . . . . . . . . . 72

    II Probability &

    nbsp; 78

    4 Basic ideas in probability 79

    4.1 Experiments, Outcomes and Probability....................................................... 79

    4.1.1 Outcomes and Probability...................................................................... 79

    4.2 Events.................

    .......................................................................................................... 81

    4.2.1 Computing Event Probabilities by Counting Outcomes............. 83

    4.2.2 The Probability of Events...................................................................... 87

    4.2.3 Computing Probabilities by Reasoning about Sets...................... 89

    4.3 Independence............................................................................................................ 92

    4.3.1 Example: Airline Overbooking............................................................ 96

    4.4 Conditional ...........................................

    ............. 99

    4.4.1 Evaluating Conditional Probabilities.............................................. 100

    4.4.2 Detecting Rare Events is Hard......................................................... 104

    4.4.3 Conditional Probability and Various Forms of Independence . 106 4.4.4 The Prosecutor’s Fallacy 108

    4.4.5 Example: The Monty Hall Problem................................................ 110

    4.5 Extra Worked Examples.................................................................................... 112

    4.5.1 Outcomes and Probability........................................

    ........................... 112

    4.5.2 Events.......................................................................................................... 114

    4.5.3 Independence........................................................................................... 115

    4.5.4 Conditional Probability......................................................................... 117

    4.6 You should............................................................................................................... 121

    4.6.1 remember these definitions:.............................................................. 121

    4.6.2 remember these terms........

    ................................................................. 121

    4.6.3 remember and use these facts.......................................................... 121

    4.6.4 remember these points:....................................................................... 121

    4.6.5 be able to.................................................................................................... 121

    5 Random Variables and Expectations

    128

    5.1 Random Variables................................................................................................. 128

    5.1.1 Joint and Conditional Probability for Random Variables . . . 131

    5.1.2 Just a Little Continuous Probability............................................... 134

    5.2 Expectations and Expected Values................................................................ 137

    5.2.1 Expected Values...................................................................................... 138

    5.2.2 Mean, Variance and Covariance....................................................... 141

    5.2.3

    Expectations and Statistics................................................................. 145

    5.3 The Weak Law of Large Numbers................................................................ 145


    5.3.1

    IID Samples . . . . . . .

    . .

    . .

    . . . .

    . . . . .

    . . .

    . . . .

    . 145

    5.3.2

    Two Inequalities . . . .

    . .

    . .

    . . . .

    . . . . .

    . . .

    . . . .<

    . 146

    5.3.3

    Proving the Inequalities

    . .

    . .

    . . . .

    . . . . .

    . . .

    . . . .

    . 147

    5.3.4 The Weak Law of Large Numbers.................................................. 149

    5.4 Using the Weak Law of Large Numbers 151

    5.4.1 Should you accept a bet?..................................................................... 151

    5.4.2 Odds, Expectations and Bookmaking — a Cultural Diversion 152 5.4.3 Ending a Game Early 154

    5.4.4 Making a Decision with Decision Trees and

    Expectations . . 154 5.4.5 Utility 156

    5.5 You should................................................................................... 159

    5.5.1 remember these definitions:.............................................................. 159

    5.5.2 remember these terms......................................................................... 159

    5.5.3 use and remember these facts.......................................................... 159

    5.5.4 be able to.................................................................................................... 160

    6 Useful Probability Distributions

    ; 167

    6.1 Discrete Distributions 167

    6.1.1 The Discrete Uniform Distribution................................................. 167

    6.1.2 Bernoulli Random Variables..........................................................

    ..... 168

    6.1.3 The Geometric Distribution................................................................ 168

    6.1.4 The Binomial Probability Distribution........................................... 169

    6.1.5 Multinomial probabilities..................................................................... 171

    6.1.6 The Poisson Distribution..................................................................... 172

    6.2 Continuous Distributions

    ; 174

    6.2.1 The Continuous Uniform Distribution........................................... 174

    6.2.2 The Beta Distribution........................................................................... 174

    6.2.3 The Gamma Distribution..................................................................... 176

    6.2.4 The Exponential Distribution............................................................ 176

    6.3 The Normal Distribution ; 178

    6.3.1 The Standard Normal Distribution................................................. 178

    6.3.2 The Normal Distribution..................................................................... 179

    6.3.3 Properties of The Normal Distribution......................................... 180

    6.4 Approximating Binomials with Large N 182

    6.4.1 Large N..............................................................

    ......................................... 183

    6.4.2 Getting Normal<........................................................................................ 185

    6.4.3 Using a Normal Approximation to the Binomial Distribution 187

    6.5

    You should . . . . . . . . . . . . .

    . .

    . . . .

    . . . . .

    . . .

    . .

    6.5.1 remember these definitions:

    . .

    . . . .

    . . . . .

    . . .

    . .

    6.5.2

    remember these terms: .

    . .

    . .

    . . . .

    . . . . .

    . . .

    . .

    6.5.3

    remember these facts: .

    . .

    . .

    . . . .

    . . . . .

    . . .

    . .

    6.5.4

    remember these points:

    . .

    . .

    .

    . . .

    . . . . .

    . . .

    . .<

    . . . 188

    . . . 188

    . . . 188

    . . . 188

    . . . 188


    III Inference

    ; 196

    7 Samples and Populations 197

    7.1 The Sample Mean................................................................................................. 197

    7.1.1 The Sample Mean is an Estimate of the Population Mean . . 197

    7.1.2 The Varianc

    e of the Sample Mean.................................................. 198

    7.1.3 When The Urn Model Works............................................................ 201

    7.1.4 Distributions are Like Populations................................................. 202

    7.2 Confidence Intervals............................................................................................ 203

    7.2.1 Constructing Confidence Intervals.................................................. 203

    7.2.2 Estimating the Variance of the Sample Mean............................ 204

    7.2.3 The Probability Distribution of the Sample Mean..................... 206 &

    lt;

    7.2.4 Confidence Intervals for Population Means................................. 208

    7.2.5 Standard Error Estimates from Simulation................................. 212

    7.3 You should............................................................................................................... 216

    7.3.1 remember these definitions:.............................................................. 216

    7.3.2 remember these terms......................................................................... 216

    7.3.3 remember these facts:........................................................................... 216

    7.3.4 use these procedures............................................................................. 216

    7.3.5 be able to.................................................................................................... 216

    8 The Significance of Evidence 221

    8.1 Significance..............................................................

    ................................................ 222

    8.1.1 Evaluating Significance......................................................................... 223

    8.1.2 P-values....................................................................................................... 225

    8.2 Comparing the Mean of Two Populations.................................................. 230

    8.2.1 Assuming Known Population Standard Deviations................... 231

    8.2.2 Assuming Same, Unknown Population Standard Deviation . 233

    8.2.3 Assuming Different, Unknown Population Stand

    ard Deviation 235

    8.3 Other Useful Tests of Significance................................................................. 237

    8.3.1 F-tests and Standard Deviations...................................................... 237

    8.3.2 χ2 Tests of Model Fit............................................................................ 239

    8.4 Dangerous Behavior............................................................................................. 244

    8.5 You should............................................................................................................... 246

    8.5.1 remember these definitions:......................................

    ........................ 246

    8.5.2 remember

    8.5.3

    remember these facts: . .

    . . .

    8.5.4

    use these procedures: . . .

    . . .

    8.5.5

    be able to: . . . . . . . . .

    . . .

    . . . . . . . . . . . . . . . . . 246

    . . . . . . . . . . . . . . . . . 246

    . . . . . . . . . . . . . . . . . 246

    9 Experiments &nbs

    p; 251

    9.1 A Simple Experiment: The Effect of a Treatment.................................. 251

    9.1.1 Randomized Balanced Experiments............................................... 252

    9.1.2 Decomposing Error in Predictions.................................................. 253

    9.1.3 Estimating the Noise Variance......................................................... 253

    9.1.4 The ANOVA Table.................................................................................. 255

    9.1.5 Unbalanced Experiments.................................................................... 257


    9.1.6 Significant Differences.......................................................................... 259

    9.2 Two Factor Experiments.................................................................................... 261

    9.2.1 &n

    bsp; Decomposing the Error........................................................................ 264

    9.2.2 Interaction Between Effects................................................................ 265

    9.2.3 The Effects of a Treatment................................................................. 266

    9.2.4 Setting up an ANOVA Table.............................................................. 267

    9.3 You should............................................................................................................... 272

    9.3.1 remember these definitions:.............................................................. 272

    9.3.2

    remember these terms......................................................................... 272

    9.3.3 remember these facts:........................................................................... 272

    9.3.4 use these procedures............................................................................. 272

    9.3.5 be able to.................................................................................................... 272

    9.3.6 Two-Way Experiments.......................................................................... 274

    10 Inferring Probability Models from Data &n

    bsp; 275

    10.1 Estimating Model Parameters with Maximum Likelihood.................. 275

    10.1.1 The Maximum Likelihood Principle............................................... 277

    10.1.2 Binomial, Geometric and Multinomial Distributions................ 278

    10.1.3 Poisson and Normal Distributions................................................... 281

    10.1.4 Confidence Intervals for Model Parameters................................ 286

    10.1.5 Cautions about Maximum Likelihood............................................ 288

    10.2 Incorporating Prio

    rs with Bayesian Inference.......................................... 289

    10.2.1 Conjugacy................................................................................................... 292

    10.2.2 MAP Inference......................................................................................... 294

    10.2.3 Cautions about Bayesian Inference................................................. 296

    10.3 Bayesian Inference for Normal Distributions............................................ 296

    10.3.1 Example: Measuring Depth of a Borehole................................... 296

    10.3.2 Normal Prior and Normal Likelihood Yield Normal Posterior 297

    10.3.3 Filtering....................................

    .................................................................. 300

    10.4 You should............................................................................................................... 303

    10.4.1 remember these definitions:.............................................................. 303

    10.4.2 remember these terms......................................................................... 303

    10.4.3 remember these facts:........................................................................... 304

    10.4.4 use these procedures............................................................................. 304

    10.4.5 be able to.................................................................................................... 304

    &

    lt;

    IV Tools 312

    11 Extracting Important Relationships in High Dimensions 313

    11.1 Summaries and Simple Plots........................................................................... 313

    11.1.1 The Mean......................

    ............................................................................. 314

    11.1.2 Stem Plots and Scatterplot Matrices.............................................. 315

    11.1.3 Covariance.................................................................................................. 317

    11.1.4 The Covariance Matrix......................................................................... 319

    11.2 Using Mean and Covariance to Understand High Dimensional Data . 321

    11.2.1 Mean and Covariance under Affine Transformations............... 322


    11.2.2

    . .

    324

    . .

    325

    . .

    326

    . .

    327

    . .

    329

    .

    332

    . .

    334

    . .

    335

    . .

    335

    . .

    338

    . .

    339

    . .

    341

    . .

    <

    345

    . .

    345

    . .

    345

    . .

    345

    . .

    345

    . .

    345

    349

    . .

    349

    . .

    350

    . .

    350

    . .

    351

    . .

    351

    . .

    353

    . .

    355

    . .

    357

    . .

    358

    . .

    359

    . .

    <360

    .< .

    361

    <

    Eigenvectors and Diagonalization . . . . . . . . . . . . . .

    11.2.3 Diagonalizing Covariance by Rotating Blobs . . . . . . . .

    11.2.4 Approximating Blobs . . . . . . . . . . . . . . . . . . . .

    11.2.5 Example: Transforming the Height-Weight Blob . . . . .

    11.3 Principal Components Analysis . . . . . . . . . . . . . . . . . . .

    11.3.1 Example: Representing Colors with Principal Components

    11.3.2 Example: Representing Faces

    with Principal Components

    11.4 Multi-Dimensional Scaling . . . . . . . . . . . . . . . . . . . . . .

    11.4.1 Choosing Low D Points using High D Distances . . . . . .

    11.4.2 Factoring a Dot-Product Matrix . . . . . . . . . . . . . .

    11.4.3 Example: Mapping with Multidimensional Scaling . . . .

    11.5 Example: Understanding Height and Weight . . . . . . . . . . . 11.6 You should . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    11.6.1 remember these definitions: . . . . . . . . . . . . . . . . .

    11.6.2 remember these terms: . . . . . . . . . . . . . . . . . . . .

    11.6.3 remember these facts: .&

    nbsp; . . . . . . . . . . . . . . . . . . .

    11.6.4 use these procedures: . . . . . . . . . . . . . . . . . . . . . 11.6.5 be able to: . . . . . . . . . . . . . . . . . . . . . . . . . . .

    12 Learning to Classify

    12.1 Classification: The Big Ideas . . . . . . . . . . . . . . . . . . . . 12.1.1 The Error Rate . . . . . . . . . . . . . . . . . . . . . . . . 12.1.2 Overfitting . . . . . . . . . . . . . . . . . . . . . . . . . . 12.1.3 Cross-Validation . . . . . . . . . . . . . . . . . . . . . . .

    12.1.4 Is the Classifier Working Well? . . . . . . . . . . . . . . .

    12.2 Classifying with Nearest Neighbors . . . . .

    . . . . . . . . . . . .

    12.3 Classifying with Naive Bayes . . . . . . . . . . . . . . . . . . . . 12.3.1 Missing Data . . . . . . . . . . . . . . . . . . . . . . . . .

    12.4 The Support

    12.4.1 Choosing a Classifier with the Hinge Loss . . . . . . . . .

    12.4.2 Finding a Minimum: General Points . . . . . . . . . . . .

    12.4.3 Finding a Minimum: Stochastic Gradient Descent . . . .

    12.4.4 Example: Training an SVM with Stochastic Gradient Descent 363

    12.4.5 Multi-Class Classification with SVMs.............................................. 366

    12.5 Classifying with Random Forests................................................................... 367

    12.5.1 Building a Decision Tree..................................................................... 367

    12.5.2 Choosing a Split with Information Gain........................................ 370

    12.5.3 Forests......................................................................................................... 373

    12.5.4 Building and Evaluating a Decision Forest.................................. 374

    12.5.5 Classifying Data Items with a Decision Forest........................... 375

    12.6 You should...................................................................

    ............................................ 378

    12.6.1 remember these definitions:.............................................................. 378

    12.6.2 remember these terms......................................................................... 378

    12.6.3 remember these facts:........................................................................... 379

    12.6.4 use these procedures............................................................................. 379

    12.6.5 be able to.................................................................................................... 379


    <

    13.1 The Curse of Dimension........................................................................

    ............. 384

    13.1.1 The Curse: Data isn’t Where You Think it is............................. 384

    13.1.2 Minor Banes of Dimension.................................................................. 386

    13.2 The Multivariate Normal Distribution......................................................... 387

    13.2.1 Affine Transformations and Gaussians.......................................... 387

    13.2.2 Plotting a 2D Gaussian: Covariance Ellipses.............................. 388

    13.3 Agglomerative and Divisive Clustering........................................................ 389

    13.3.1 Clustering and Distance....................................................................... 391

    13.4 &

    nbsp; The K-Means Algorithm and Variants......................................................... 392

    13.4.1 How to choose K...................................................................................... 395

    13.4.2 Soft Assignment....................................................................................... 397

    13.4.3 General Comments on K-Means....................................................... 400

    13.4.4 K-Mediods.................................................................................................. 400

    13.5 Application Example: Clustering Documents........................................... 401

    13.5.1 A Topic Model......................................................................................

    .... 402

    13.6 Describing Repetition with Vector Quantization...................................... 403

    13.6.1 Vector Quantization............................................................................... 404

    13.6.2 Example: Groceries in Portugal....................................................... 406

    13.6.3 Efficient Clustering and Hierarchical K Means.......................... 409

    13.6.4 Example: Activity from Accelerometer Data............................... 409

    13.7 You should............................................................................................................... 413

    13.7.1 remember these definitions:.............................................................. 413

    13.7.2 remember these terms......................................................................... 413

    13.7.3 remember these facts:........................................................................... 413

    13.7.4 use these procedures............................................................................. 413

    14 Regression 417

    14.1.1 Regression to Make Predictions....................................................... 417

    14.1.2 Regression to Spot Trends.................................................................. 419

    14.1 Linear Regression and Least Squares.......................................................... 421

    14.1.1 Linear Regression................................................................................... 421

    14.1.2 Choosing β.................................................................................................. 422

    14.1.3 Solving the Least Squares Problem................................................ 423

    14.1.4 Residuals..................................................................................................... 424

    14.1.5 R-squared.................................................................................................... 424

    14.2 Producing Good Linear Regressions............................................................. 427

    14.2.1 Transforming Variables........................................................................ 428

    14.2.2 Problem Data Points have Significant Impact............................ 431

    14.2.3 Functions of One Explanatory Variable........................................ 433

    14.2.4 Regularizing Linear Regressions...................................................... 435

    14.3 Exploiting Your Neighbors

    14.3.1 Using your Neighbors to Predict More than a Number............ 441

    14.3.2 Example: Filling Large Holes with Whole Images.................... 441

    14.4 You should............................................................................................................... 444

    14.4.1 remember these definitions:.............................................................. 444

    14.4.2 remember these terms:......................................................................... 444

    14.4.3 remember these facts:........................................................................... 444

    14.4.4 remember these procedures:............................................................. 444

    15 Markov Chains and Hidden Markov Models 454

    15.1 Markov Chains........................................................................................................ 454

    15.1.1 Transition Probability Matrices........................................................ 457

    15.1.2 Stationary Distributions....................................................................... 459

    15.1.3 Example: Markov Chain Models of Text...................................... 462

    15.2 Estimating Properties of Markov Chains.................................................... 465

    15.2.1 Simulation.................................................................................................. 465

    15.2.2 Simulation Results as Random Variables..................................... 467

    15.2.3 Simulating Markov Chains.................................................................. 469

    15.3 Example: Ranking the Web by Simulating a Markov Chain................ 472

    15.4 Hidden Markov Models and Dynamic Programming............................. 473

    15.4.1 Hidden Markov Models........................................................................ 474

    15.4.2 Picturing Inference with a Trellis.................................................... 474

    15.4.3 Dynamic Programming for HMM’s: Formalities....................... 478

    15.4.4 Example: Simple Communication Errors..................................... 478

    15.5 You should............................................................................................................... 481

    15.5.1 remember these definitions:.............................................................. 481

    15.5.2 remember these terms......................................................................... 481

    15.5.3 remember these facts:........................................................................... 481

    15.5.4 be able to.................................................................................................... 481

    V Some Mathematical Background 484

    16 Resources 485

    16.1 Useful Material about Matrices....................................................................... 485

    16.1.1 The Singular Value Decomposition................................................. 486

    16.1.2 Approximating A Symmetric Matrix............................................... 487

    16.2 Some Special Functions..................................................................................... 489

    16.3 Finding Nearest Neighbors............................................................................... 490

    16.4 Entropy and Information Gain........................................................................ 493
