Description

Table of Contents

About the Authors vii

About the Technical Editors ix

Acknowledgments xi

Introduction xxi

Part I: Getting Started 1

Chapter 1 What is Machine Learning? 3

Discovering Knowledge in Data 5

Introducing Algorithms 5

Artificial Intelligence, Machine Learning, and Deep Learning 6

Machine Learning Techniques 7

Supervised Learning 8

Unsupervised Learning 12

Model Selection 14

Classification Techniques 14

Regression Techniques 15

Similarity Learning Techniques 16

Model Evaluation 16

Classification Errors 17

Regression Errors 19

Types of Error 20

Partitioning Datasets 22

Holdout Method 23

Cross-Validation Methods 23

Exercises 24

Chapter 2 Introduction to R and RStudio 25

Welcome to R 26

R and RStudio Components 27

The R Language 27

RStudio 28

RStudio Desktop 28

RStudio Server 29

Exploring the RStudio Environment 29

R Packages 38

The CRAN Repository 38

Installing Packages 38

Loading Packages 39

Package Documentation 40

Writing and Running an R Script 41

Data Types in R 44

Vectors 45

Testing Data Types 47

Converting Data Types 50

Missing Values 51

Exercises 52

Chapter 3 Managing Data 53

The Tidyverse 54

Data Collection 55

Key Considerations 55

Collecting Ground Truth Data 55

Data Relevance 55

Quantity of Data 56

Ethics 56

Importing the Data 56

Reading Comma-Delimited Files 56

Reading Other Delimited Files 60

Data Exploration 60

Describing the Data 61

Instance 61

Feature 61

Dimensionality 62

Sparsity and Density 62

Resolution 62

Descriptive Statistics 63

Visualizing the Data 69

Comparison 69

Relationship 70

Distribution 72

Composition 73

Data Preparation 74

Cleaning the Data 75

Missing Values 75

Noise 79

Outliers 81

Class Imbalance 82

Transforming the Data 84

Normalization 84

Discretization 89

Dummy Coding 89

Reducing the Data 92

Sampling 92

Dimensionality Reduction 99

Exercises 100

Part II: Regression 101

Chapter 4 Linear Regression 103

Bicycle Rentals and Regression 104

Relationships Between Variables 106

Correlation 106

Regression 114

Simple Linear Regression 115

Ordinary Least Squares Method 116

Simple Linear Regression Model 119

Evaluating the Model 120

Residuals 121

Coefficients 121

Diagnostics 122

Multiple Linear Regression 124

The Multiple Linear Regression Model 124

Evaluating the Model 125

Residual Diagnostics 127

Influential Point Analysis 130

Multicollinearity 133

Improving the Model 135

Considering Nonlinear Relationships 135

Considering Categorical Variables 137

Considering Interactions Between Variables 139

Selecting the Important Variables 141

Strengths and Weaknesses 146

Case Study: Predicting Blood Pressure 147

Importing the Data 148

Exploring the Data 149

Fitting the Simple Linear Regression Model 151

Fitting the Multiple Linear Regression Model 152

Exercises 161

Chapter 5 Logistic Regression 165

Prospecting for Potential Donors 166

Classification 169

Logistic Regression 170

Odds Ratio 172

Binomial Logistic Regression Model 176

Dealing with Missing Data 178

Dealing with Outliers 182

Splitting the Data 187

Dealing with Class Imbalance 188

Training a Model 190

Evaluating the Model 190

Coefficients 193

Diagnostics 195

Predictive Accuracy 195

Improving the Model 198

Dealing with Multicollinearity 198

Choosing a Cutoff Value 205

Strengths and Weaknesses 206

Case Study: Income Prediction 207

Importing the Data 208

Exploring and Preparing the Data 208

Training the Model 212

Evaluating the Model 215

Exercises 216

Part III: Classification 221

Chapter 6 k-Nearest Neighbors 223

Detecting Heart Disease 224

k-Nearest Neighbors 226

Finding the Nearest Neighbors 228

Labeling Unlabeled Data 230

Choosing an Appropriate k 231

k-Nearest Neighbors Model 232

Dealing with Missing Data 234

Normalizing the Data 234

Dealing with Categorical Features 235

Splitting the Data 237

Classifying Unlabeled Data 237

Evaluating the Model 238

Improving the Model 239

Strengths and Weaknesses 241

Case Study: Revisiting the Donor Dataset 241

Importing the Data 241

Exploring and Preparing the Data 242

Dealing with Missing Data 243

Normalizing the Data 245

Splitting and Balancing the Data 246

Building the Model 248

Evaluating the Model 248

Exercises 249

Chapter 7 Naïve Bayes 251

Classifying Spam Email 252

Naïve Bayes 253

Probability 254

Joint Probability 255

Conditional Probability 256

Classification with Naïve Bayes 257

Additive Smoothing 261

Naïve Bayes Model 263

Splitting the Data 266

Training a Model 267

Evaluating the Model 267

Strengths and Weaknesses of the Naïve Bayes Classifier 269

Case Study: Revisiting the Heart Disease Detection Problem 269

Importing the Data 270

Exploring and Preparing the Data 270

Building the Model 272

Evaluating the Model 273

Exercises 274

Chapter 8 Decision Trees 277

Predicting Build Permit Decisions 278

Decision Trees 279

Recursive Partitioning 281

Entropy 285

Information Gain 286

Gini Impurity 290

Pruning 290

Building a Classification Tree Model 291

Splitting the Data 294

Training a Model 295

Evaluating the Model 295

Strengths and Weaknesses of the Decision Tree Model 298

Case Study: Revisiting the Income Prediction Problem 299

Importing the Data 300

Exploring and Preparing the Data 300

Building the Model 302

Evaluating the Model 302

Exercises 304

Part IV: Evaluating and Improving Performance 305

Chapter 9 Evaluating Performance 307

Estimating Future Performance 308

Cross-Validation 311

k-Fold Cross-Validation 311

Leave-One-Out Cross-Validation 315

Random Cross-Validation 316

Bootstrap Sampling 318

Beyond Predictive Accuracy 321

Kappa 323

Precision and Recall 326

Sensitivity and Specificity 328

Visualizing Model Performance 332

Receiver Operating Characteristic Curve 333

Area Under the Curve 336

Exercises 339

Chapter 10 Improving Performance 341

Parameter Tuning 342

Automated Parameter Tuning 342

Customized Parameter Tuning 348

Ensemble Methods 354

Bagging 355

Boosting 358

Stacking 361

Exercises 366

Part V: Unsupervised Learning 367

Chapter 11 Discovering Patterns with Association Rules 369

Market Basket Analysis 370

Association Rules 371

Identifying Strong Rules 373

Support 373

Confidence 373

Lift 374

The Apriori Algorithm 374

Discovering Association Rules 376

Generating the Rules 377

Evaluating the Rules 382

Strengths and Weaknesses 386

Case Study: Identifying Grocery Purchase Patterns 386

Importing the Data 387

Exploring and Preparing the Data 387

Generating the Rules 389

Evaluating the Rules 389

Exercises 392

Notes 393

Chapter 12 Grouping Data with Clustering 395

Clustering 396

k-Means Clustering 399

Segmenting Colleges with k-Means Clustering 403

Creating the Clusters 404

Analyzing the Clusters 407

Choosing the Right Number of Clusters 409

The Elbow Method 409

The Average Silhouette Method 411

The Gap Statistic 412

Strengths and Weaknesses of k-Means Clustering 414

Case Study: Segmenting Shopping Mall Customers 415

Exploring and Preparing the Data 415

Clustering the Data 416

Evaluating the Clusters 418

Exercises 420

Notes 420

Index 421

Practical Machine Learning in R

    Product form

    £24.79

    Includes FREE delivery

    RRP £30.99 – you save £6.20 (20%)

    A Paperback / softback by Fred Nwanganga, Mike Chapple


      Publisher: John Wiley & Sons Inc
      Publication Date: 06/07/2020
      ISBN13: 978-1119591511
      ISBN10: 1119591511

