Description

Book Synopsis
Collecting, analyzing, and extracting valuable information from a large amount of data requires easily accessible, robust, computational and analytical tools. Data Mining and Business Analytics with R utilizes the open source software R for the analysis, exploration, and simplification of large high-dimensional data sets.

Trade Review

"I first taught a Ph.D. level course in business applications of data mining 10 years ago. I regularly search the web, looking for business-oriented data mining books, and this is the first one I have found that is suitable for an MS in business analytics. I plan to use it. Anyone who teaches such a class and is inclined toward R should consider this text." (Journal of the American Statistical Association, 1 January 2014)



Table of Contents

Preface ix

Acknowledgments xi

1. Introduction 1

Reference 6

2. Processing the Information and Getting to Know Your Data 7

2.1 Example 1: 2006 Birth Data 7

2.2 Example 2: Alumni Donations 17

2.3 Example 3: Orange Juice 31

References 39

3. Standard Linear Regression 40

3.1 Estimation in R 43

3.2 Example 1: Fuel Efficiency of Automobiles 43

3.3 Example 2: Toyota Used-Car Prices 47

Appendix 3.A The Effects of Model Overfitting on the Average Mean Square Error of the Regression Prediction 53

References 54

4. Local Polynomial Regression: a Nonparametric Regression Approach 55

4.1 Model Selection 56

4.2 Application to Density Estimation and the Smoothing of Histograms 58

4.3 Extension to the Multiple Regression Model 58

4.4 Examples and Software 58

References 65

5. Importance of Parsimony in Statistical Modeling 67

5.1 How Do We Guard Against False Discovery 67

References 70

6. Penalty-Based Variable Selection in Regression Models with Many Parameters (LASSO) 71

6.1 Example 1: Prostate Cancer 74

6.2 Example 2: Orange Juice 78

References 82

7. Logistic Regression 83

7.1 Building a Linear Model for Binary Response Data 83

7.2 Interpretation of the Regression Coefficients in a Logistic Regression Model 85

7.3 Statistical Inference 85

7.4 Classification of New Cases 86

7.5 Estimation in R 87

7.6 Example 1: Death Penalty Data 87

7.7 Example 2: Delayed Airplanes 92

7.8 Example 3: Loan Acceptance 100

7.9 Example 4: German Credit Data 103

References 107

8. Binary Classification, Probabilities, and Evaluating Classification Performance 108

8.1 Binary Classification 108

8.2 Using Probabilities to Make Decisions 108

8.3 Sensitivity and Specificity 109

8.4 Example: German Credit Data 109

9. Classification Using a Nearest Neighbor Analysis 115

9.1 The k-Nearest Neighbor Algorithm 116

9.2 Example 1: Forensic Glass 117

9.3 Example 2: German Credit Data 122

Reference 125

10. The Naïve Bayesian Analysis: a Model for Predicting a Categorical Response from Mostly Categorical Predictor Variables 126

10.1 Example: Delayed Airplanes 127

Reference 131

11. Multinomial Logistic Regression 132

11.1 Computer Software 134

11.2 Example 1: Forensic Glass 134

11.3 Example 2: Forensic Glass Revisited 141

Appendix 11.A Specification of a Simple Triplet Matrix 147

References 149

12. More on Classification and a Discussion on Discriminant Analysis 150

12.1 Fisher’s Linear Discriminant Function 153

12.2 Example 1: German Credit Data 154

12.3 Example 2: Fisher Iris Data 156

12.4 Example 3: Forensic Glass Data 157

12.5 Example 4: MBA Admission Data 159

Reference 160

13. Decision Trees 161

13.1 Example 1: Prostate Cancer 167

13.2 Example 2: Motorcycle Acceleration 179

13.3 Example 3: Fisher Iris Data Revisited 182

14. Further Discussion on Regression and Classification Trees, Computer Software, and Other Useful Classification Methods 185

14.1 R Packages for Tree Construction 185

14.2 Chi-Square Automatic Interaction Detection (CHAID) 186

14.3 Ensemble Methods: Bagging, Boosting, and Random Forests 188

14.4 Support Vector Machines (SVM) 192

14.5 Neural Networks 192

14.6 The R Package Rattle: A Useful Graphical User Interface for Data Mining 193

References 195

15. Clustering 196

15.1 k-Means Clustering 196

15.2 Another Way to Look at Clustering: Applying the Expectation-Maximization (EM) Algorithm to Mixtures of Normal Distributions 204

15.3 Hierarchical Clustering Procedures 212

References 219

16. Market Basket Analysis: Association Rules and Lift 220

16.1 Example 1: Online Radio 222

16.2 Example 2: Predicting Income 227

References 234

17. Dimension Reduction: Factor Models and Principal Components 235

17.1 Example 1: European Protein Consumption 238

17.2 Example 2: Monthly US Unemployment Rates 243

18. Reducing the Dimension in Regressions with Multicollinear Inputs: Principal Components Regression and Partial Least Squares 247

18.1 Three Examples 249

References 257

19. Text as Data: Text Mining and Sentiment Analysis 258

19.1 Inverse Multinomial Logistic Regression 259

19.2 Example 1: Restaurant Reviews 261

19.3 Example 2: Political Sentiment 266

Appendix 19.A Relationship Between the Gentzkow Shapiro Estimate of “Slant” and Partial Least Squares 268

References 271

20. Network Data 272

20.1 Example 1: Marriage and Power in Fifteenth Century Florence 274

20.2 Example 2: Connections in a Friendship Network 278

References 292

Appendix A: Exercises 293

Exercise 1 294

Exercise 2 294

Exercise 3 296

Exercise 4 298

Exercise 5 299

Exercise 6 300

Exercise 7 301

Appendix B: References 338

Index 341

Data Mining and Business Analytics with R

    A Hardback by Johannes Ledolter

      Publisher: John Wiley & Sons Inc
      Publication Date: 28/06/2013
      ISBN13: 978-1118447147
      ISBN10: 111844714X

