Description

Book Synopsis

Introduces professionals and scientists to statistics and machine learning using the programming language R

Written by and for practitioners, this book provides an overall introduction to R, focusing on tools and methods commonly used in data science, and placing emphasis on practice and business use. It covers a wide range of topics in a single volume, including big data, databases, statistical machine learning, data wrangling, data visualization, and the reporting of results. The topics covered are all important for someone with a science/math background who is looking to quickly learn several practical technologies to enter or transition to the growing field of data science.

The Big R-Book for Professionals: From Data Science to Learning Machines and Reporting with R includes nine parts, starting with an introduction to the subject and followed by an overview of R and elements of statistics. The third part revolves around data, while the fourth focuse

Table of Contents

Foreword xxv

About the Author xxvii

Acknowledgements xxix

Preface xxxi

About the Companion Site xxxv

I Introduction 1

1 The Big Picture with Kondratiev and Kardashev 3

2 The Scientific Method and Data 7

3 Conventions 11

II Starting with R and Elements of Statistics 19

4 The Basics of R 21

4.1 Getting Started with R 23

4.2 Variables 26

4.3 Data Types 28

4.3.1 The Elementary Types 28

4.3.2 Vectors 29

4.3.3 Accessing Data from a Vector 29

4.3.4 Matrices 32

4.3.5 Arrays 38

4.3.6 Lists 41

4.3.7 Factors 45

4.3.8 Data Frames 49

4.3.9 Strings or the Character-type 54

4.4 Operators 57

4.4.1 Arithmetic Operators 57

4.4.2 Relational Operators 57

4.4.3 Logical Operators 58

4.4.4 Assignment Operators 59

4.4.5 Other Operators 61

4.5 Flow Control Statements 63

4.5.1 Choices 63

4.5.2 Loops 65

4.6 Functions 69

4.6.1 Built-in Functions 69

4.6.2 Help with Functions 69

4.6.3 User-defined Functions 70

4.6.4 Changing Functions 70

4.6.5 Creating Functions with Default Arguments 71

4.7 Packages 72

4.7.1 Discovering Packages in R 72

4.7.2 Managing Packages in R 73

4.8 Selected Data Interfaces 75

4.8.1 CSV Files 75

4.8.2 Excel Files 79

4.8.3 Databases 79

5 Lexical Scoping and Environments 81

5.1 Environments in R 81

5.2 Lexical Scoping in R 83

6 The Implementation of OO 87

6.1 Base Types 89

6.2 S3 Objects 91

6.2.1 Creating S3 Objects 94

6.2.2 Creating Generic Methods 96

6.2.3 Method Dispatch 97

6.2.4 Group Generic Functions 98

6.3 S4 Objects 100

6.3.1 Creating S4 Objects 100

6.3.2 Using S4 Objects 101

6.3.3 Validation of Input 105

6.3.4 Constructor functions 107

6.3.5 The Data slot 108

6.3.6 Recognising Objects, Generic Functions, and Methods 108

6.3.7 Creating S4 Generics 110

6.3.8 Method Dispatch 111

6.4 The Reference Class, refclass, RC or R5 Model 113

6.4.1 Creating RC Objects 113

6.4.2 Important Methods and Attributes 117

6.5 Conclusions about the OO Implementation 119

7 Tidy R with the Tidyverse 121

7.1 The Philosophy of the Tidyverse 121

7.2 Packages in the Tidyverse 124

7.2.1 The Core Tidyverse 124

7.2.2 The Non-core Tidyverse 125

7.3 Working with the Tidyverse 127

7.3.1 Tibbles 127

7.3.2 Piping with R 132

7.3.3 Attention Points When Using the Pipe 133

7.3.4 Advanced Piping 134

7.3.5 Conclusion 137

8 Elements of Descriptive Statistics 139

8.1 Measures of Central Tendency 139

8.1.1 Mean 139

8.1.2 The Median 142

8.1.3 The Mode 143

8.2 Measures of Variation or Spread 145

8.3 Measures of Covariation 147

8.3.1 The Pearson Correlation 147

8.3.2 The Spearman Correlation 148

8.3.3 Chi-square Tests 149

8.4 Distributions 150

8.4.1 Normal Distribution 150

8.4.2 Binomial Distribution 153

8.5 Creating an Overview of Data Characteristics 155

9 Visualisation Methods 159

9.1 Scatterplots 161

9.2 Line Graphs 163

9.3 Pie Charts 165

9.4 Bar Charts 167

9.5 Boxplots 171

9.6 Violin Plots 173

9.7 Histograms 176

9.8 Plotting Functions 179

9.9 Maps and Contour Plots 180

9.10 Heat-maps 181

9.11 Text Mining 184

9.11.1 Word Clouds 184

9.11.2 Word Associations 188

9.12 Colours in R 191

10 Time Series Analysis 197

10.1 Time Series in R 197

10.1.1 The Basics of Time Series in R 197

10.2 Forecasting 200

10.2.1 Moving Average 200

10.2.2 Seasonal Decomposition 206

11 Further Reading 211

III Data Import 213

12 A Short History of Modern Database Systems 215

13 RDBMS 219

14 SQL 223

14.1 Designing the Database 223

14.2 Building the Database Structure 226

14.2.1 Installing a RDBMS 226

14.2.2 Creating the Database 228

14.2.3 Creating the Tables and Relations 229

14.3 Adding Data to the Database 235

14.4 Querying the Database 239

14.4.1 The Basic Select Query 239

14.4.2 More Complex Queries 240

14.5 Modifying the Database Structure 244

14.6 Selected Features of SQL 249

14.6.1 Changing Data 249

14.6.2 Functions in SQL 249

15 Connecting R to an SQL Database 253

IV Data Wrangling 257

16 Anonymous Data 261

17 Data Wrangling in the tidyverse 265

17.1 Importing the Data 266

17.1.1 Importing from an SQL RDBMS 266

17.1.2 Importing Flat Files in the Tidyverse 267

17.2 Tidy Data 275

17.3 Tidying Up Data with tidyr 277

17.3.1 Splitting Tables 278

17.3.2 Convert Headers to Data 281

17.3.3 Spreading One Column Over Many 284

17.3.4 Split One Column into Many 285

17.3.5 Merge Multiple Columns Into One 286

17.3.6 Wrong Data 287

17.4 SQL-like Functionality via dplyr 288

17.4.1 Selecting Columns 288

17.4.2 Filtering Rows 289

17.4.3 Joining 290

17.4.4 Mutating Data 293

17.4.5 Set Operations 296

17.5 String Manipulation in the tidyverse 299

17.5.1 Basic String Manipulation 300

17.5.2 Pattern Matching with Regular Expressions 302

17.6 Dates with lubridate 314

17.6.1 ISO 8601 Format 315

17.6.2 Time-zones 317

17.6.3 Extract Date and Time Components 318

17.6.4 Calculating with Date-times 319

17.7 Factors with Forcats 325

18 Dealing with Missing Data 333

18.1 Reasons for Data to be Missing 334

18.2 Methods to Handle Missing Data 336

18.2.1 Alternative Solutions to Missing Data 336

18.2.2 Predictive Mean Matching (PMM) 338

18.3 R Packages to Deal with Missing Data 339

18.3.1 mice 339

18.3.2 missForest 340

18.3.3 Hmisc 341

19 Data Binning 343

19.1 What is Binning and Why Use It 343

19.2 Tuning the Binning Procedure 347

19.3 More Complex Cases: Matrix Binning 352

19.4 Weight of Evidence and Information Value 359

19.4.1 Weight of Evidence (WOE) 359

19.4.2 Information Value (IV) 359

19.4.3 WOE and IV in R 359

20 Factor Analysis and Principal Components 363

20.1 Principal Components Analysis (PCA) 364

20.2 Factor Analysis 368

V Modelling 373

21 Regression Models 375

21.1 Linear Regression 375

21.2 Multiple Linear Regression 379

21.2.1 Poisson Regression 379

21.2.2 Non-linear Regression 381

21.3 Performance of Regression Models 384

21.3.1 Mean Square Error (MSE) 384

21.3.2 R-Squared 384

21.3.3 Mean Average Deviation (MAD) 386

22 Classification Models 387

22.1 Logistic Regression 388

22.2 Performance of Binary Classification Models 390

22.2.1 The Confusion Matrix and Related Measures 391

22.2.2 ROC 393

22.2.3 The AUC 396

22.2.4 The Gini Coefficient 397

22.2.5 Kolmogorov-Smirnov (KS) for Logistic Regression 398

22.2.6 Finding an Optimal Cut-off 399

23 Learning Machines 405

23.1 Decision Tree 407

23.1.1 Essential Background 407

23.1.2 Important Considerations 412

23.1.3 Growing Trees with the Package rpart 414

23.1.4 Evaluating the Performance of a Decision Tree 424

23.2 Random Forest 428

23.3 Artificial Neural Networks (ANNs) 434

23.3.1 The Basics of ANNs in R 434

23.3.2 Neural Networks in R 436

23.3.3 The Work-flow for Fitting a NN 438

23.3.4 Cross Validate the NN 444

23.4 Support Vector Machine 447

23.4.1 Fitting a SVM in R 447

23.4.2 Optimizing the SVM 449

23.5 Unsupervised Learning and Clustering 450

23.5.1 k-Means Clustering 450

23.5.2 Visualizing Clusters in Three Dimensions 462

23.5.3 Fuzzy Clustering 464

23.5.4 Hierarchical Clustering 466

23.5.5 Other Clustering Methods 468

24 Towards a Tidy Modelling Cycle with modelr 469

24.1 Adding Predictions 470

24.2 Adding Residuals 471

24.3 Bootstrapping Data 472

24.4 Other Functions of modelr 474

25 Model Validation 475

25.1 Model Quality Measures 476

25.2 Predictions and Residuals 477

25.3 Bootstrapping 479

25.3.1 Bootstrapping in Base R 479

25.3.2 Bootstrapping in the tidyverse with modelr 481

25.4 Cross-Validation 483

25.4.1 Elementary Cross Validation 483

25.4.2 Monte Carlo Cross Validation 486

25.4.3 k-Fold Cross Validation 488

25.4.4 Comparing Cross Validation Methods 489

25.5 Validation in a Broader Perspective 492

26 Labs 495

26.1 Financial Analysis with quantmod 495

26.1.1 The Basics of quantmod 495

26.1.2 Types of Data Available in quantmod 496

26.1.3 Plotting with quantmod 497

26.1.4 The quantmod Data Structure 500

26.1.5 Support Functions Supplied by quantmod 502

26.1.6 Financial Modelling in quantmod 504

27 Multi Criteria Decision Analysis (MCDA) 511

27.1 What and Why 511

27.2 General Work-flow 513

27.3 Identify the Issue at Hand: Steps 1 and 2 516

27.4 Step 3: the Decision Matrix 518

27.4.1 Construct a Decision Matrix 518

27.4.2 Normalize the Decision Matrix 520

27.5 Step 4: Delete Inefficient and Unacceptable Alternatives 521

27.5.1 Unacceptable Alternatives 521

27.5.2 Dominance – Inefficient Alternatives 521

27.6 Plotting Preference Relationships 524

27.7 Step 5: MCDA Methods 526

27.7.1 Examples of Non-compensatory Methods 526

27.7.2 The Weighted Sum Method (WSM) 527

27.7.3 Weighted Product Method (WPM) 530

27.7.4 ELECTRE 530

27.7.5 PROMETHEE 540

27.7.6 PCA (GAIA) 553

27.7.7 Outranking Methods 557

27.7.8 Goal Programming 558

27.8 Summary MCDA 561

VI Introduction to Companies 563

28 Financial Accounting (FA) 567

28.1 The Statements of Accounts 568

28.1.1 Income Statement 568

28.1.2 Net Income: The P&L statement 568

28.1.3 Balance Sheet 569

28.2 The Value Chain 571

28.3 Further Terminology 573

28.4 Selected Financial Ratios 575

29 Management Accounting 583

29.1 Introduction 583

29.1.1 Definition of Management Accounting (MA) 583

29.1.2 Management Information Systems (MIS) 584

29.2 Selected Methods in MA 585

29.2.1 Cost Accounting 585

29.2.2 Selected Cost Types 587

29.3 Selected Use Cases of MA 590

29.3.1 Balanced Scorecard 590

29.3.2 Key Performance Indicators (KPIs) 591

30 Asset Valuation Basics 597

30.1 Time Value of Money 598

30.1.1 Interest Basics 598

30.1.2 Specific Interest Rate Concepts 598

30.1.3 Discounting 600

30.2 Cash 601

30.3 Bonds 602

30.3.1 Features of a Bond 602

30.3.2 Valuation of Bonds 604

30.3.3 Duration 606

30.4 The Capital Asset Pricing Model (CAPM) 610

30.4.1 The CAPM Framework 610

30.4.2 The CAPM and Risk 612

30.4.3 Limitations and Shortcomings of the CAPM 612

30.5 Equities 614

30.5.1 Definition 614

30.5.2 Short History 614

30.5.3 Valuation of Equities 615

30.5.4 Absolute Value Models 616

30.5.5 Relative Value Models 625

30.5.6 Selection of Valuation Methods 630

30.5.7 Pitfalls in Company Valuation 631

30.6 Forwards and Futures 638

30.7 Options 640

30.7.1 Definitions 640

30.7.2 Commercial Aspects 642

30.7.3 Short History 643

30.7.4 Valuation of Options at Maturity 644

30.7.5 The Black and Scholes Model 649

30.7.6 The Binomial Model 654

30.7.7 Dependencies of the Option Price 660

30.7.8 The Greeks 664

30.7.9 Delta Hedging 665

30.7.10 Linear Option Strategies 667

30.7.11 Integrated Option Strategies 674

30.7.12 Exotic Options 678

30.7.13 Capital Protected Structures 680

VII Reporting 683

31 A Grammar of Graphics with ggplot2 687

31.1 The Basics of ggplot2 688

31.2 Over-plotting 692

31.3 Case Study for ggplot2 696

32 R Markdown 699

33 knitr and LaTeX 703

34 An Automated Development Cycle 707

35 Writing and Communication Skills 709

36 Interactive Apps 713

36.1 Shiny 715

36.2 Browser Born Data Visualization 719

36.2.1 HTML-widgets 719

36.2.2 Interactive Maps with leaflet 720

36.2.3 Interactive Data Visualisation with ggvis 721

36.2.4 googleVis 723

36.3 Dashboards 725

36.3.1 The Business Case: a Diversity Dashboard 726

36.3.2 A Dashboard with flexdashboard 731

36.3.3 A Dashboard with shinydashboard 737

VIII Bigger and Faster R 741

37 Parallel Computing 743

37.1 Combine foreach and doParallel 745

37.2 Distribute Calculations over LAN with Snow 748

37.3 Using the GPU 752

37.3.1 Getting Started with gpuR 754

37.3.2 On the Importance of Memory use 757

37.3.3 Conclusions for GPU Programming 759

38 R and Big Data 761

38.1 Use a Powerful Server 763

38.1.1 Use R on a Server 763

38.1.2 Let the Database Server do the Heavy Lifting 763

38.2 Using more Memory than we have RAM 765

39 Parallelism for Big Data 767

39.1 Apache Hadoop 769

39.2 Apache Spark 771

39.2.1 Installing Spark 771

39.2.2 Running Spark 773

39.2.3 SparkR 776

39.2.4 sparklyr 788

39.2.5 SparkR or sparklyr 791

40 The Need for Speed 793

40.1 Benchmarking 794

40.2 Optimize Code 797

40.2.1 Avoid Repeating the Same 797

40.2.2 Use Vectorisation where Appropriate 797

40.2.3 Pre-allocating Memory 799

40.2.4 Use the Fastest Function 800

40.2.5 Use the Fastest Package 801

40.2.6 Be Mindful about Details 802

40.2.7 Compile Functions 804

40.2.8 Use C or C++ Code in R 806

40.2.9 Using a C++ Source File in R 809

40.2.10 Call Compiled C++ Functions in R 811

40.3 Profiling Code 812

40.3.1 The Package profr 813

40.3.2 The Package proftools 813

40.4 Optimize Your Computer 817

IX Appendices 819

A Create your own R Package 821

A.1 Creating the Package in the R Console 823

A.2 Update the Package Description 825

A.3 Documenting the Functions 826

A.4 Loading the Package 827

A.5 Further Steps 828

B Levels of Measurement 829

B.1 Nominal Scale 829

B.2 Ordinal Scale 830

B.3 Interval Scale 831

B.4 Ratio Scale 832

C Trademark Notices 833

C.1 General Trademark Notices 834

C.2 R-Related Notices 835

C.2.1 Crediting Developers of R Packages 835

C.2.2 The R-packages used in this Book 835

D Code Not Shown in the Body of the Book 839

E Answers to Selected Questions 845

Bibliography 859

Nomenclature 869

Index 881

The Big R-Book

A Hardback by Philippe J. S. De Brouwer


Publisher: John Wiley & Sons Inc
Publication Date: 03/12/2020
ISBN13: 978-1119632726
ISBN10: 1119632722
