{"product_id":"machine-learning-with-spark-and-python-9781119561934","title":"Machine Learning with Spark and Python","description":"\u003cb\u003eBook Synopsis\u003c\/b\u003e\u003cbr\u003e\u003cbr\u003e\u003cbr\u003e\u003cb\u003eTable of Contents\u003c\/b\u003e\u003cbr\u003e\u003cp\u003eIntroduction xxi\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 1 The Two Essential Algorithms for Making Predictions 1\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eWhy are These Two Algorithms So Useful? 2\u003c\/p\u003e \u003cp\u003eWhat are Penalized Regression Methods? 7\u003c\/p\u003e \u003cp\u003eWhat are Ensemble Methods? 9\u003c\/p\u003e \u003cp\u003eHow to Decide Which Algorithm to Use 11\u003c\/p\u003e \u003cp\u003eThe Process Steps for Building a Predictive Model 13\u003c\/p\u003e \u003cp\u003eFraming a Machine Learning Problem 15\u003c\/p\u003e \u003cp\u003eFeature Extraction and Feature Engineering 17\u003c\/p\u003e \u003cp\u003eDetermining Performance of a Trained Model 18\u003c\/p\u003e \u003cp\u003eChapter Contents and Dependencies 18\u003c\/p\u003e \u003cp\u003eSummary 20\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 2 Understand the Problem by Understanding the Data 23\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eThe Anatomy of a New Problem 24\u003c\/p\u003e \u003cp\u003eDifferent Types of Attributes and Labels Drive Modeling Choices 26\u003c\/p\u003e \u003cp\u003eThings to Notice about Your New Data Set 27\u003c\/p\u003e \u003cp\u003eClassification Problems: Detecting Unexploded Mines Using Sonar 28\u003c\/p\u003e \u003cp\u003ePhysical Characteristics of the Rocks Versus Mines Data Set 29\u003c\/p\u003e \u003cp\u003eStatistical Summaries of the Rocks Versus Mines Data Set 32\u003c\/p\u003e \u003cp\u003eVisualization of Outliers Using a Quantile-Quantile Plot 34\u003c\/p\u003e \u003cp\u003eStatistical Characterization of Categorical Attributes 35\u003c\/p\u003e \u003cp\u003eHow to Use Python Pandas to Summarize the Rocks Versus Mines Data Set 36\u003c\/p\u003e 
\u003cp\u003eVisualizing Properties of the Rocks Versus Mines Data Set 39\u003c\/p\u003e \u003cp\u003eVisualizing with Parallel Coordinates Plots 39\u003c\/p\u003e \u003cp\u003eVisualizing Interrelationships between Attributes and Labels 41\u003c\/p\u003e \u003cp\u003eVisualizing Attribute and Label Correlations Using a Heat Map 48\u003c\/p\u003e \u003cp\u003eSummarizing the Process for Understanding the Rocks Versus Mines Data Set 50\u003c\/p\u003e \u003cp\u003eReal-Valued Predictions with Factor Variables: How Old is Your Abalone? 50\u003c\/p\u003e \u003cp\u003eParallel Coordinates for Regression Problems—Visualize Variable Relationships for the Abalone Problem 55\u003c\/p\u003e \u003cp\u003eHow to Use a Correlation Heat Map for Regression—Visualize Pair-Wise Correlations for the Abalone Problem 59\u003c\/p\u003e \u003cp\u003eReal-Valued Predictions Using Real-Valued Attributes: Calculate How Your Wine Tastes 61\u003c\/p\u003e \u003cp\u003eMulticlass Classification Problem: What Type of Glass is That? 67\u003c\/p\u003e \u003cp\u003eUsing PySpark to Understand Large Data Sets 72\u003c\/p\u003e \u003cp\u003eSummary 75\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 3 Predictive Model Building: Balancing Performance, Complexity, and Big Data 77\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eThe Basic Problem: Understanding Function Approximation 78\u003c\/p\u003e \u003cp\u003eWorking with Training Data 79\u003c\/p\u003e \u003cp\u003eAssessing Performance of Predictive Models 81\u003c\/p\u003e \u003cp\u003eFactors Driving Algorithm Choices and Performance—Complexity and Data 82\u003c\/p\u003e \u003cp\u003eContrast between a Simple Problem and a Complex Problem 82\u003c\/p\u003e \u003cp\u003eContrast between a Simple Model and a Complex Model 85\u003c\/p\u003e \u003cp\u003eFactors Driving Predictive Algorithm Performance 89\u003c\/p\u003e \u003cp\u003eChoosing an Algorithm: Linear or Nonlinear? 
90\u003c\/p\u003e \u003cp\u003eMeasuring the Performance of Predictive Models 91\u003c\/p\u003e \u003cp\u003ePerformance Measures for Different Types of Problems 91\u003c\/p\u003e \u003cp\u003eSimulating Performance of Deployed Models 105\u003c\/p\u003e \u003cp\u003eAchieving Harmony between Model and Data 107\u003c\/p\u003e \u003cp\u003eChoosing a Model to Balance Problem Complexity, Model Complexity, and Data Set Size 107\u003c\/p\u003e \u003cp\u003eUsing Forward Stepwise Regression to Control Overfitting 109\u003c\/p\u003e \u003cp\u003eEvaluating and Understanding Your Predictive Model 114\u003c\/p\u003e \u003cp\u003eControl Overfitting by Penalizing Regression Coefficients—Ridge Regression 116\u003c\/p\u003e \u003cp\u003eUsing PySpark for Training Penalized Regression Models on Extremely Large Data Sets 124\u003c\/p\u003e \u003cp\u003eSummary 127\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 4 Penalized Linear Regression 129\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eWhy Penalized Linear Regression Methods are So Useful 130\u003c\/p\u003e \u003cp\u003eExtremely Fast Coefficient Estimation 130\u003c\/p\u003e \u003cp\u003eVariable Importance Information 131\u003c\/p\u003e \u003cp\u003eExtremely Fast Evaluation When Deployed 131\u003c\/p\u003e \u003cp\u003eReliable Performance 131\u003c\/p\u003e \u003cp\u003eSparse Solutions 132\u003c\/p\u003e \u003cp\u003eProblem May Require Linear Model 132\u003c\/p\u003e \u003cp\u003eWhen to Use Ensemble Methods 132\u003c\/p\u003e \u003cp\u003ePenalized Linear Regression: Regulating Linear Regression for Optimum Performance 132\u003c\/p\u003e \u003cp\u003eTraining Linear Models: Minimizing Errors and More 135\u003c\/p\u003e \u003cp\u003eAdding a Coefficient Penalty to the OLS Formulation 136\u003c\/p\u003e \u003cp\u003eOther Useful Coefficient Penalties—Manhattan and ElasticNet 137\u003c\/p\u003e \u003cp\u003eWhy Lasso Penalty Leads to Sparse Coefficient Vectors 138\u003c\/p\u003e \u003cp\u003eElasticNet Penalty Includes 
Both Lasso and Ridge 140\u003c\/p\u003e \u003cp\u003eSolving the Penalized Linear Regression Problem 141\u003c\/p\u003e \u003cp\u003eUnderstanding Least Angle Regression and Its Relationship to Forward Stepwise Regression 141\u003c\/p\u003e \u003cp\u003eHow LARS Generates Hundreds of Models of Varying Complexity 145\u003c\/p\u003e \u003cp\u003eChoosing the Best Model from the Hundreds LARS Generates 147\u003c\/p\u003e \u003cp\u003eUsing Glmnet: Very Fast and Very General 152\u003c\/p\u003e \u003cp\u003eComparison of the Mechanics of Glmnet and LARS Algorithms 153\u003c\/p\u003e \u003cp\u003eInitializing and Iterating the Glmnet Algorithm 153\u003c\/p\u003e \u003cp\u003eExtension of Linear Regression to Classification Problems 157\u003c\/p\u003e \u003cp\u003eSolving Classification Problems with Penalized Regression 157\u003c\/p\u003e \u003cp\u003eWorking with Classification Problems Having More Than Two Outcomes 161\u003c\/p\u003e \u003cp\u003eUnderstanding Basis Expansion: Using Linear Methods on Nonlinear Problems 161\u003c\/p\u003e \u003cp\u003eIncorporating Non-Numeric Attributes into Linear Methods 163\u003c\/p\u003e \u003cp\u003eSummary 166\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 5 Building Predictive Models Using Penalized Linear Methods 169\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003ePython Packages for Penalized Linear Regression 170\u003c\/p\u003e \u003cp\u003eMultivariable Regression: Predicting Wine Taste 171\u003c\/p\u003e \u003cp\u003eBuilding and Testing a Model to Predict Wine Taste 172\u003c\/p\u003e \u003cp\u003eTraining on the Whole Data Set before Deployment 175\u003c\/p\u003e \u003cp\u003eBasis Expansion: Improving Performance by Creating New Variables from Old Ones 179\u003c\/p\u003e \u003cp\u003eBinary Classification: Using Penalized Linear Regression to Detect Unexploded Mines 182\u003c\/p\u003e \u003cp\u003eBuild a Rocks Versus Mines Classifier for Deployment 191\u003c\/p\u003e \u003cp\u003eMulticlass Classification: Classifying 
Crime Scene Glass Samples 200\u003c\/p\u003e \u003cp\u003eLinear Regression and Classification Using PySpark 203\u003c\/p\u003e \u003cp\u003eUsing PySpark to Predict Wine Taste 204\u003c\/p\u003e \u003cp\u003eLogistic Regression with PySpark: Rocks Versus Mines 208\u003c\/p\u003e \u003cp\u003eIncorporating Categorical Variables in a PySpark Model: Predicting Abalone Rings 213\u003c\/p\u003e \u003cp\u003eMulticlass Logistic Regression with Meta Parameter Optimization 217\u003c\/p\u003e \u003cp\u003eSummary 219\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 6 Ensemble Methods 221\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eBinary Decision Trees 222\u003c\/p\u003e \u003cp\u003eHow a Binary Decision Tree Generates Predictions 224\u003c\/p\u003e \u003cp\u003eHow to Train a Binary Decision Tree 225\u003c\/p\u003e \u003cp\u003eTree Training Equals Split Point Selection 227\u003c\/p\u003e \u003cp\u003eHow Split Point Selection Affects Predictions 228\u003c\/p\u003e \u003cp\u003eAlgorithm for Selecting Split Points 229\u003c\/p\u003e \u003cp\u003eMultivariable Tree Training—Which Attribute to Split? 229\u003c\/p\u003e \u003cp\u003eRecursive Splitting for More Tree Depth 230\u003c\/p\u003e \u003cp\u003eOverfitting Binary Trees 231\u003c\/p\u003e \u003cp\u003eMeasuring Overfit with Binary Trees 231\u003c\/p\u003e \u003cp\u003eBalancing Binary Tree Complexity for Best Performance 232\u003c\/p\u003e \u003cp\u003eModifications for Classification and Categorical Features 235\u003c\/p\u003e \u003cp\u003eBootstrap Aggregation: “Bagging” 235\u003c\/p\u003e \u003cp\u003eHow Does the Bagging Algorithm Work? 
236\u003c\/p\u003e \u003cp\u003eBagging Performance—Bias Versus Variance 239\u003c\/p\u003e \u003cp\u003eHow Bagging Behaves on Multivariable Problem 241\u003c\/p\u003e \u003cp\u003eBagging Needs Tree Depth for Performance 245\u003c\/p\u003e \u003cp\u003eSummary of Bagging 246\u003c\/p\u003e \u003cp\u003eGradient Boosting 246\u003c\/p\u003e \u003cp\u003eBasic Principle of Gradient Boosting Algorithm 246\u003c\/p\u003e \u003cp\u003eParameter Settings for Gradient Boosting 249\u003c\/p\u003e \u003cp\u003eHow Gradient Boosting Iterates toward a Predictive Model 249\u003c\/p\u003e \u003cp\u003eGetting the Best Performance from Gradient Boosting 250\u003c\/p\u003e \u003cp\u003eGradient Boosting on a Multivariable Problem 253\u003c\/p\u003e \u003cp\u003eSummary for Gradient Boosting 256\u003c\/p\u003e \u003cp\u003eRandom Forests 256\u003c\/p\u003e \u003cp\u003eRandom Forests: Bagging Plus Random Attribute Subsets 259\u003c\/p\u003e \u003cp\u003eRandom Forests Performance Drivers 260\u003c\/p\u003e \u003cp\u003eRandom Forests Summary 261\u003c\/p\u003e \u003cp\u003eSummary 262\u003c\/p\u003e \u003cp\u003e\u003cb\u003eChapter 7 Building Ensemble Models with Python 265\u003c\/b\u003e\u003c\/p\u003e \u003cp\u003eSolving Regression Problems with Python Ensemble Packages 265\u003c\/p\u003e \u003cp\u003eUsing Gradient Boosting to Predict Wine Taste 266\u003c\/p\u003e \u003cp\u003eUsing the Class Constructor for GradientBoostingRegressor 266\u003c\/p\u003e \u003cp\u003eUsing GradientBoostingRegressor to Implement a Regression Model 268\u003c\/p\u003e \u003cp\u003eAssessing the Performance of a Gradient Boosting Model 271\u003c\/p\u003e \u003cp\u003eBuilding a Random Forest Model to Predict Wine Taste 272\u003c\/p\u003e \u003cp\u003eConstructing a RandomForestRegressor Object 273\u003c\/p\u003e \u003cp\u003eModeling Wine Taste with RandomForestRegressor 275\u003c\/p\u003e \u003cp\u003eVisualizing the Performance of a Random Forest Regression Model 279\u003c\/p\u003e 
\u003cp\u003eIncorporating Non-Numeric Attributes in Python Ensemble Models 279\u003c\/p\u003e \u003cp\u003eCoding the Sex of Abalone for Gradient Boosting Regression in Python 280\u003c\/p\u003e \u003cp\u003eAssessing Performance and the Importance of Coded Variables with Gradient Boosting 282\u003c\/p\u003e \u003cp\u003eCoding the Sex of Abalone for Input to Random Forest Regression in Python 284\u003c\/p\u003e \u003cp\u003eAssessing Performance and the Importance of Coded Variables 287\u003c\/p\u003e \u003cp\u003eSolving Binary Classification Problems with Python Ensemble Methods 288\u003c\/p\u003e \u003cp\u003eDetecting Unexploded Mines with Python Gradient Boosting 288\u003c\/p\u003e \u003cp\u003eDetermining the Performance of a Gradient Boosting Classifier 291\u003c\/p\u003e \u003cp\u003eDetecting Unexploded Mines with Python Random Forest 292\u003c\/p\u003e \u003cp\u003eConstructing a Random Forest Model to Detect Unexploded Mines 294\u003c\/p\u003e \u003cp\u003eDetermining the Performance of a Random Forest Classifier 298\u003c\/p\u003e \u003cp\u003eSolving Multiclass Classification Problems with Python Ensemble Methods 300\u003c\/p\u003e \u003cp\u003eDealing with Class Imbalances 301\u003c\/p\u003e \u003cp\u003eClassifying Glass Using Gradient Boosting 301\u003c\/p\u003e \u003cp\u003eDetermining the Performance of the Gradient Boosting Model on Glass Classification 306\u003c\/p\u003e \u003cp\u003eClassifying Glass with Random Forests 307\u003c\/p\u003e \u003cp\u003eDetermining the Performance of the Random Forest Model on Glass Classification 310\u003c\/p\u003e \u003cp\u003eSolving Regression Problems with PySpark Ensemble Packages 311\u003c\/p\u003e \u003cp\u003ePredicting Wine Taste with PySpark Ensemble Methods 312\u003c\/p\u003e \u003cp\u003ePredicting Abalone Age with PySpark Ensemble Methods 317\u003c\/p\u003e \u003cp\u003eDistinguishing Mines from Rocks with PySpark Ensemble Methods 321\u003c\/p\u003e 
\u003cp\u003eIdentifying Glass Types with PySpark Ensemble Methods 325\u003c\/p\u003e \u003cp\u003eSummary 327\u003c\/p\u003e \u003cp\u003eIndex 329\u003c\/p\u003e","brand":"John Wiley \u0026 Sons Inc","offers":[{"title":"Default Title","offer_id":49407084527959,"sku":"9781119561934","price":30.39,"currency_code":"GBP","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0817\/1739\/5799\/files\/9781119561934.jpg?v=1730498124","url":"https:\/\/bookcurl.com\/products\/machine-learning-with-spark-and-python-9781119561934","provider":"Book Curl","version":"1.0","type":"link"}