Data mining Books

267 products


  • Data Mining and Business Analytics with R

    John Wiley & Sons Inc Data Mining and Business Analytics with R

    15 in stock

    Book Synopsis
    Collecting, analyzing, and extracting valuable information from a large amount of data requires easily accessible, robust, computational and analytical tools. Data Mining and Business Analytics with R utilizes the open source software R for the analysis, exploration, and simplification of large high-dimensional data sets.

    Trade Review
    "I first taught a Ph.D. level course in business applications of data mining 10 years ago. I regularly search the web, looking for business-oriented data mining books, and this is the first one I have found that is suitable for an MS in business analytics. I plan to use it. Anyone who teaches such a class and is inclined toward R should consider this text." (Journal of the American Statistical Association, 1 January 2014)

    Table of Contents
    Preface
    Acknowledgments
    1. Introduction
        Reference
    2. Processing the Information and Getting to Know Your Data
        2.1 Example 1: 2006 Birth Data; 2.2 Example 2: Alumni Donations; 2.3 Example 3: Orange Juice; References
    3. Standard Linear Regression
        3.1 Estimation in R; 3.2 Example 1: Fuel Efficiency of Automobiles; 3.3 Example 2: Toyota Used-Car Prices; Appendix 3.A The Effects of Model Overfitting on the Average Mean Square Error of the Regression Prediction; References
    4. Local Polynomial Regression: a Nonparametric Regression Approach
        4.1 Model Selection; 4.2 Application to Density Estimation and the Smoothing of Histograms; 4.3 Extension to the Multiple Regression Model; 4.4 Examples and Software; References
    5. Importance of Parsimony in Statistical Modeling
        5.1 How Do We Guard Against False Discovery; References
    6. Penalty-Based Variable Selection in Regression Models with Many Parameters (LASSO)
        6.1 Example 1: Prostate Cancer; 6.2 Example 2: Orange Juice; References
    7. Logistic Regression
        7.1 Building a Linear Model for Binary Response Data; 7.2 Interpretation of the Regression Coefficients in a Logistic Regression Model; 7.3 Statistical Inference; 7.4 Classification of New Cases; 7.5 Estimation in R; 7.6 Example 1: Death Penalty Data; 7.7 Example 2: Delayed Airplanes; 7.8 Example 3: Loan Acceptance; 7.9 Example 4: German Credit Data; References
    8. Binary Classification, Probabilities, and Evaluating Classification Performance
        8.1 Binary Classification; 8.2 Using Probabilities to Make Decisions; 8.3 Sensitivity and Specificity; 8.4 Example: German Credit Data
    9. Classification Using a Nearest Neighbor Analysis
        9.1 The k-Nearest Neighbor Algorithm; 9.2 Example 1: Forensic Glass; 9.3 Example 2: German Credit Data; Reference
    10. The Naïve Bayesian Analysis: a Model for Predicting a Categorical Response from Mostly Categorical Predictor Variables
        10.1 Example: Delayed Airplanes; Reference
    11. Multinomial Logistic Regression
        11.1 Computer Software; 11.2 Example 1: Forensic Glass; 11.3 Example 2: Forensic Glass Revisited; Appendix 11.A Specification of a Simple Triplet Matrix; References
    12. More on Classification and a Discussion on Discriminant Analysis
        12.1 Fisher’s Linear Discriminant Function; 12.2 Example 1: German Credit Data; 12.3 Example 2: Fisher Iris Data; 12.4 Example 3: Forensic Glass Data; 12.5 Example 4: MBA Admission Data; Reference
    13. Decision Trees
        13.1 Example 1: Prostate Cancer; 13.2 Example 2: Motorcycle Acceleration; 13.3 Example 3: Fisher Iris Data Revisited
    14. Further Discussion on Regression and Classification Trees, Computer Software, and Other Useful Classification Methods
        14.1 R Packages for Tree Construction; 14.2 Chi-Square Automatic Interaction Detection (CHAID); 14.3 Ensemble Methods: Bagging, Boosting, and Random Forests; 14.4 Support Vector Machines (SVM); 14.5 Neural Networks; 14.6 The R Package Rattle: A Useful Graphical User Interface for Data Mining; References
    15. Clustering
        15.1 k-Means Clustering; 15.2 Another Way to Look at Clustering: Applying the Expectation-Maximization (EM) Algorithm to Mixtures of Normal Distributions; 15.3 Hierarchical Clustering Procedures; References
    16. Market Basket Analysis: Association Rules and Lift
        16.1 Example 1: Online Radio; 16.2 Example 2: Predicting Income; References
    17. Dimension Reduction: Factor Models and Principal Components
        17.1 Example 1: European Protein Consumption; 17.2 Example 2: Monthly US Unemployment Rates
    18. Reducing the Dimension in Regressions with Multicollinear Inputs: Principal Components Regression and Partial Least Squares
        18.1 Three Examples; References
    19. Text as Data: Text Mining and Sentiment Analysis
        19.1 Inverse Multinomial Logistic Regression; 19.2 Example 1: Restaurant Reviews; 19.3 Example 2: Political Sentiment; Appendix 19.A Relationship Between the Gentzkow Shapiro Estimate of “Slant” and Partial Least Squares; References
    20. Network Data
        20.1 Example 1: Marriage and Power in Fifteenth Century Florence; 20.2 Example 2: Connections in a Friendship Network; References
    Appendix A: Exercises
        Exercise 1; Exercise 2; Exercise 3; Exercise 4; Exercise 5; Exercise 6; Exercise 7
    Appendix B: References
    Index

    £98.06

  • Statistical Data Analytics

    John Wiley & Sons Inc Statistical Data Analytics

    10 in stock

    Book Synopsis
    A comprehensive introduction to statistical methods for data mining and knowledge discovery.

    Table of Contents
    Preface
    Part I Background: Introductory Statistical Analytics
    1 Data analytics and data mining
        1.1 Knowledge discovery: finding structure in data; 1.2 Data quality versus data quantity; 1.3 Statistical modeling versus statistical description
    2 Basic probability and statistical distributions
        2.1 Concepts in probability
            2.1.1 Probability rules; 2.1.2 Random variables and probability functions; 2.1.3 Means, variances, and expected values; 2.1.4 Median, quartiles, and quantiles; 2.1.5 Bivariate expected values, covariance, and correlation
        2.2 Multiple random variables∗
        2.3 Univariate families of distributions
            2.3.1 Binomial distribution; 2.3.2 Poisson distribution; 2.3.3 Geometric distribution; 2.3.4 Negative binomial distribution; 2.3.5 Discrete uniform distribution; 2.3.6 Continuous uniform distribution; 2.3.7 Exponential distribution; 2.3.8 Gamma and chi-square distributions; 2.3.9 Normal (Gaussian) distribution; 2.3.10 Distributions derived from normal; 2.3.11 The exponential family
    3 Data manipulation
        3.1 Random sampling; 3.2 Data types
        3.3 Data summarization
            3.3.1 Means, medians, and central tendency; 3.3.2 Summarizing variation; 3.3.3 Summarizing (bivariate) correlation
        3.4 Data diagnostics and data transformation
            3.4.1 Outlier analysis; 3.4.2 Entropy∗; 3.4.3 Data transformation
        3.5 Simple smoothing techniques
            3.5.1 Binning; 3.5.2 Moving averages∗; 3.5.3 Exponential smoothing∗
    4 Data visualization and statistical graphics
        4.1 Univariate visualization
            4.1.1 Strip charts and dot plots; 4.1.2 Boxplots; 4.1.3 Stem-and-leaf plots; 4.1.4 Histograms and density estimators; 4.1.5 Quantile plots
        4.2 Bivariate and multivariate visualization
            4.2.1 Pie charts and bar charts; 4.2.2 Multiple boxplots and QQ plots; 4.2.3 Scatterplots and bubble plots; 4.2.4 Heatmaps; 4.2.5 Time series plots∗
    5 Statistical inference
        5.1 Parameters and likelihood
        5.2 Point estimation
            5.2.1 Bias; 5.2.2 The method of moments; 5.2.3 Least squares/weighted least squares; 5.2.4 Maximum likelihood∗
        5.3 Interval estimation
            5.3.1 Confidence intervals; 5.3.2 Single-sample intervals for normal (Gaussian) parameters; 5.3.3 Two-sample intervals for normal (Gaussian) parameters; 5.3.4 Wald intervals and likelihood intervals∗; 5.3.5 Delta method intervals∗; 5.3.6 Bootstrap intervals∗
        5.4 Testing hypotheses
            5.4.1 Single-sample tests for normal (Gaussian) parameters; 5.4.2 Two-sample tests for normal (Gaussian) parameters; 5.4.3 Wald tests, likelihood ratio tests, and ‘exact’ tests∗
        5.5 Multiple inferences∗
            5.5.1 Bonferroni multiplicity adjustment; 5.5.2 False discovery rate
    Part II Statistical Learning and Data Analytics
    6 Techniques for supervised learning: simple linear regression
        6.1 What is “supervised learning?”
        6.2 Simple linear regression
            6.2.1 The simple linear model; 6.2.2 Multiple inferences and simultaneous confidence bands
        6.3 Regression diagnostics; 6.4 Weighted least squares (WLS) regression
        6.5 Correlation analysis
            6.5.1 The correlation coefficient; 6.5.2 Rank correlation
    7 Techniques for supervised learning: multiple linear regression
        7.1 Multiple linear regression
            7.1.1 Matrix formulation; 7.1.2 Weighted least squares for the MLR model; 7.1.3 Inferences under the MLR model; 7.1.4 Multicollinearity
        7.2 Polynomial regression
        7.3 Feature selection
            7.3.1 R²_p plots; 7.3.2 Information criteria: AIC and BIC; 7.3.3 Automated variable selection
        7.4 Alternative regression methods∗
            7.4.1 Loess; 7.4.2 Regularization: ridge regression; 7.4.3 Regularization and variable selection: the Lasso
        7.5 Qualitative predictors: ANOVA models
    8 Supervised learning: generalized linear models
        8.1 Extending the linear regression model
            8.1.1 Nonnormal data and the exponential family; 8.1.2 Link functions
        8.2 Technical details for GLiMs∗
            8.2.1 Estimation; 8.2.2 The deviance function; 8.2.3 Residuals; 8.2.4 Inference and model assessment
        8.3 Selected forms of GLiMs
            8.3.1 Logistic regression and binary-data GLiMs; 8.3.2 Trend testing with proportion data; 8.3.3 Contingency tables and log-linear models; 8.3.4 Gamma regression models
    9 Supervised learning: classification
        9.1 Binary classification via logistic regression
            9.1.1 Logistic discriminants; 9.1.2 Discriminant rule accuracy; 9.1.3 ROC curves
        9.2 Linear discriminant analysis (LDA)
            9.2.1 Linear discriminant functions; 9.2.2 Bayes discriminant/classification rules; 9.2.3 Bayesian classification with normal data; 9.2.4 Naïve Bayes classifiers
        9.3 k-Nearest neighbor classifiers
        9.4 Tree-based methods
            9.4.1 Classification trees; 9.4.2 Pruning; 9.4.3 Boosting; 9.4.4 Regression trees
        9.5 Support vector machines∗
            9.5.1 Separable data; 9.5.2 Nonseparable data; 9.5.3 Kernel transformations
    10 Techniques for unsupervised learning: dimension reduction
        10.1 Unsupervised versus supervised learning
        10.2 Principal component analysis
            10.2.1 Principal components; 10.2.2 Implementing a PCA
        10.3 Exploratory factor analysis
            10.3.1 The factor analytic model; 10.3.2 Principal factor estimation; 10.3.3 Maximum likelihood estimation; 10.3.4 Selecting the number of factors; 10.3.5 Factor rotation; 10.3.6 Implementing an EFA
        10.4 Canonical correlation analysis∗
    11 Techniques for unsupervised learning: clustering and association
        11.1 Cluster analysis
            11.1.1 Hierarchical clustering; 11.1.2 Partitioned clustering
        11.2 Association rules/market basket analysis
            11.2.1 Association rules for binary observations; 11.2.2 Measures of rule quality; 11.2.3 The Apriori algorithm; 11.2.4 Statistical measures of association quality
    A Matrix manipulation
        A.1 Vectors and matrices; A.2 Matrix algebra; A.3 Matrix inversion; A.4 Quadratic forms; A.5 Eigenvalues and eigenvectors
        A.6 Matrix factorizations
            A.6.1 QR decomposition; A.6.2 Spectral decomposition; A.6.3 Matrix square root; A.6.4 Singular value decomposition
        A.7 Statistics via matrix operations
    B Brief introduction to R
        B.1 Data entry and manipulation; B.2 A turbo-charged calculator
        B.3 R functions
            B.3.1 Inbuilt R functions; B.3.2 Flow control; B.3.3 User-defined functions
        B.4 R packages
    References
    Index

    £117.81

  • Data Mining and Learning Analytics

    John Wiley & Sons Inc Data Mining and Learning Analytics

    15 in stock

    Book Synopsis
    Addresses the impacts of data mining on education and reviews applications in educational research, teaching, and learning.

    This book discusses the insights, challenges, issues, expectations, and practical implementation of data mining (DM) within educational mandates. The initial series of chapters offers a general overview of DM, Learning Analytics (LA), and data collection models in the context of educational research, while also defining and discussing data mining's four guiding principles: prediction, clustering, rule association, and outlier detection. The next series of chapters showcases the pedagogical applications of Educational Data Mining (EDM) and features case studies drawn from Business, Humanities, Health Sciences, Linguistics, and Physical Sciences education that highlight the successes and some of the limitations of data mining research applications in educational settings. The remaining chapters focus exclusively on EDM's emerging role in helping to a…

    Table of Contents
    Notes on Contributors
    Introduction: Education At Computational Crossroads (Samira ElAtia, Donald Ipperciel, and Osmar R. Zaïane)
    Part I At The Intersection of Two Fields: EDM
    Chapter 1 Educational Process Mining: A Tutorial and Case Study Using Moodle Data Sets (Cristóbal Romero, Rebeca Cerezo, Alejandro Bogarín, and Miguel Sanchez-Santillán)
        1.1 Background
        1.2 Data Description and Preparation
            1.2.1 Preprocessing Log Data; 1.2.2 Clustering Approach for Grouping Log Data
        1.3 Working with ProM
            1.3.1 Discovered Models; 1.3.2 Analysis of the Models’ Performance
        1.4 Conclusion; Acknowledgments; References
    Chapter 2 On Big Data And Text Mining in the Humanities (Geoffrey Rockwell and Bettina Berendt)
        2.1 Busa and the Digital Text
        2.2 Thesaurus Linguae Graecae and the Ibycus Computer as Infrastructure
            2.2.1 Complete Data Sets
        2.3 Cooking with Statistics; 2.4 Conclusions; References
    Chapter 3 Finding Predictors in Higher Education (David Eubanks, William Evers Jr., and Nancy Smith)
        3.1 Contrasting Traditional and Computational Methods; 3.2 Predictors and Data Exploration; 3.3 Data Mining Application: An Example; 3.4 Conclusions; References
    Chapter 4 Educational Data Mining: A MOOC Experience (Ryan S. Baker, Yuan Wang, Luc Paquette, Vincent Aleven, Octav Popescu, Jonathan Sewall, Carolyn Rosé, Gaurav Singh Tomar, Oliver Ferschke, Jing Zhang, Michael J. Cennamo, Stephanie Ogden, Therese Condit, José Diaz, Scott Crossley, Danielle S. McNamara, Denise K. Comer, Collin F. Lynch, Rebecca Brown, Tiffany Barnes, and Yoav Bergner)
        4.1 Big Data in Education: The Course
            4.1.1 Iteration 1: Coursera; 4.1.2 Iteration 2: edX
        4.2 Cognitive Tutor Authoring Tools; 4.3 Bazaar
        4.4 Walkthrough
            4.4.1 Course Content; 4.4.2 Research on BDEMOOC
        4.5 Conclusion; Acknowledgments; References
    Chapter 5 Data Mining and Action Research (Ellina Chernobilsky, Edith Ries, and Joanne Jasmine)
        5.1 Process; 5.2 Design Methodology
        5.3 Analysis and Interpretation of Data
            5.3.1 Quantitative Data Analysis and Interpretation; 5.3.2 Qualitative Data Analysis and Interpretation
        5.4 Challenges; 5.5 Ethics; 5.6 Role of Administration in the Data Collection Process; 5.7 Conclusion; References
    Part II Pedagogical Applications of EDM
    Chapter 6 Design of an Adaptive Learning System and Educational Data Mining (Zhiyong Liu and Nick Cercone)
        6.1 Dimensionalities of the User Model in ALS; 6.2 Collecting Data for ALS
        6.3 Data Mining in ALS
            6.3.1 Data Mining for User Modeling; 6.3.2 Data Mining for Knowledge Discovery
        6.4 ALS Model and Function Analyzing
            6.4.1 Introduction of Module Functions; 6.4.2 Analyzing the Workflow
        6.5 Future Works; 6.6 Conclusions; Acknowledgment; References
    Chapter 7 The “Geometry” of Naive Bayes: Teaching Probabilities by “Drawing” Them (Giorgio Maria Di Nunzio)
        7.1 Introduction
            7.1.1 Main Contribution; 7.1.2 Related Works
        7.2 The Geometry of NB Classification
            7.2.1 Mathematical Notation; 7.2.2 Bayesian Decision Theory
        7.3 Two-Dimensional Probabilities
            7.3.1 Working with Likelihoods and Priors Only; 7.3.2 De-normalizing Probabilities; 7.3.3 NB Approach; 7.3.4 Bernoulli Naïve Bayes
        7.4 A New Decision Line: Far from the Origin
            7.4.1 De-normalization Makes (Some) Problems Linearly Separable
        7.5 Likelihood Spaces, When Logarithms make a Difference (or a SUM)
            7.5.1 De-normalization Makes (Some) Problems Linearly Separable; 7.5.2 A New Decision in Likelihood Spaces; 7.5.3 A Real Case Scenario: Text Categorization
        7.6 Final Remarks; References
    Chapter 8 Examining the Learning Networks of a MOOC (Meaghan Brugha and Jean-Paul Restoule)
        8.1 Review of Literature; 8.2 Course Context; 8.3 Results and Discussion; 8.4 Recommendations for Future Research; 8.5 Conclusions; References
    Chapter 9 Exploring the Usefulness of Adaptive ELearning Laboratory Environments in Teaching Medical Science (Thuan Thai and Patsie Polly)
        9.1 Introduction
        9.2 Software for Learning and Teaching
            9.2.1 Reflective Practice: ePortfolio; 9.2.2 Online Quizzes; 9.2.3 Online Practical Lessons; 9.2.4 Virtual Laboratories; 9.2.5 The Gene Suite
        9.3 Potential Limitations; 9.4 Conclusion; Acknowledgments; References
    Chapter 10 Investigating Co-Occurrence Patterns of Learners’ Grammatical Errors across Proficiency Levels and Essay Topics Based on Association Analysis (Yutaka Ishii)
        10.1 Introduction
            10.1.1 The Relationship between Data Mining and Educational Research; 10.1.2 English Writing Instruction in the Japanese Context
        10.2 Literature Review
        10.3 Method
            10.3.1 Konan-JIEM Learner Corpus; 10.3.2 Association Analysis
        10.4 Experiment 1; 10.5 Experiment 2; 10.6 Discussion and Conclusion; Appendix A: Example of Learner’s Essay (University Life); Appendix B: Support Values of all Topics; Appendix C: Support Values of Advanced, Intermediate, and Beginner Levels of Learners; References
    Part III EDM and Educational Research
    Chapter 11 Mining Learning Sequences in MOOCs: Does Course Design Constrain Students’ Behaviors Or Do Students Shape Their Own Learning? (Lorenzo Vigentini, Simon McIntyre, Negin Mirriahi, and Dennis Alonzo)
        11.1 Introduction
            11.1.1 Perceptions and Challenges of MOOC Design; 11.1.2 What Do We Know About Participants’ Navigation: Choice and Control
        11.2 Data Mining in MOOCs: Related Work
            11.2.1 Setting the Hypotheses
        11.3 The Design and Intent of the LTTO MOOC
            11.3.1 Course Grading and Certification; 11.3.2 Delivering the Course; 11.3.3 Operationalize Engagement, Personal Success, and Course Success in LTTO
        11.4 Data Analysis
            11.4.1 Approaches to Process the Data Sources; 11.4.2 LTTO in Numbers; 11.4.3 Characterizing Patterns of Completion and Achievement; 11.4.4 Redefining Participation and Engagement
        11.5 Mining Behaviors and Intents
            11.5.1 Participants’ Intent and Behaviors: A Classification Model; 11.5.2 Natural Clustering Based on Behaviors; 11.5.3 Stated Intents and Behaviors: Are They Related?
        11.6 Closing the Loop: Informing Pedagogy and Course Enhancement
            11.6.1 Conclusions, Lessons Learnt, and Future Directions
        References
    Chapter 12 Understanding Communication Patterns in MOOCs: Combining Data Mining and Qualitative Methods (Rebecca Eynon, Isis Hjorth, Taha Yasseri, and Nabeel Gillani)
        12.1 Introduction; 12.2 Methodological Approaches to Understanding Communication Patterns in MOOCs
        12.3 Description
            12.3.1 Structural Connections
        12.4 Examining Dialogue; 12.5 Interpretative Models; 12.6 Understanding Experience; 12.7 Experimentation; 12.8 Future Research; References
    Chapter 13 An Example of Data Mining: Exploring The Relationship Between Applicant Attributes and Academic Measures of Success in a Pharmacy Program (Dion Brocks and Ken Cor)
        13.1 Introduction; 13.2 Methods; 13.3 Results
        13.4 Discussion
            13.4.1 Prerequisite Predictors; 13.4.2 Demographic Predictors
        13.5 Conclusion; Appendix A; References
    Chapter 14 A New Way of Seeing: Using a Data Mining Approach to Understand Children’s Views of Diversity and “Difference” in Picture Books (Robin A. Moeller and Hsin-liang Chen)
        14.1 Introduction
        14.2 Study 1: Using Data Mining to Better Understand Perceptions of Race
            14.2.1 Background; 14.2.2 Research Questions; 14.2.3 Methods; 14.2.4 Findings; 14.2.5 Discussion
        14.3 Study 2: Translating Data Mining Results to Picture Book Concepts of “Difference”
            14.3.1 Background; 14.3.2 Research Questions; 14.3.3 Methodology; 14.3.4 Findings; 14.3.5 Discussion and Implications
        14.4 Conclusions; References
    Chapter 15 Data Mining with Natural Language Processing and Corpus Linguistics: Unlocking Access to School Children’s Language in Diverse Contexts to Improve Instructional and Assessment Practices (Alison L. Bailey, Anne Blackstock-Bernstein, Eve Ryan, and Despina Pitsoulakis)
        15.1 Introduction; 15.2 Identifying the Problem
        15.3 Use of Corpora and Technology in Language Instruction and Assessment
            15.3.1 Language Corpora in ESL and EFL Teaching and Learning; 15.3.2 Previous Extensions of Corpus Linguistics to School-Age Language; 15.3.3 Corpus Linguistics in Language Assessment; 15.3.4 Big Data Purposes, Techniques, and Technology
        15.4 Creating a School-Age Learner Corpus and Digital Data Analytics System
            15.4.1 Language Measures Included in DRGON; 15.4.2 The DLLP as a Promising Practice
        15.5 Next Steps, “Modest Data,” and Closing Remarks; Acknowledgments; Appendix A: Examples of Oral and Written Explanation Elicitation Prompts; References
    Index

    £98.06

  • Effective CRM using Predictive Analytics

    John Wiley & Sons Inc Effective CRM using Predictive Analytics

    15 in stock

    Book Synopsis
    A step-by-step guide to data mining applications in CRM. Following a handbook approach, this book bridges the gap between analytics and their use in everyday marketing, providing guidance on solving real business problems using data mining techniques. The book is organized into three parts.

    Table of Contents
    Preface
    Acknowledgments
    1 An overview of data mining: The applications, the methodology, the algorithms, and the data
        1.1 The applications; 1.2 The methodology
        1.3 The algorithms
            1.3.1 Supervised models
                1.3.1.1 Classification models; 1.3.1.2 Estimation (regression) models; 1.3.1.3 Feature selection (field screening)
            1.3.2 Unsupervised models
                1.3.2.1 Cluster models; 1.3.2.2 Association (affinity) and sequence models; 1.3.2.3 Dimensionality reduction models; 1.3.2.4 Record screening models
        1.4 The data
            1.4.1 The mining datamart; 1.4.2 The required data per industry; 1.4.3 The customer “signature”: from the mining datamart to the enriched, marketing reference table
        1.5 Summary
    Part I The Methodology
    2 Classification modeling methodology
        2.1 An overview of the methodology for classification modeling
        2.2 Business understanding and design of the process
            2.2.1 Definition of the business objective; 2.2.2 Definition of the mining approach and of the data model
            2.2.3 Design of the modeling process
                2.2.3.1 Defining the modeling population; 2.2.3.2 Determining the modeling (analysis) level; 2.2.3.3 Definition of the target event and population; 2.2.3.4 Deciding on time frames
        2.3 Data understanding, preparation, and enrichment
            2.3.1 Investigation of data sources; 2.3.2 Selecting the data sources to be used; 2.3.3 Data integration and aggregation; 2.3.4 Data exploration, validation, and cleaning; 2.3.5 Data transformations and enrichment
            2.3.6 Applying a validation technique
                2.3.6.1 Split or Holdout validation; 2.3.6.2 Cross or n-fold validation; 2.3.6.3 Bootstrap validation
            2.3.7 Dealing with imbalanced and rare outcomes
                2.3.7.1 Balancing; 2.3.7.2 Applying class weights
        2.4 Classification modeling
            2.4.1 Trying different models and parameter settings
            2.4.2 Combining models
                2.4.2.1 Bagging; 2.4.2.2 Boosting; 2.4.2.3 Random Forests
        2.5 Model evaluation
            2.5.1 Thorough evaluation of the model accuracy
                2.5.1.1 Accuracy measures and confusion matrices; 2.5.1.2 Gains, Response, and Lift charts; 2.5.1.3 ROC curve; 2.5.1.4 Profit/ROI charts
            2.5.2 Evaluating a deployed model with test–control groups
        2.6 Model deployment
            2.6.1 Scoring customers to roll the marketing campaign
                2.6.1.1 Building propensity segments
            2.6.2 Designing a deployment procedure and disseminating the results
        2.7 Using classification models in direct marketing campaigns
        2.8 Acquisition modeling
                2.8.1.1 Pilot campaign; 2.8.1.2 Profiling of high-value customers
        2.9 Cross-selling modeling
                2.9.1.1 Pilot campaign; 2.9.1.2 Product uptake; 2.9.1.3 Profiling of owners
        2.10 Offer optimization with next best product campaigns
        2.11 Deep-selling modeling
                2.11.1.1 Pilot campaign; 2.11.1.2 Usage increase; 2.11.1.3 Profiling of customers with heavy product usage
        2.12 Up-selling modeling
                2.12.1.1 Pilot campaign; 2.12.1.2 Product upgrade; 2.12.1.3 Profiling of “premium” product owners
        2.13 Voluntary churn modeling
        2.14 Summary of what we’ve learned so far: it’s not about the tool or the modeling algorithm. It’s about the methodology and the design of the process
    3 Behavioral segmentation methodology
        3.1 An introduction to customer segmentation; 3.2 An overview of the behavioral segmentation methodology
        3.3 Business understanding and design of the segmentation process
            3.3.1 Definition of the business objective
            3.3.2 Design of the modeling process
                3.3.2.1 Selecting the segmentation population; 3.3.2.2 Selection of the appropriate segmentation criteria; 3.3.2.3 Determining the segmentation level; 3.3.2.4 Selecting the observation window
        3.4 Data understanding, preparation, and enrichment
            3.4.1 Investigation of data sources; 3.4.2 Selecting the data to be used; 3.4.3 Data integration and aggregation; 3.4.4 Data exploration, validation, and cleaning; 3.4.5 Data transformations and enrichment; 3.4.6 Input set reduction
        3.5 Identification of the segments with cluster modeling
        3.6 Evaluation and profiling of the revealed segments
            3.6.1 “Technical” evaluation of the clustering solution; 3.6.2 Profiling of the revealed segments; 3.6.3 Using marketing research information to evaluate the clusters and enrich their profiles; 3.6.4 Selecting the optimal cluster solution and labeling the segments
        3.7 Deployment of the segmentation solution, design and delivery of differentiated strategies
            3.7.1 Building the customer scoring model for updating the segments
                3.7.1.1 Building a Decision Tree for scoring: fine-tuning the segments
            3.7.2 Distribution of the segmentation information; 3.7.3 Design and delivery of differentiated strategies
        3.8 Summary
    Part II The Algorithms
    4 Classification algorithms
        4.1 Data mining algorithms for classification; 4.2 An overview of Decision Trees
        4.3 The main steps of Decision Tree algorithms
            4.3.1 Handling of predictors by Decision Tree models; 4.3.2 Using terminating criteria to prevent trivial tree growing; 4.3.3 Tree pruning
        4.4 CART, C5.0/C4.5, and CHAID and their attribute selection measures
            4.4.1 The Gini index used by CART; 4.4.2 The Information Gain Ratio index used by C5.0/C4.5; 4.4.3 The chi-square test used by CHAID
        4.5 Bayesian networks; 4.6 Naive Bayesian networks; 4.7 Bayesian belief networks
        4.8 Support vector machines
            4.8.1 Linearly separable data; 4.8.2 Linearly inseparable data
        4.9 Summary
    5 Segmentation algorithms
        5.1 Segmenting customers with data mining algorithms
        5.2 Principal components analysis
            5.2.1 How many components to extract?
                5.2.1.1 The eigenvalue (or latent root) criterion; 5.2.1.2 The percentage of variance criterion; 5.2.1.3 The scree test criterion; 5.2.1.4 The interpretability and business meaning of the components
            5.2.2 What is the meaning of each component?; 5.2.3 Moving along with the component scores
        5.3 Clustering algorithms
            5.3.1 Clustering with K-means; 5.3.2 Clustering with TwoStep
        5.4 Summary
    Part III The Case Studies
    6 A voluntary churn propensity model for credit card holders
        6.1 The business objective
        6.2 The mining approach
            6.2.1 Designing the churn propensity model process
                6.2.1.1 Selecting the data sources and the predictors; 6.2.1.2 Modeling population and level of data; 6.2.1.3 Target population and churn definition; 6.2.1.4 Time periods and historical information required
        6.3 The data dictionary
        6.4 The data preparation procedure
            6.4.1 From cards to customers: aggregating card-level data; 6.4.2 Enriching customer data; 6.4.3 Defining the modeling population and the target field
        6.5 Derived fields: the final data dictionary
        6.6 The modeling procedure
            6.6.1 Applying a Split (Holdout) validation: splitting the modelling dataset for evaluation purposes; 6.6.2 Balancing the distribution of the target field; 6.6.3 Setting the role of the fields in the model; 6.6.4 Training the churn model
        6.7 Understanding and evaluating the models; 6.8 Model deployment: using churn propensities to target the retention campaign
        6.9 The voluntary churn model revisited using RapidMiner
            6.9.1 Loading the data and setting the roles of the attributes; 6.9.2 Applying a Split (Holdout) validation and adjusting the imbalance of the target field’s distribution; 6.9.3 Developing a Naïve Bayes model for identifying potential churners; 6.9.4 Evaluating the performance of the model and deploying it to calculate churn propensities
        6.10 Developing the churn model with Data Mining for Excel
            6.10.1 Building the model using the Classify Wizard; 6.10.2 Selecting the classification algorithm and its parameters; 6.10.3 Applying a Split (Holdout) validation; 6.10.4 Browsing the Decision Tree model; 6.10.5 Validation of the model performance; 6.10.6 Model deployment
        6.11 Summary
    7 Value segmentation and cross-selling in retail
        7.1 The business background and objective; 7.2 An outline of the data preparation procedure; 7.3 The data dictionary
        7.4 The data preparation procedure
            7.4.1 Pivoting and aggregating transactional data at a customer level; 7.4.2 Enriching customer data and building the customer signature
        7.5 The data dictionary of the modeling file
        7.6 Value segmentation
            7.6.1 Grouping customers according to their value; 7.6.2 Value segments: exploration and marketing usage
        7.7 The recency, frequency, and monetary (RFM) analysis
            7.7.1 RFM basics
        7.8 The RFM cell segmentation procedure; 7.9 Setting up a cross-selling model
        7.10 The mining approach
            7.10.1 Designing the cross-selling model process
                7.10.1.1 The data and the predictors; 7.10.1.2 Modeling population and level of data; 7.10.1.3 Target population and definition of target attribute; 7.10.1.4 Time periods and historical information required
        7.11 The modeling procedure
            7.11.1 Preparing the test campaign and loading the campaign responses for modeling; 7.11.2 Applying a Split (Holdout) validation: splitting the modelling dataset for evaluation purposes; 7.11.3 Setting the roles of the attributes; 7.11.4 Training the cross-sell model
        7.12 Browsing the model results and assessing the predictive accuracy of the classifiers; 7.13 Deploying the model and preparing the cross-selling campaign list
        7.14 The retail case study using RapidMiner
            7.14.1 Value segmentation and RFM cells analysis; 7.14.2 Developing the cross-selling model; 7.14.3 Applying a Split (Holdout) validation; 7.14.4 Developing a Decision Tree model with Bagging; 7.14.5 Evaluating the performance of the model; 7.14.6 Deploying the model and scoring customers
        7.15 Building the cross-selling model with Data Mining for Excel
            7.15.1 Using the Classify Wizard to develop the model; 7.15.2 Selecting a classification algorithm and setting the parameters; 7.15.3 Applying a Split (Holdout) validation; 7.15.4 Browsing the Decision Tree model; 7.15.5 Validation of the model performance; 7.15.6 Model deployment
        7.16 Summary
    8 Segmentation application in telecommunications
        8.1 Mobile telephony: the business background and objective
        8.2 The segmentation procedure
            8.2.1 Selecting the segmentation population: the mobile telephony core segments; 8.2.2 Deciding the segmentation level; 8.2.3 Selecting the segmentation dimensions; 8.2.4 Time frames and historical information analyzed
        8.3 The data preparation procedure; 8.4 The data dictionary and the segmentation fields
        8.5 The modeling procedure
            8.5.1 Preparing data for clustering: combining fields into data components; 8.5.2 Identifying the segments with a cluster model; 8.5.3 Profiling and understanding the clusters; 8.5.4 Segmentation deployment
        8.6 Segmentation using RapidMiner and K-means cluster
            8.6.1 Clustering with the
K‐means algorithm 354 8.7 Summary 359 Bibliography 360 Index 362

    £41.36

  • Statistical Data Analytics

    John Wiley & Sons Inc Statistical Data Analytics

    15 in stock

    Book Synopsis: Solutions Manual to accompany Statistical Data Analytics: Foundations for Data Mining, Informatics, and Knowledge Discovery. A comprehensive introduction to statistical methods for data mining and knowledge discovery. Extensive solutions using actual data (with sample R programming code) are provided, illustrating diverse informatic sources in genomics, biomedicine, ecological remote sensing, astronomy, socioeconomics, marketing, advertising and finance, among many others. Table of Contents: Preface vii 1 Data analytics and data mining 1 2 Basic probability and statistical distributions 3 3 Data manipulation 14 4 Data visualization and statistical graphics 28 5 Statistical inference 45 6 Techniques for supervised learning: simple linear regression 65 7 Techniques for supervised learning: multiple linear regression 90 8 Supervised learning: generalized linear models 134 9 Supervised learning: classification 154 10 Techniques for unsupervised learning: dimension reduction 185 11 Techniques for unsupervised learning: clustering and association 200 References 216

    £16.10

  • Fraud Analytics Using Descriptive Predictive and

    John Wiley & Sons Inc Fraud Analytics Using Descriptive Predictive and

    15 in stock

    Book Synopsis: Detect fraud earlier to mitigate loss and prevent cascading damage. Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques is an authoritative guidebook for setting up a comprehensive fraud detection analytics solution. Table of Contents: List of Figures xv Foreword xxiii Preface xxv Acknowledgments xxix Chapter 1 Fraud: Detection, Prevention, and Analytics! 1 Introduction 2 Fraud! 2 Fraud Detection and Prevention 10 Big Data for Fraud Detection 15 Data-Driven Fraud Detection 17 Fraud-Detection Techniques 19 Fraud Cycle 22 The Fraud Analytics Process Model 26 Fraud Data Scientists 30 A Fraud Data Scientist Should Have Solid Quantitative Skills 30 A Fraud Data Scientist Should Be a Good Programmer 31 A Fraud Data Scientist Should Excel in Communication and Visualization Skills 31 A Fraud Data Scientist Should Have a Solid Business Understanding 32 A Fraud Data Scientist Should Be Creative 32 A Scientific Perspective on Fraud 33 References 35 Chapter 2 Data Collection, Sampling, and Preprocessing 37 Introduction 38 Types of Data Sources 38 Merging Data Sources 43 Sampling 45 Types of Data Elements 46 Visual Data Exploration and Exploratory Statistical Analysis 47 Benford’s Law 48 Descriptive Statistics 51 Missing Values 52 Outlier Detection and Treatment 53 Red Flags 57 Standardizing Data 59 Categorization 60 Weights of Evidence Coding 63 Variable Selection 65 Principal Components Analysis 68 RIDITs 72 PRIDIT Analysis 73 Segmentation 74 References 75 Chapter 3 Descriptive Analytics for Fraud Detection 77 Introduction 78 Graphical Outlier Detection Procedures 79 Statistical Outlier Detection Procedures 83 Break-Point Analysis 84 Peer-Group Analysis 85 Association Rule Analysis 87 Clustering 89 Introduction 89 Distance Metrics 90 Hierarchical Clustering 94 Example of Hierarchical Clustering Procedures 97 k-Means Clustering 104 Self-Organizing Maps 109 Clustering with Constraints 111 Evaluating and Interpreting Clustering Solutions 114
One-Class SVMs 117 References 118 Chapter 4 Predictive Analytics for Fraud Detection 121 Introduction 122 Target Definition 123 Linear Regression 125 Logistic Regression 127 Basic Concepts 127 Logistic Regression Properties 129 Building a Logistic Regression Scorecard 131 Variable Selection for Linear and Logistic Regression 133 Decision Trees 136 Basic Concepts 136 Splitting Decision 137 Stopping Decision 140 Decision Tree Properties 141 Regression Trees 142 Using Decision Trees in Fraud Analytics 143 Neural Networks 144 Basic Concepts 144 Weight Learning 147 Opening the Neural Network Black Box 150 Support Vector Machines 155 Linear Programming 155 The Linear Separable Case 156 The Linear Nonseparable Case 159 The Nonlinear SVM Classifier 160 SVMs for Regression 161 Opening the SVM Black Box 163 Ensemble Methods 164 Bagging 164 Boosting 165 Random Forests 166 Evaluating Ensemble Methods 167 Multiclass Classification Techniques 168 Multiclass Logistic Regression 168 Multiclass Decision Trees 170 Multiclass Neural Networks 170 Multiclass Support Vector Machines 171 Evaluating Predictive Models 172 Splitting Up the Data Set 172 Performance Measures for Classification Models 176 Performance Measures for Regression Models 185 Other Performance Measures for Predictive Analytical Models 188 Developing Predictive Models for Skewed Data Sets 189 Varying the Sample Window 190 Undersampling and Oversampling 190 Synthetic Minority Oversampling Technique (SMOTE) 192 Likelihood Approach 194 Adjusting Posterior Probabilities 197 Cost-sensitive Learning 198 Fraud Performance Benchmarks 200 References 201 Chapter 5 Social Network Analysis for Fraud Detection 207 Networks: Form, Components, Characteristics, and Their Applications 209 Social Networks 211 Network Components 214 Network Representation 219 Is Fraud a Social Phenomenon? 
An Introduction to Homophily 222 Impact of the Neighborhood: Metrics 227 Neighborhood Metrics 228 Centrality Metrics 238 Collective Inference Algorithms 246 Featurization: Summary Overview 254 Community Mining: Finding Groups of Fraudsters 254 Extending the Graph: Toward a Bipartite Representation 266 Multipartite Graphs 269 Case Study: Gotcha! 270 References 277 Chapter 6 Fraud Analytics: Post-Processing 279 Introduction 280 The Analytical Fraud Model Life Cycle 280 Model Representation 281 Traffic Light Indicator Approach 282 Decision Tables 283 Selecting the Sample to Investigate 286 Fraud Alert and Case Management 290 Visual Analytics 296 Backtesting Analytical Fraud Models 302 Introduction 302 Backtesting Data Stability 302 Backtesting Model Stability 305 Backtesting Model Calibration 308 Model Design and Documentation 311 References 312 Chapter 7 Fraud Analytics: A Broader Perspective 313 Introduction 314 Data Quality 314 Data-Quality Issues 314 Data-Quality Programs and Management 315 Privacy 317 The RACI Matrix 318 Accessing Internal Data 319 Label-Based Access Control (LBAC) 324 Accessing External Data 325 Capital Calculation for Fraud Loss 326 Expected and Unexpected Losses 327 Aggregate Loss Distribution 329 Capital Calculation for Fraud Loss Using Monte Carlo Simulation 331 An Economic Perspective on Fraud Analytics 334 Total Cost of Ownership 334 Return on Investment 335 In Versus Outsourcing 337 Modeling Extensions 338 Forecasting 338 Text Analytics 340 The Internet of Things 342 Corporate Fraud Governance 344 References 346 About the Authors 347 Index 349

    £29.25

  • Text Mining in Practice with R

    John Wiley & Sons Inc Text Mining in Practice with R

    15 in stock

    Book Synopsis: A reliable, cost-effective approach to extracting priceless business information from all sources of text. Excavating actionable business insights from data is a complex undertaking, and that complexity is magnified by an order of magnitude when the focus is on documents and other text information. Table of Contents: Foreword 1 Chapter 1: What is Text Mining? 1 1.1 What is it? 1 1.1.1 What is text mining in practice? 1 1.1.2 Where does text mining fit? 1 1.2 Why we care about text mining? 1 1.2.1 What are the consequences of ignoring text? 1 1.2.2 What are the benefits of text mining? 1 1.2.3 Setting Expectations: When text mining should (and should not) be used. 1 1.3 A basic workflow. How the process works. 1 1.4 What tools do I need to get started with this? 1 1.5 A Simple Example 1 1.6 A Real World Use Case 1 1.7 Summary 1 Chapter 2: Basics of text mining 1 2.1 What is Text Mining in a practical sense? 1 2.2 Types of Text Mining: Bag of Words. 1 2.2.1 Types of Text Mining: Syntactic Parsing. 1 2.3 The text mining process in context 1 2.4 String Manipulation: Number of Characters & Substitutions 1 2.4.1 String Manipulations: Paste, Character Splits & Extractions 1 2.5 Keyword Scanning 1 2.6 String Packages stringr & stringi 1 2.7 Preprocessing Steps for Bag of Words Text Mining 1 2.8 Spell Check 1 2.9 Frequent Terms & Associations 1 2.9 Delta Assist Wrap Up 1 2.10 Summary 1 Chapter 3: Common Text Mining Visualizations 1 3.1 A tale of two (or three) cultures 1 3.2 Simple Exploration: Term Frequency, Associations & Word Networks 1 3.2.1 Term Frequency 1 3.2.2 Word Associations 1 3.2.3 Word Networks 1 3.3 Simple Word Clusters: Hierarchical Dendrograms 1 3.4 Word Clouds: Overused but Effective 1 3.4.1 One Corpus Word Clouds 1 3.4.2 Comparing and Contrasting Corpora in Word Clouds 1 3.4.3 Polarized Tag Plot 1 3.5 Summary 1 Chapter 4: Sentiment Scoring 1 4.1 What is Sentiment Analysis? 1 4.2 Sentiment Scoring: Parlor Trick or Insightful?
1 4.3 Polarity: Simple Sentiment Scoring 1 4.3.1 Subjectivity Lexicons 1 4.3.2 Qdap’s Scoring for positive and negative word choice 1 4.3.3 Revisiting Word Clouds…Sentiment Word Clouds 1 4.4 Emoticons :) Dealing with these perplexing clues 1 4.4.1 Symbol-Based Emoticons Native to R 1 4.4.2 Punctuation Based Emoticons 1 4.4.3 Emoji 1 4.5 R’s Archived Sentiment Scoring Library 1 4.5 Sentiment the tidytext way 1 4.6 Airbnb.com Boston Wrap Up 1 4.7 Summary 1 Chapter 5: Hidden Structures: Clustering, String Distance, Text Vectors & Topic Modeling 1 5.1 What is clustering? 1 5.1.1 K Means Clustering 1 5.1.2 Spherical K Means Clustering 1 5.1.3 K Mediod Clustering 1 5.1.4 Evaluating the cluster approaches 1 5.2 Calculating & Exploring String Distance 1 5.2.1 What is string distance? 1 5.2.2 Fuzzy Matching-amatch, ain 1 5.2.3 Similarity Distances- stringdist, stringdistmatrix 1 5.3 LDA Topic Modeling Explained 1 5.3.2 Topic Modeling Case Study 1 5.3.2 LDA &LDAvis 1 5.4 Text to Vectors using “text2vec” 1 5.4.1 text2vec 1 5.5 Summary 1 Chapter 6: Document Classification: Finding Clickbait from Headlines 1 6.1 What is document classification? 1 6.2 Clickbait Case Study 1 6.2.2 Session & Data Set Up 1 6.2.3 GLMNET Training 1 6.2.4 GLMNET Test Predictions 1 6.2.5 Test Set Evaluation 1 6.2.6 Finding the most impactful words 1 6.2.7 Case study Wrap Up: Model Accuracy & Improving Performance Recommendations 1 6.3 Summary 1 Chapter 7: Predictive Modeling: Using text for classifying & predicting outcomes. 1 7.1 Classification Vs Prediction 1 7.2 Case Study I: Will this patient come back to the hospital? 
1 7.2.2 Patient Readmission in the Text Mining Workflow 1 7.2.3 Session & Data Set Up 1 7.2.4 Patient Modeling 1 7.2.5 More Model KPI: AUC, Recall, Precision & F1 1 7.2.5.1 Additional Evaluation Metrics 1 7.2.6 Apply the model to new patients 1 7.2.7 Patient Readmission Conclusion 1 7.3 Case Study II: Predicting Box Office Success 1 7.3.2 Opening Weekend Revenue in the Text Mining Workflow 1 7.3.3 Session & Data Set Up 1 7.3.4 Opening Weekend Modeling 1 7.3.5 Model Evaluation 1 7.3.6 Apply the Model to new Movie Reviews 1 7.3.7 Movie Revenue Conclusion 1 7.4 Summary 1 Chapter 8: The OpenNLP Project 1 8.1 What is the OpenNLP project? 1 8.2 R’s OpenNLP Package 1 8.3 Named Entities in Hillary Clinton’s Email 1 8.3.1 R Session Set-up 1 8.3.2 Minor Text Cleaning 1 8.3.3 Using OpenNLP on a single email 1 8.3.4 Using OpenNLP on multiple documents 1 8.3.5 Revisiting the Text Mining Workflow 1 8.4 Analyzing the Named Entities 1 8.4.1 Worldwide Map of Hillary Clinton’s Location Mentions 1 8.4.2 Mapping Only European Locations 1 8.4.3 Entities & Polarity: How does Hillary Clinton feel about an entity? 1 8.4.4 Stock Charts for Entities 1 8.4.5 Reach an Insight or Conclusion about Hillary Clinton’s Emails 1 8.5 Summary 1 Chapter 9: Text Sources 1 9.1 Sourcing Text 1 9.2 Web Sources 1 9.2.1 Web Scraping a Single Page with rvest 1 9.2.2 Web Scraping Multiple Pages with rvest 1 9.2.3 Application Program Interfaces (APIs) 1 9.2.4 Newspaper Articles from The Guardian Newspaper 1 9.2.5 Tweets using the “twitteR” Package 1 9.2.6 Calling an API without a dedicated R package 1 9.2.7 Using jsonlite to access the New York Times 1 9.2.8 Using RCurl & XML to Parse Google News Feeds 1 9.2.9 The tm library Web-Mining Plugin 1 9.3 Getting Text from File Sources 1 9.3.1 Individual CSV, TXT and Microsoft Office Files 1 9.3.2 Reading multiple files quickly 1 9.3.2 Extracting Text from PDFs 1 9.3.3 Optical Character Recognition: Extracting Text from Images 1 9.4 Summary 1

    £49.46

  • Optimization Techniques and Applications with

    John Wiley & Sons Inc Optimization Techniques and Applications with

    1 in stock

    Book Synopsis: A guide to modern optimization applications and techniques in newly emerging areas spanning optimization, data science, machine intelligence, engineering, and computer sciences. Optimization Techniques and Applications with Examples introduces the fundamentals of all the commonly used techniques in optimization that encompass the broadness and diversity of the methods (traditional and new) and algorithms. The author, a noted expert in the field, covers a wide range of topics including mathematical foundations, optimization formulation, optimality conditions, algorithmic complexity, linear programming, convex optimization, and integer programming. In addition, the book discusses artificial neural network, clustering and classifications, constraint-handling, queueing theory, support vector machine and multi-objective optimization, evolutionary computation, nature-inspired algorithms and many other topics. Designed as a practical resource, all topics are explained in detail with step-by-step… Table of Contents: List of Figures xiii List of Tables xvii Preface xix Acknowledgements xxi Acronyms xxiii Introduction xxv Part I Fundamentals 1 1 Mathematical Foundations 3 1.1 Functions and Continuity 3 1.1.1 Functions 3 1.1.2 Continuity 4 1.1.3 Upper and Lower Bounds 4 1.2 Review of Calculus 6 1.2.1 Differentiation 6 1.2.2 Taylor Expansions 9 1.2.3 Partial Derivatives 12 1.2.4 Lipschitz Continuity 13 1.2.5 Integration 14 1.3 Vectors 16 1.3.1 Vector Algebra 17 1.3.2 Norms 17 1.3.3 2D Norms 19 1.4 Matrix Algebra 19 1.4.1 Matrices 19 1.4.2 Determinant 23 1.4.3 Rank of a Matrix 24 1.4.4 Frobenius Norm 25 1.5 Eigenvalues and Eigenvectors 25 1.5.1 Definiteness 28 1.5.2 Quadratic Form 29 1.6 Optimization and Optimality 31 1.6.1 Minimum and Maximum 31 1.6.2 Feasible Solution 32 1.6.3 Gradient and Hessian Matrix 32 1.6.4 Optimality Conditions 34 1.7 General Formulation of Optimization Problems 35 Exercises 36 Further Reading 36 2 Algorithms, Complexity, and Convexity 37 2.1
What Is an Algorithm? 37 2.2 Order Notations 39 2.3 Convergence Rate 40 2.4 Computational Complexity 42 2.4.1 Time and Space Complexity 42 2.4.2 Class P 43 2.4.3 Class NP 44 2.4.4 NP-Completeness 44 2.4.5 Complexity of Algorithms 45 2.5 Convexity 46 2.5.1 Linear and Affine Functions 46 2.5.2 Convex Functions 48 2.5.3 Subgradients 50 2.6 Stochastic Nature in Algorithms 51 2.6.1 Algorithms with Randomization 51 2.6.2 Random Variables 51 2.6.3 Poisson Distribution and Gaussian Distribution 54 2.6.4 Monte Carlo 56 2.6.5 Common Probability Distributions 58 Exercises 61 Bibliography 62 Part II Optimization Techniques and Algorithms 63 3 Optimization 65 3.1 Unconstrained Optimization 65 3.1.1 Univariate Functions 65 3.1.2 Multivariate Functions 68 3.2 Gradient-Based Methods 70 3.2.1 Newton’s Method 71 3.2.2 Convergence Analysis 72 3.2.3 Steepest Descent Method 73 3.2.4 Line Search 77 3.2.5 Conjugate Gradient Method 78 3.2.6 Stochastic Gradient Descent 79 3.2.7 Subgradient Method 81 3.3 Gradient-Free Nelder–Mead Method 81 3.3.1 A Simplex 81 3.3.2 Nelder–Mead Downhill Simplex Method 82 Exercises 84 Bibliography 84 4 Constrained Optimization 87 4.1 Mathematical Formulation 87 4.2 Lagrange Multipliers 87 4.3 Slack Variables 91 4.4 Generalized Reduced Gradient Method 94 4.5 KKT Conditions 97 4.6 Penalty Method 99 Exercises 101 Bibliography 101 5 Optimization Techniques: Approximation Methods 103 5.1 BFGS Method 103 5.2 Trust-Region Method 105 5.3 Sequential Quadratic Programming 107 5.3.1 Quadratic Programming 107 5.3.2 SQP Procedure 107 5.4 Convex Optimization 109 5.5 Equality Constrained Optimization 113 5.6 Barrier Functions 115 5.7 Interior-Point Methods 119 5.8 Stochastic and Robust Optimization 121 Exercises 123 Bibliography 123 Part III Applied Optimization 125 6 Linear Programming 127 6.1 Introduction 127 6.2 Simplex Method 129 6.2.1 Slack Variables 129 6.2.2 Standard Formulation 130 6.2.3 Duality 131 6.2.4 Augmented Form 132 6.3 Worked Example by Simplex Method 133 6.4
Interior-Point Method for LP 136 Exercises 139 Bibliography 139 7 Integer Programming 141 7.1 Integer Linear Programming 141 7.1.1 Review of LP 141 7.1.2 Integer LP 142 7.2 LP Relaxation 143 7.3 Branch and Bound 146 7.3.1 How to Branch 153 7.4 Mixed Integer Programming 155 7.5 Applications of LP, IP, and MIP 156 7.5.1 Transport Problem 156 7.5.2 Product Portfolio 158 7.5.3 Scheduling 160 7.5.4 Knapsack Problem 161 7.5.5 Traveling Salesman Problem 161 Exercises 163 Bibliography 163 8 Regression and Regularization 165 8.1 Sample Mean and Variance 165 8.2 Regression Analysis 168 8.2.1 Maximum Likelihood 168 8.2.2 Regression 168 8.2.3 Linearization 173 8.2.4 Generalized Linear Regression 175 8.2.5 Goodness of Fit 178 8.3 Nonlinear Least Squares 179 8.3.1 Gauss–Newton Algorithm 180 8.3.2 Levenberg–Marquardt Algorithm 182 8.3.3 Weighted Least Squares 183 8.4 Over-fitting and Information Criteria 184 8.5 Regularization and Lasso Method 186 8.6 Logistic Regression 187 8.7 Principal Component Analysis 191 Exercises 195 Bibliography 196 9 Machine Learning Algorithms 199 9.1 Data Mining 199 9.1.1 Hierarchy Clustering 200 9.1.2 k-Means Clustering 201 9.1.3 Distance Metric 202 9.2 Data Mining for Big Data 202 9.2.1 Characteristics of Big Data 203 9.2.2 Statistical Nature of Big Data 203 9.2.3 Mining Big Data 204 9.3 Artificial Neural Networks 206 9.3.1 Neuron Model 207 9.3.2 Neural Networks 208 9.3.3 Back Propagation Algorithm 210 9.3.4 Loss Functions in ANN 212 9.3.5 Stochastic Gradient Descent 213 9.3.6 Restricted Boltzmann Machine 214 9.4 Support Vector Machines 216 9.4.1 Statistical Learning Theory 216 9.4.2 Linear Support Vector Machine 217 9.4.3 Kernel Functions and Nonlinear SVM 220 9.5 Deep Learning 221 9.5.1 Learning 221 9.5.2 Deep Neural Nets 222 9.5.3 Tuning of Hyper-Parameters 223 Exercises 223 Bibliography 224 10 Queueing Theory and Simulation 227 10.1 Introduction 227 10.1.1 Components of Queueing 227 10.1.2 Notations 228 10.2 Arrival Model 230 10.2.1 Poisson
Distribution 230 10.2.2 Inter-arrival Time 233 10.3 Service Model 233 10.3.1 Exponential Distribution 233 10.3.2 Service Time Model 235 10.3.3 Erlang Distribution 235 10.4 Basic Queueing Model 236 10.4.1 M/M/1 Queue 236 10.4.2 M/M/s Queue 240 10.5 Little’s Law 242 10.6 Queue Management and Optimization 243 Exercises 245 Bibliography 246 Part IV Advanced Topics 249 11 Multiobjective Optimization 251 11.1 Introduction 251 11.2 Pareto Front and Pareto Optimality 253 11.3 Choice and Challenges 255 11.4 Transformation to Single Objective Optimization 256 11.4.1 Weighted Sum Method 256 11.4.2 Utility Function 259 11.5 The 𝜖-Constraint Method 261 11.6 Evolutionary Approaches 264 11.6.1 Metaheuristics 264 11.6.2 Non-Dominated Sorting Genetic Algorithm 265 Exercises 266 Bibliography 266 12 Constraint-Handling Techniques 269 12.1 Introduction and Overview 269 12.2 Method of Lagrange Multipliers 270 12.3 Barrier Function Method 272 12.4 Penalty Method 272 12.5 Equality Constraints via Tolerance 273 12.6 Feasibility Criteria 274 12.7 Stochastic Ranking 275 12.8 Multiobjective Constraint-Handling and Ranking 276 Exercises 276 Bibliography 277 Part V Evolutionary Computation and Nature-Inspired Algorithms 279 13 Evolutionary Algorithms 281 13.1 Evolutionary Computation 281 13.3.1 Basic Procedure 284 13.3.2 Choice of Parameters 285 13.4 Simulated Annealing 287 13.5 Differential Evolution 290 Exercises 293 Bibliography 293 14 Nature-Inspired Algorithms 297 14.1 Introduction to SI 297 14.2 Ant and Bee Algorithms 298 14.3 Particle Swarm Optimization 299 14.3.1 Accelerated PSO 301 14.3.2 Binary PSO 302 14.4 Firefly Algorithm 303 14.5 Cuckoo Search 306 14.5.1 CS Algorithm 307 14.5.2 Lévy Flight 309 14.5.3 Advantages of CS 312 14.6 Bat Algorithm 313 14.7 Flower Pollination Algorithm 315 14.8 Other Algorithms 319 Exercises 319 Bibliography 319 Appendix A Notes on Software Packages 323 Appendix B Problem Solutions 329 Index 345

    £93.56

  • Data Science Strategy For Dummies

    John Wiley & Sons Inc Data Science Strategy For Dummies

    15 in stock

    Book Synopsis: All the answers to your data science questions. Over half of all businesses are using data science to generate insights and value from big data. How are they doing it? Data Science Strategy For Dummies answers all your questions about how to build a data science capability from scratch, starting with the what and the why of data science and covering what it takes to lead and nurture a top-notch team of data scientists. With this book, you'll learn how to incorporate data science as a strategic function into any business, large or small. Find solutions to your real-life challenges as you uncover the stories and value hidden within data. Learn exactly what data science is and why it's important. Adopt a data-driven mindset as the foundation to success. Understand the processes and common roadblocks behind data science. Keep your data science program focused on generating business value. Nurture a top-quality data science team. In non-technical language, Data Science Strategy For Dummies outl… Table of Contents: Foreword xv Introduction 1 About This Book 2 Foolish Assumptions 3 How This Book is Organized 3 Icons Used In This Book 4 Beyond The Book 4 Where To Go From Here 5 Part 1: Optimizing Your Data Science Investment 7 Chapter 1: Framing Data Science Strategy 9 Establishing the Data Science Narrative 10 Capture 11 Maintain 12 Process 13 Analyze 14 Communicate 16 Actuate 17 Sorting Out the Concept of a Data-driven Organization 19 Approaching data-driven 20 Being data obsessed 21 Sorting Out the Concept of Machine Learning 22 Defining and Scoping a Data Science Strategy 26 Objectives 26 Approach 27 Choices 27 Data 27 Legal 28 Ethics 28 Competence 28 Infrastructure 29 Governance and security 29 Commercial/business models 30 Measurements 30 Chapter 2: Considering the Inherent Complexity in Data Science 31 Diagnosing Complexity in Data Science 32 Recognizing Complexity as a Potential 33 Enrolling in Data Science Pitfalls 101 34 Believing that all data is needed 34
Thinking that investing in a data lake will solve all your problems 35 Focusing on AI when analytics is enough 36 Believing in the 1-tool approach 37 Investing only in certain areas 37 Leveraging the infrastructure for reporting rather than exploration 38 Underestimating the need for skilled data scientists 39 Navigating the Complexity 40 Chapter 3: Dealing with Difficult Challenges 41 Getting Data from There to Here 41 Handling dependencies on data owned by others 42 Managing data transfer and computation across-country borders 43 Managing Data Consistency Across the Data Science Environment 44 Securing Explainability in AI 45 Dealing with the Difference between Machine Learning and Traditional Software Programming 47 Managing the Rapid AI Technology Evolution and Lack of Standardization 50 Chapter 4: Managing Change in Data Science 51 Understanding Change Management in Data Science 52 Approaching Change in Data Science 53 Recognizing what to avoid when driving change in data science 56 Using Data Science Techniques to Drive Successful Change 59 Using digital engagement tools 59 Applying social media analytics to identify stakeholder sentiment 60 Capturing reference data in change projects 61 Using data to select people for change roles 61 Automating change metrics 62 Getting Started 62 Part 2: Making Strategic Choices for Your Data 65 Chapter 5: Understanding the Past, Present, and Future of Data 67 Sorting Out the Basics of Data 68 Explaining traditional data versus big data 69 Knowing the value of data 71 Exploring Current Trends in Data 73 Data monetization 73 Responsible AI 74 Cloud-based data architectures 75 Computation and intelligence in the edge 75 Digital twins 77 Blockchain 78 Conversational platforms 79 Elaborating on Some Future Scenarios 80 Standardization for data science productivity 80 From data monetization scenarios to a data economy 82 An explosion of human/machine hybrid systems 82 Quantum computing will solve the unsolvable problems 83
Chapter 6: Knowing Your Data 85 Selecting Your Data 85 Describing Data 87 Exploring Data 89 Assessing Data Quality 93 Improving Data Quality 95 Chapter 7: Considering the Ethical Aspects of Data Science 97 Explaining AI Ethics 98 Addressing trustworthy artificial intelligence 99 Introducing Ethics by Design 101 Chapter 8: Becoming Data-driven 103 Understanding Why Data-Driven is a Must 103 Transitioning to a Data-Driven Model 105 Securing management buy-in and assigning a chief data officer (CDO) 106 Identifying the key business value aligned with the business maturity 107 Developing a Data Strategy 108 Caring for your data 109 Democratizing the data 109 Driving data standardization 110 Structuring the data strategy 110 Establishing a Data-Driven Culture and Mindset 111 Chapter 9: Evolving from Data-driven to Machine-driven 113 Digitizing the Data 114 Applying a Data-driven Approach 115 Automating Workflows 116 Introducing AI/ML capabilities 116 Part 3: Building a Successful Data Science Organization 119 Chapter 10: Building Successful Data Science Teams 121 Starting with the Data Science Team Leader 121 Adopting different leadership approaches 122 Approaching data science leadership 124 Finding the right data science leader or manager 124 Defining the Prerequisites for a Successful Team 125 Developing a team structure 125 Establishing an infrastructure 126 Ensuring data availability 126 Insisting on interesting projects 127 Promoting continuous learning 127 Encouraging research studies 128 Building the Team 128 Developing smart hiring processes 129 Letting your teams evolve organically 130 Connecting the Team to the Business Purpose 131 Chapter 11: Approaching a Data Science Organizational Setup 133 Finding the Right Organizational Design 134 Designing the data science function 134 Evaluating the benefits of a center of excellence for data science 136 Identifying success factors for a data science center of excellence 137 Applying a Common Data Science Function 
138 Selecting a location 138 Approaching ways of working 139 Managing expectations 141 Selecting an execution approach 142 Chapter 12: Positioning the Role of the Chief Data Officer (CDO) 145 Scoping the Role of the Chief Data Officer (CDO) 146 Explaining Why a Chief Data Officer is Needed 149 Establishing the CDO Role 150 The Future of the CDO Role 152 Chapter 13: Acquiring Resources and Competencies 155 Identifying the Roles in a Data Science Team 156 Data scientist 157 Data engineer 157 Machine learning engineer 158 Data architect 159 Business analyst 159 Software engineer 159 Domain expert 160 Seeing What Makes a Great Data Scientist 160 Structuring a Data Science Team 163 Hiring and evaluating the data science talent you need 165 Retaining Competence in Data Science 167 Understanding what makes a data scientist leave 169 Part 4: Investing in the Right Infrastructure 173 Chapter 14: Developing a Data Architecture 175 Defining What Makes Up a Data Architecture 176 Describing traditional architectural approaches 176 Elements of a data architecture 177 Exploring the Characteristics of a Modern Data Architecture 178 Explaining Data Architecture Layers 181 Listing the Essential Technologies for a Modern Data Architecture 184 NoSQL databases 184 Real-time streaming platforms 185 Docker and containers 185 Container repositories 186 Container orchestration 187 Microservices 187 Function as a service 188 Creating a Modern Data Architecture 189 Chapter 15: Focusing Data Governance on the Right Aspects 193 Sorting Out Data Governance 194 Data governance for defense or offense 195 Objectives for data governance 196 Explaining Why Data Governance is Needed 197 Data governance saves money 197 Bad data governance is dangerous 198 Good data governance provides clarity 198 Establishing Data Stewardship to Enforce Data Governance Rules 198 Implementing a Structured Approach to Data Governance 199 Chapter 16: Managing Models During Development and Production 203 Unfolding the 
Fundamentals of Model Management 203 Working with many models 204 Making the case for efficient model management 206 Implementing Model Management 207 Pinpointing implementation challenges 208 Managing model risk 210 Measuring the risk level 211 Identifying suitable control mechanisms 211 Chapter 17: Exploring the Importance of Open Source 213 Exploring the Role of Open Source 213 Understanding the importance of open source in smaller companies 214 Understanding the trend 215 Describing the Context of Data Science Programming Languages 215 Unfolding Open Source Frameworks for AI/ML Models 218 TensorFlow 219 Theano 219 Torch 219 Caffe and Caffe2 220 The Microsoft Cognitive Toolkit (previously known as Microsoft CNTK) 220 Keras 220 Scikit-learn 221 Spark MLlib 221 Azure ML Studio 221 Amazon Machine Learning 221 Choosing Open Source or Not? 222 Chapter 18: Realizing the Infrastructure 223 Approaching Infrastructure Realization 223 Listing Key Infrastructure Considerations for AI and ML Support 226 Location 226 Capacity 227 Data center setup 227 End-to-end management 227 Network infrastructure 228 Security and ethics 228 Advisory and supporting services 229 Ecosystem fit 229 Automating Workflows in Your Data Infrastructure 229 Enabling an Efficient Workspace for Data Engineers and Data Scientists 230 Part 5: Data as a Business 233 Chapter 19: Investing in Data as a Business 235 Exploring How to Monetize Data 236 Approaching data monetization is about treating data as an asset 237 Data monetization in a data economy 238 Looking to the Future of the Data Economy 240 Chapter 20: Using Data for Insights or Commercial Opportunities 243 Focusing Your Data Science Investment 243 Determining the Drivers for Internal Business Insights 244 Recognizing data science categories for practical implementation 245 Applying data-science-driven internal business insights 247 Using Data for Commercial Opportunities 248 Defining a data product 249 Distinguishing between categories of data 
products 250 Balancing Strategic Objectives 252 Chapter 21: Engaging Differently with Your Customers 255 Understanding Your Customers 255 Step 1: Engage your customers 256 Step 2: Identify what drives your customers 257 Step 3: Apply analytics and machine learning to customer actions 258 Step 4: Predict and prepare for the next step 259 Step 5: Imagine your customer’s future 260 Keeping Your Customers Happy 261 Serving Customers More Efficiently 263 Predicting demand 263 Automating tasks 264 Making company applications predictive 264 Chapter 22: Introducing Data-driven Business Models 265 Defining Business Models 265 Exploring Data-driven Business Models 267 Creating data-centric businesses 268 Investigating different types of data-driven business models 268 Using a Framework for Data-driven Business Models 275 Creating a data-driven business model using a framework 276 Key resources 277 Key activities 277 Offering/value proposition 278 Customer segment 278 Revenue model 279 Cost structure 280 Putting it all together 280 Chapter 23: Handling New Delivery Models 281 Defining Delivery Models for Data Products and Services 282 Understanding and Adapting to New Delivery Models 282 Introducing New Ways to Deliver Data Products 284 Self-service analytics environments as a delivery model 285 Applications, websites, and product/service interfaces as delivery models 287 Existing products and services 289 Downloadable files 290 APIs 290 Cloud services 291 Online market places 291 Downloadable licenses 292 Online services 293 Onsite services 293 Part 6: The Part of Tens 295 Chapter 24: Ten Reasons to Develop a Data Science Strategy 297 Expanding Your View on Data Science 297 Aligning the Company View 298 Creating a Solid Base for Execution 299 Realizing Priorities Early 299 Putting the Objective into Perspective 300 Creating an Excellent Base for Communication 300 Understanding Why Choices Matter 301 Identifying the Risks Early 301 Thoroughly Considering Your Data Need 302 
Understanding the Change Impact 303 Chapter 25: Ten Mistakes to Avoid When Investing in Data Science 305 Don’t Tolerate Top Management’s Ignorance of Data Science 305 Don’t Believe That AI is Magic 306 Don’t Approach Data Science as a Race to the Death between Man and Machine 307 Don’t Underestimate the Potential of AI 308 Don’t Underestimate the Needed Data Science Skill Set 308 Don’t Think That a Dashboard is the End Objective 309 Don’t Forget about the Ethical Aspects of AI 310 Don’t Forget to Consider the Legal Rights to the Data 311 Don’t Ignore the Scale of Change Needed 312 Don’t Forget the Measurements Needed to Prove Value 313 Index 315

    £20.79

  • OntologyBased Information Retrieval for

    John Wiley & Sons Inc OntologyBased Information Retrieval for

    15 in stock

    Book Synopsis: With the advancement of the semantic web, ontology has become a crucial mechanism for representing concepts in various domains. For research and dispersal of customized healthcare services, a major challenge is to efficiently retrieve and analyze individual patient data from a large volume of heterogeneous data over a long time span. This requirement demands effective ontology-based information retrieval approaches for clinical information systems so that the pertinent information can be mined from a large amount of distributed data. This unique and groundbreaking book highlights the key advances in ontology-based information retrieval techniques being applied in the healthcare domain and covers the following areas: semantic data integration in e-health care systems; keyword-based medical information retrieval; ontology-based query retrieval support for e-health implementation; ontologies as a database management system technology for medical
    Table of Contents: Preface xix Acknowledgment xxiii 1 Role of Ontology in Health Care 1 Sonia Singla 1.1 Introduction 2 1.2 Ontology in Diabetes 3 1.2.1 Ontology Process 4 1.2.2 Impediments of the Present Investigation 5 1.3 Role of Ontology in Cardiovascular Diseases 6 1.4 Role of Ontology in Parkinson Diseases 8 1.4.1 The Spread of Disease With Age and Onset of Disease 10 1.4.2 Cost of PD for Health Care, Household 11 1.4.3 Treatment and Medicines 11 1.5 Role of Ontology in Depression 13 1.6 Conclusion 15 1.7 Future Scope 15 References 15 2 A Study on Basal Ganglia Circuit and Its Relation With Movement Disorders 19 Dinesh Bhatia 2.1 Introduction 19 2.2 Anatomy and Functioning of Basal Ganglia 21 2.2.1 The Striatum-Major Entrance to Basal Ganglia Circuitry 22 2.2.2 Direct and Indirect Striatofugal Projections 23 2.2.3 The STN: Another Entrance to Basal Ganglia Circuitry 25 2.3 Movement Disorders 26 2.3.1 Parkinson Disease 26 2.3.2 Dyskinetic Disorder 27 2.3.3 Dystonia 28 2.4 Effect of Basal Ganglia Dysfunctioning on 
Movement Disorders 29 2.5 Conclusion and Future Scope 31 References 31 3 Extraction of Significant Association Rules Using Pre- and Post-Mining Techniques—An Analysis 37M. Nandhini and S. N. Sivanandam 3.1 Introduction 38 3.2 Background 39 3.2.1 Interestingness Measures 39 3.2.2 Pre-Mining Techniques 40 3.2.2.1 Candidate Set Reduction Schemes 40 3.2.2.2 Optimal Threshold Computation Schemes 41 3.2.2.3 Weight-Based Mining Schemes 42 3.2.3 Post-Mining Techniques 42 3.2.3.1 Rule Pruning Schemes 43 3.2.3.2 Schemes Using Knowledge Base 43 3.3 Methodology 44 3.3.1 Data Preprocessing 44 3.3.2 Pre-Mining 46 3.3.2.1 Pre-Mining Technique 1: Optimal Support and Confidence Threshold Value Computation Using PSO 46 3.3.2.2 Pre-Mining Technique 2: Attribute Weight Computation Using IG Measure 48 3.3.3 Association Rule Generation 50 3.3.3.1 ARM Preliminaries 50 3.3.3.2 WARM Preliminaries 52 3.3.4 Post-Mining 56 3.3.4.1 Filters 56 3.3.4.2 Operators 58 3.3.4.3 Rule Schemas 58 3.4 Experiments and Results 59 3.4.1 Parameter Settings for PSO-Based Pre-Mining Technique 60 3.4.2 Parameter Settings for PAW-Based Pre-Mining Technique 60 3.5 Conclusions 63 References 65 4 Ontology in Medicine as a Database Management System 69Shobowale K. O. 
4.1 Introduction 70 4.1.1 Ontology Engineering and Development Methodology 72 4.2 Literature Review on Medical Data Processing 72 4.3 Information on Medical Ontology 75 4.3.1 Types of Medical Ontology 75 4.3.2 Knowledge Representation 76 4.3.3 Methodology of Developing Medical Ontology 76 4.3.4 Medical Ontology Standards 77 4.4 Ontologies as a Knowledge-Based System 78 4.4.1 Domain Ontology in Medicine 79 4.4.2 Brief Introduction of Some Medical Standards 81 4.4.2.1 Medical Subject Headings (MeSH) 81 4.4.2.2 Medical Dictionary for Regulatory Activities (MedDRA) 81 4.4.2.3 Medical Entities Dictionary (MED) 81 4.4.3 Reusing Medical Ontology 82 4.4.4 Ontology Evaluation 85 4.5 Conclusion 86 4.6 Future Scope 86 References 87 5 Using IoT and Semantic Web Technologies for Healthcare and Medical Sector 91Nikita Malik and Sanjay Kumar Malik 5.1 Introduction 92 5.1.1 Significance of Healthcare and Medical Sector and Its Digitization 92 5.1.2 e-Health and m-Health 92 5.1.3 Internet of Things and Its Use 94 5.1.4 Semantic Web and Its Technologies 96 5.2 Use of IoT in Healthcare and Medical Domain 98 5.2.1 Scope of IoT in Healthcare and Medical Sector 98 5.2.2 Benefits of IoT in Healthcare and Medical Systems 100 5.2.3 IoT Healthcare Challenges and Open Issues 100 5.3 Role of SWTs in Healthcare Services 101 5.3.1 Scope and Benefits of Incorporating Semantics in Healthcare 101 5.3.2 Ontologies and Datasets for Healthcare and Medical Domain 103 5.3.3 Challenges in the Use of SWTs in Healthcare Sector 104 5.4 Incorporating IoT and/or SWTs in Healthcare and Medical Sector 106 5.4.1 Proposed Architecture or Framework or Model 106 5.4.2 Access Mechanisms or Approaches 108 5.4.3 Applications or Systems 109 5.5 Healthcare Data Analytics Using Data Mining and Machine Learning 110 5.6 Conclusion 112 5.7 Future Work 113 References 113 6 An Ontological Model, Design, and Implementation of CSPF for Healthcare 117Pooja Mohan 6.1 Introduction 117 6.2 Related Work 119 6.3 Mathematical 
Representation of CSPF Model 122 6.3.1 Basic Sets of CSPF Model 123 6.3.2 Conditional Contextual Security and Privacy Constraints 123 6.3.3 CSPF Model States CsetofStates 124 6.3.4 Permission Cpermission 124 6.3.5 Security Evaluation Function (SEFcontexts) 124 6.3.6 Secure State 125 6.3.7 CSPF Model Operations 125 6.3.7.1 Administrative Operations 125 6.3.7.2 Users’ Operations 127 6.4 Ontological Model 127 6.4.1 Development of Class Hierarchy 127 6.4.1.1 Object Properties of Sensor Class 129 6.4.1.2 Data Properties 129 6.4.1.3 The Individuals 129 6.5 The Design of Context-Aware Security and Privacy Model for Wireless Sensor Network 129 6.6 Implementation 133 6.7 Analysis and Results 135 6.7.1 Inference Time/Latency/Query Response Time vs. No. of Policies 135 6.7.2 Average Inference Time vs. Contexts 136 6.8 Conclusion and Future Scope 137 References 138 7 Ontology-Based Query Retrieval Support for E-Health Implementation 143Aatif Ahmad Khan and Sanjay Kumar Malik 7.1 Introduction 143 7.1.1 Health Care Record Management 144 7.1.1.1 Electronic Health Record 144 7.1.1.2 Electronic Medical Record 145 7.1.1.3 Picture Archiving and Communication System 145 7.1.1.4 Pharmacy Systems 145 7.1.2 Information Retrieval 145 7.1.3 Ontology 146 7.2 Ontology-Based Query Retrieval Support 146 7.3 E-Health 150 7.3.1 Objectives and Scope 150 7.3.2 Benefits of E-Health 151 7.3.3 E-Health Implementation 151 7.4 Ontology-Driven Information Retrieval for E-Health 154 7.4.1 Ontology for E-Heath Implementation 155 7.4.2 Frameworks for Information Retrieval Using Ontology for E-Health 157 7.4.3 Applications of Ontology-Driven Information Retrieval in Health Care 158 7.4.4 Benefits and Limitations 160 7.5 Discussion 160 7.6 Conclusion 164 References 164 8 Ontology-Based Case Retrieval in an E-Mental Health Intelligent Information System 167Georgia Kaoura, Konstantinos Kovas and Basilis Boutsinas 8.1 Introduction 167 8.2 Literature Survey 170 8.3 Problem Identified 173 8.4 Proposed Solution 
174 8.4.1 The PAVEFS Ontology 174 8.4.2 Knowledge Base 179 8.4.3 Reasoning 180 8.4.4 User Interaction 182 8.5 Pros and Cons of Solution 183 8.5.1 Evaluation Methodology and Results 183 8.5.2 Evaluation Methodology 185 8.5.2.1 Evaluation Tools 186 8.5.2.2 Results 187 8.6 Conclusions 189 8.7 Future Scope 190 References 190 9 Ontology Engineering Applications in Medical Domain 193Mariam Gawich and Marco Alfonse 9.1 Introduction 193 9.2 Ontology Activities 195 9.2.1 Ontology Learning 195 9.2.2 Ontology Matching 195 9.2.3 Ontology Merging (Unification) 195 9.2.4 Ontology Validation 196 9.2.5 Ontology Verification 196 9.2.6 Ontology Alignment 196 9.2.7 Ontology Annotation 196 9.2.8 Ontology Evaluation 196 9.2.9 Ontology Evolution 196 9.3 Ontology Development Methodologies 197 9.3.1 TOVE 197 9.3.2 Methontology 198 9.3.3 Brusa et al. Methodology 198 9.3.4 UPON Methodology 199 9.3.5 Uschold and King Methodology 200 9.4 Ontology Languages 203 9.4.1 RDF-RDF Schema 203 9.4.2 OWL 205 9.4.3 OWL 2 205 9.5 Ontology Tools 208 9.5.1 Apollo 208 9.5.2 NeON 209 9.5.3 Protégé 210 9.6 Ontology Engineering Applications in Medical Domain 212 9.6.1 Ontology-Based Decision Support System (DSS) 213 9.6.1.1 OntoDiabetic 213 9.6.1.2 Ontology-Based CDSS for Diabetes Diagnosis 214 9.6.1.3 Ontology-Based Medical DSS within E-Care Telemonitoring Platform 215 9.6.2 Medical Ontology in the Dynamic Healthcare Environment 216 9.6.3 Knowledge Management Systems 217 9.6.3.1 Ontology-Based System for Cancer Diseases 217 9.6.3.2 Personalized Care System for Chronic Patients at Home 218 9.7 Ontology Engineering Applications in Other Domains 219 9.7.1 Ontology Engineering Applications in E-Commerce 219 9.7.1.1 Automated Approach to Product Taxonomy Mapping in E-Commerce 219 9.7.1.2 LexOnt Matching Approach 221 9.7.2 Ontology Engineering Applications in Social Media Domain 222 9.7.2.1 Emotive Ontology Approach 222 9.7.2.2 Ontology-Based Approach for Social Media Analysis 224 9.7.2.3 Methodological Framework 
for Semantic Comparison of Emotional Values 225 References 226 10 Ontologies on Biomedical Informatics 233Marco Alfonse and Mariam Gawich 10.1 Introduction 233 10.2 Defining Ontology 234 10.3 Biomedical Ontologies and Ontology-Based Systems 235 10.3.1 MetaMap 235 10.3.2 GALEN 236 10.3.3 NIH-CDE 236 10.3.4 LOINC 237 10.3.5 Current Procedural Terminology (CPT) 238 10.3.6 Medline Plus Connect 238 10.3.7 Gene Ontology 239 10.3.8 UMLS 240 10.3.9 SNOMED-CT 240 10.3.10 OBO Foundry 240 10.3.11 Textpresso 240 10.3.12 National Cancer Institute Thesaurus 241 References 241 11 Machine Learning Techniques Best for Large Data Prediction: A Case Study of Breast Cancer Categorical Data: k-Nearest Neighbors 245Yagyanath Rimal 11.1 Introduction 246 11.2 R Programming 250 11.3 Conclusion 255 References 255 12 Need of Ontology-Based Systems in Healthcare System 257Tshepiso Larona Mokgetse 12.1 Introduction 258 12.2 What is Ontology? 259 12.3 Need for Ontology in Healthcare Systems 260 12.3.1 Primary Healthcare 262 12.3.1.1 Semantic Web System 262 12.3.2 Emergency Services 263 12.3.2.1 Service-Oriented Architecture 263 12.3.2.2 IOT Ontology 264 12.3.3 Public Healthcare 265 12.3.3.1 IOT Data Model 265 12.3.4 Chronic Disease Healthcare 266 12.3.4.1 Clinical Reminder System 266 12.3.4.2 Chronic Care Model 267 12.3.5 Specialized Healthcare 268 12.3.5.1 E-Health Record System 268 12.3.5.2 Maternal and Child Health 269 12.3.6 Cardiovascular System 270 12.3.6.1 Distributed Healthcare System 270 12.3.6.2 Records Management System 270 12.3.7 Stroke Rehabilitation 271 12.3.7.1 Patient Information System 271 12.3.7.2 Toronto Virtual System 271 12.4 Conclusion 272 References 272 13 Exploration of Information Retrieval Approaches With Focus on Medical Information Retrieval 275Mamata Rath and Jyotir Moy Chatterjee 13.1 Introduction 276 13.1.1 Machine Learning-Based Medical Information System 278 13.1.2 Cognitive Information Retrieval 278 13.2 Review of Literature 279 13.3 Cognitive Methods of IR 281 
13.4 Cognitive and Interactive IR Systems 286 13.5 Conclusion 288 References 289 14 Ontology as a Tool to Enable Health Internet of Things Viable 5G Communication Networks 293Nidhi Sharma and R. K. Aggarwal 14.1 Introduction 293 14.2 From Concept Representations to Medical Ontologies 295 14.2.1 Current Medical Research Trends 296 14.2.2 Ontology as a Paradigm Shift in Health Informatics 296 14.3 Primer Literature Review 297 14.3.1 Remote Health Monitoring 298 14.3.2 Collecting and Understanding Medical Data 298 14.3.3 Patient Monitoring 298 14.3.4 Tele-Health 299 14.3.5 Advanced Human Services Records Frameworks 299 14.3.6 Applied Autonomy and Healthcare Mechanization 300 14.3.7 IoT Powers the Preventive Healthcare 301 14.3.8 Hospital Statistics Control System (HSCS) 301 14.3.9 End-to-End Accessibility and Moderateness 301 14.3.10 Information Mixing and Assessment 302 14.3.11 Following and Alerts 302 14.3.12 Remote Remedial Assistance 302 14.4 Establishments of Health IoT 303 14.4.1 Technological Challenges 304 14.4.2 Probable Solutions 306 14.4.3 Bit-by-Bit Action Statements 307 14.5 Incubation of IoT in Health Industry 307 14.5.1 Hearables 308 14.5.2 Ingestible Sensors 308 14.5.3 Moodables 308 14.5.4 PC Vision Innovation 308 14.5.5 Social Insurance Outlining 308 14.6 Concluding Remarks 309 References 309 15 Tools and Techniques for Streaming Data: An Overview 313K. Saranya, S. 
Chellammal and Pethuru Raj Chelliah 15.1 Introduction 314 15.2 Traditional Techniques 315 15.2.1 Random Sampling 315 15.2.2 Histograms 316 15.2.3 Sliding Window 316 15.2.4 Sketches 317 15.2.4.1 Bloom Filters 317 15.2.4.2 Count-Min Sketch 317 15.3 Data Mining Techniques 317 15.3.1 Clustering 318 15.3.1.1 STREAM 318 15.3.1.2 BIRCH 318 15.3.1.3 CLUSTREAM 319 15.3.2 Classification 319 15.3.2.1 Naïve Bayesian 319 15.3.2.2 Hoeffding 320 15.3.2.3 Very Fast Decision Tree 320 15.3.2.4 Concept Adaptive Very Fast Decision Tree 320 15.4 Big Data Platforms 320 15.4.1 Apache Storm 321 15.4.2 Apache Spark 321 15.4.2.1 Apache Spark Core 321 15.4.2.2 Spark SQL 322 15.4.2.3 Machine Learning Library 322 15.4.2.4 Streaming Data API 322 15.4.2.5 GraphX 323 15.4.3 Apache Flume 323 15.4.4 Apache Kafka 323 15.4.5 Apache Flink 326 15.5 Conclusion 327 References 328 16 An Ontology-Based IR for Health Care 331 J. P. Patra, Gurudatta Verma and Sumitra Samal 16.1 Introduction 331 16.2 General Definition of Information Retrieval Model 333 16.3 Information Retrieval Model Based on Ontology 334 16.4 Literature Survey 336 16.5 Methodology for IR 339 References 344

    £164.66

  • Computation in BioInformatics

    John Wiley & Sons Inc Computation in BioInformatics

    15 in stock

    Book Synopsis: COMPUTATION IN BIOINFORMATICS Bioinformatics is a platform between biology and information technology, and this book provides readers with an understanding of the use of bioinformatics tools in new drug design. The discovery of new solutions to pandemics is facilitated through the use of promising bioinformatics techniques and integrated approaches. This book covers a broad spectrum of the bioinformatics field, starting with the basic principles, concepts, and application areas. Also covered is the role of bioinformatics in drug design and discovery, including aspects of molecular modeling. Some of the chapters provide detailed information on bioinformatics-related topics, such as silicon design, protein modeling, DNA microarray analysis, DNA-RNA barcoding, and gene sequencing, all of which are currently needed in the industry. Also included are specialized topics, such as bioinformatics in cancer detection, genomics, and proteomics. Moreover, a few chapters expl
    Table of Contents: Preface xiii 1 Bioinformatics as a Tool in Drug Designing 1 Rene Barbie Browne, Shiny C. 
Thomas and Jayanti Datta Roy 1.1 Introduction 1 1.2 Steps Involved in Drug Designing 3 1.2.1 Identification of the Target Protein/Enzyme 5 1.2.2 Detection of Molecular Site (Active Site) in the Target Protein 6 1.2.3 Molecular Modeling 6 1.2.4 Virtual Screening 9 1.2.5 Molecular Docking 10 1.2.6 QSAR (Quantitative Structure-Activity Relationship) 12 1.2.7 Pharmacophore Modeling 14 1.2.8 Solubility of Molecule 14 1.2.9 Molecular Dynamic Simulation 14 1.2.10 ADME Prediction 15 1.3 Various Softwares Used in the Steps of Drug Designing 16 1.4 Applications 18 1.5 Conclusion 20 References 20 2 New Strategies in Drug Discovery 25Vivek Chavda, Yogita Thalkari and Swati Marwadi 2.1 Introduction 26 2.2 Road Toward Advancement 27 2.3 Methodology 30 2.3.1 Target Identification 30 2.3.2 Docking-Based Virtual Screening 32 2.3.3 Conformation Sampling 33 2.3.4 Scoring Function 34 2.3.5 Molecular Similarity Methods 35 2.3.6 Virtual Library Construction 37 2.3.7 Sequence-Based Drug Design 37 2.4 Role of OMICS Technology 38 2.5 High-Throughput Screening and Its Tools 40 2.6 Chemoinformatic 44 2.6.1 Exploratory Data Analysis 45 2.6.2 Example Discovery 46 2.6.3 Pattern Explanation 46 2.6.4 New Technologies 46 2.7 Concluding Remarks and Future Prospects 46 References 48 3 Role of Bioinformatics in Early Drug Discovery: An Overview and Perspective 49Shasank S. Swain and Tahziba Hussain 3.1 Introduction 50 3.2 Bioinformatics and Drug Discovery 51 3.2.1 Structure-Based Drug Design (SBDD) 52 3.2.2 Ligand-Based Drug Design (LBDD) 53 3.3 Bioinformatics Tools in Early Drug Discovery 54 3.3.1 Possible Biological Activity Prediction Tools 55 3.3.2 Possible Physicochemical and Drug-Likeness Properties Verification Tools 58 3.3.3 Possible Toxicity and ADME/T Profile Prediction Tools 60 3.4 Future Directions With Bioinformatics Tool 61 3.5 Conclusion 63 Acknowledgements 64 References 64 4 Role of Data Mining in Bioinformatics 69Vivek P. 
Chavda, Amit Sorathiya, Disha Valu and Swati Marwadi 4.1 Introduction 70 4.2 Data Mining Methods/Techniques 71 4.2.1 Classification 71 4.2.1.1 Statistical Techniques 71 4.2.1.2 Clustering Technique 73 4.2.1.3 Visualization 74 4.2.1.4 Induction Decision Tree Technique 74 4.2.1.5 Neural Network 75 4.2.1.6 Association Rule Technique 75 4.2.1.7 Classification 75 4.3 DNA Data Analysis 77 4.4 RNA Data Analysis 79 4.5 Protein Data Analysis 79 4.6 Biomedical Data Analysis 80 4.7 Conclusion and Future Prospects 81 References 81 5 In Silico Protein Design and Virtual Screening 85Vivek P. Chavda, Zeel Patel, Yashti Parmar and Disha Chavda 5.1 Introduction 86 5.2 Virtual Screening Process 88 5.2.1 Before Virtual Screening 90 5.2.2 General Process of Virtual Screening 90 5.2.2.1 Step 1 (The Establishment of the Receptor Model) 91 5.2.2.2 Step 2 (The Generation of Small-Molecule Libraries) 92 5.2.2.3 Step 3 (Molecular Docking) 92 5.2.2.4 Step 4 (Selection of Lead Protein Compounds) 94 5.3 Machine Learning and Scoring Functions 94 5.4 Conclusion and Future Prospects 95 References 96 6 New Bioinformatics Platform-Based Approach for Drug Design 101Vivek Chavda, Soham Sheta, Divyesh Changani and Disha Chavda 6.1 Introduction 102 6.2 Platform-Based Approach and Regulatory Perspective 104 6.3 Bioinformatics Tools and Computer-Aided Drug Design 107 6.4 Target Identification 109 6.5 Target Validation 110 6.6 Lead Identification and Optimization 111 6.7 High-Throughput Methods (HTM) 112 6.8 Conclusion and Future Prospects 114 References 115 7 Bioinformatics and Its Application Areas 121Ragini Bhardwaj, Mohit Sharma and Nikhil Agrawal 7.1 Introduction 121 7.2 Review of Bioinformatics 124 7.3 Bioinformatics Applications in Different Areas 126 7.3.1 Microbial Genome Application 126 7.3.2 Molecular Medicine 129 7.3.3 Agriculture 130 7.4 Conclusion 131 References 131 8 DNA Microarray Analysis: From Affymetrix CEL Files to Comparative Gene Expression 139Sandeep Kumar, Shruti Shandilya, Suman 
Kapila, Mohit Sharma and Nikhil Agrawal 8.1 Introduction 140 8.2 Data Processing 140 8.2.1 Installation of Workflow 140 8.2.2 Importing the Raw Data for Processing 141 8.2.3 Retrieving Sample Annotation of the Data 142 8.2.4 Quality Control 143 8.2.4.1 Boxplot 144 8.2.4.2 Density Histogram 145 8.2.4.3 MA Plot 145 8.2.4.4 NUSE Plot 145 8.2.4.5 RLE Plot 145 8.2.4.6 RNA Degradation Plot 145 8.2.4.7 QCstat 148 8.3 Normalization of Microarray Data Using the RMA Method 148 8.3.1 Background Correction 148 8.3.2 Normalization 149 8.3.3 Summarization 149 8.4 Statistical Analysis for Differential Gene Expression 151 8.5 Conclusion 153 References 153 9 Machine Learning in Bioinformatics 155Rahul Yadav, Mohit Sharma and Nikhil Agrawal 9.1 Introduction and Background 156 9.1.1 Bioinformatics 158 9.1.2 Text Mining 159 9.1.3 IoT Devices 159 9.2 Machine Learning Applications in Bioinformatics 159 9.3 Machine Learning Approaches 161 9.4 Conclusion and Closing Remarks 162 References 162 10 DNA-RNA Barcoding and Gene Sequencing 165Gifty Sawhney, Mohit Sharma and Nikhil Agrawal 10.1 Introduction 166 10.2 RNA 169 10.3 DNA Barcoding 172 10.3.1 Introduction 172 10.3.2 DNA Barcoding and Molecular Phylogeny 177 10.3.3 Ribosomal DNA (rDNA) of the Nuclear Genome (nuDNA)—ITS 178 10.3.4 Chloroplast DNA 180 10.3.5 Mitochondrial DNA 181 10.3.6 Molecular Phylogenetic Analysis 181 10.3.7 Metabarcoding 189 10.3.8 Materials for DNA Barcoding 190 10.4 Main Reasons of DNA Barcoding 191 10.5 Limitations/Restrictions of DNA Barcoding 192 10.6 RNA Barcoding 192 10.6.1 Overview of the Method 193 10.7 Methodology 194 10.7.1 Materials Required 195 10.7.2 Barcoded RNA Sequencing High-Level Mapping of Single-Neuron Projections 196 10.7.3 Using RNA to Trace Neurons 196 10.7.4 A Life Conservation Barcoder 198 10.7.5 Gene Sequencing 199 10.7.5.1 DNA Sequencing Methods 200 10.7.5.2 First-Generation Sequencing Techniques 204 10.7.5.3 Maxam’s and Gilbert’s Chemical Method 204 10.7.5.4 Sanger Sequencing 205 10.7.5.5 
Automation in DNA Sequencing 206 10.7.5.6 Use of Fluorescent-Marked Primers and ddNTPs 206 10.7.5.7 Dye Terminator Sequencing 207 10.7.5.8 Using Capillary Electrophoresis 207 10.7.6 Developments and High-Throughput Methods in DNA Sequencing 208 10.7.7 Pyrosequencing Method 209 10.7.8 The Genome Sequencer 454 FLX System 210 10.7.9 Illumina/Solexa Genome Analyzer 210 10.7.10 Transition Sequencing Techniques 211 10.7.11 Ion-Torrent’s Semiconductor Sequencing 211 10.7.12 Helico’s Genetic Analysis Platform 211 10.7.13 Third-Generation Sequencing Techniques 212 10.8 Conclusion 212 Abbreviations 213 Acknowledgement 214 References 214 11 Bioinformatics in Cancer Detection 229Mohit Sharma, Umme Abiha, Parul Chugh, Balakumar Chandrasekaran and Nikhil Agrawal 11.1 Introduction 230 11.2 The Era of Bioinformatics in Cancer 230 11.3 Aid in Cancer Research via NCI 232 11.4 Application of Big Data in Developing Precision Medicine 233 11.5 Historical Perspective and Development 235 11.6 Bioinformatics-Based Approaches in the Study of Cancer 237 11.6.1 SLAMS 237 11.6.2 Module Maps 238 11.6.3 COPA 239 11.7 Conclusion and Future Challenges 240 References 240 12 Genomic Association of Polycystic Ovarian Syndrome: Single-Nucleotide Polymorphisms and Their Role in Disease Progression 245Gowtham Kumar Subbaraj and Sindhu Varghese 12.1 Introduction 246 12.2 FSHR Gene 252 12.3 IL-10 Gene 252 12.4 IRS-1 Gene 253 12.5 PCR Primers Used 254 12.6 Statistical Analysis 255 12.7 Conclusion 258 References 259 13 An Insight of Protein Structure Predictions Using Homology Modeling 265S. Muthumanickam, P. Boomi, R. Subashkumar, S. Palanisamy, A. Sudha, K. Anand, C. Balakumar, M. Saravanan, G. Poorani, Yao Wang, K. Vijayakumar and M. 
Syed Ali 13.1 Introduction 266 13.2 Homology Modeling Approach 268 13.2.1 Strategies for Homology Modeling 269 13.2.2 Procedure 269 13.3 Steps Involved in Homology Modeling 270 13.3.1 Template Identification 270 13.3.2 Sequence Alignment 271 13.3.3 Backbone Generation 271 13.3.4 Loop Modeling 271 13.3.5 Side Chain Modeling 272 13.3.6 Model Optimization 272 13.3.6.1 Model Validation 272 13.4 Tools Used for Homology Modeling 273 13.4.1 Robetta 273 13.4.2 M4T (Multiple Templates) 273 13.4.3 I-Tasser (Iterative Implementation of the Threading Assembly Refinement) 273 13.4.4 ModBase 274 13.4.5 Swiss Model 274 13.4.6 PHYRE2 (Protein Homology/Analogy Recognition Engine 2) 274 13.4.7 Modeller 274 13.4.8 Conclusion 275 Acknowledgement 275 References 275 14 Basic Concepts in Proteomics and Applications 279Jesudass Joseph Sahayarayan, A.S. Enogochitra and Murugesan Chandrasekaran 14.1 Introduction 280 14.2 Challenges on Proteomics 281 14.3 Proteomics Based on Gel 283 14.4 Non-Gel–Based Electrophoresis Method 284 14.5 Chromatography 284 14.6 Proteomics Based on Peptides 285 14.7 Stable Isotopic Labeling 286 14.8 Data Mining and Informatics 287 14.9 Applications of Proteomics 289 14.10 Future Scope 290 14.11 Conclusion 291 References 292 15 Prospects of Covalent Approaches in Drug Discovery: An Overview 295Balajee Ramachandran, Saravanan Muthupandian and Jeyakanthan Jeyaraman 15.1 Introduction 296 15.2 Covalent Inhibitors Against the Biological Target 297 15.3 Application of Physical Chemistry Concepts in Drug Designing 299 15.4 Docking Methodologies—An Overview 301 15.5 Importance of Covalent Targets 302 15.6 Recent Framework on the Existing Docking Protocols 303 15.7 SN2 Reactions in the Computational Approaches 304 15.8 Other Crucial Factors to Consider in the Covalent Docking 305 15.8.1 Role of Ionizable Residues 305 15.8.2 Charge Regulation 306 15.8.3 Charge-Charge Interactions 306 15.9 QM/MM Approaches 309 15.10 Conclusion and Remarks 310 Acknowledgements 311 References 
311 Index 321

    £138.56

  • Machine Learning for Time Series Forecasting with

    John Wiley & Sons Inc Machine Learning for Time Series Forecasting with

    15 in stock

    Book Synopsis: Learn how to apply the principles of machine learning to time series modeling with this indispensable resource. Machine Learning for Time Series Forecasting with Python is an incisive and straightforward examination of one of the most crucial elements of decision-making in finance, marketing, education, and healthcare: time series modeling. Despite the centrality of time series forecasting, few business analysts are familiar with the power or utility of applying machine learning to time series modeling. Author Francesca Lazzeri, a distinguished machine learning scientist and economist, corrects that deficiency by providing readers with a comprehensive and approachable explanation and treatment of the application of machine learning to time series forecasting. Written for readers who have little to no experience in time series forecasting or machine learning, the book comprehensively covers all the topics necessary to: understand time series forecasting concepts, such as stationarity, horizon, trend, and seasonality; prepare time series data for modeling; evaluate time series forecasting models' performance and accuracy; understand when to use neural networks instead of traditional time series models in time series forecasting. Machine Learning for Time Series Forecasting with Python is full of real-world examples, resources, and concrete strategies to help readers explore and transform data and develop usable, practical time series forecasts. Perfect for entry-level data scientists, business analysts, developers, and researchers, this book is an invaluable and indispensable guide to the fundamental and advanced concepts of machine learning applied to time series modeling. 
Table of ContentsAcknowledgments vii Introduction xv Chapter 1 Overview of Time Series Forecasting 1 Flavors of Machine Learning for Time Series Forecasting 3 Supervised Learning for Time Series Forecasting 14 Python for Time Series Forecasting 21 Experimental Setup for Time Series Forecasting 24 Conclusion 26 Chapter 2 How to Design an End-to-End Time Series Forecasting Solution on the Cloud 29 Time Series Forecasting Template 31 Business Understanding and Performance Metrics 33 Data Ingestion 36 Data Exploration and Understanding 39 Data Pre-processing and Feature Engineering 40 Modeling Building and Selection 42 An Overview of Demand Forecasting Modeling Techniques 44 Model Evaluation 46 Model Deployment 48 Forecasting Solution Acceptance 53 Use Case: Demand Forecasting 54 Conclusion 58 Chapter 3 Time Series Data Preparation 61 Python for Time Series Data 62 Common Data Preparation Operations for Time Series 65 Time stamps vs. Periods 66 Converting to Timestamps 69 Providing a Format Argument 70 Indexing 71 Time/Date Components 76 Frequency Conversion 78 Time Series Exploration and Understanding 79 How to Get Started with Time Series Data Analysis 79 Data Cleaning of Missing Values in the Time Series 84 Time Series Data Normalization and Standardization 86 Time Series Feature Engineering 89 Date Time Features 90 Lag Features and Window Features 92 Rolling Window Statistics 95 Expanding Window Statistics 97 Conclusion 98 Chapter 4 Introduction to Autoregressive and Automated Methods for Time Series Forecasting 101 Autoregression 102 Moving Average 119 Autoregressive Moving Average 120 Autoregressive Integrated Moving Average 122 Automated Machine Learning 129 Conclusion 136 Chapter 5 Introduction to Neural Networks for Time Series Forecasting 137 Reasons to Add Deep Learning to Your Time Series Toolkit 138 Deep Learning Neural Networks Are Capable of Automatically Learning and Extracting Features from Raw and Imperfect Data 140 Deep Learning Supports Multiple 
Inputs and Outputs 142 Recurrent Neural Networks Are Good at Extracting Patterns from Input Data 143 Recurrent Neural Networks for Time Series Forecasting 144 Recurrent Neural Networks 145 Long Short-Term Memory 147 Gated Recurrent Unit 148 How to Prepare Time Series Data for LSTMs and GRUs 150 How to Develop GRUs and LSTMs for Time Series Forecasting 154 Keras 155 TensorFlow 156 Univariate Models 156 Multivariate Models 160 Conclusion 164 Chapter 6 Model Deployment for Time Series Forecasting 167 Experimental Set Up and Introduction to Azure Machine Learning SDK for Python 168 Workspace 169 Experiment 169 Run 169 Model 170 Compute Target, RunConfiguration, and ScriptRun Config 171 Image and Webservice 172 Machine Learning Model Deployment 173 How to Select the Right Tools to Succeed with Model Deployment 175 Solution Architecture for Time Series Forecasting with Deployment Examples 177 Train and Deploy an ARIMA Model 179 Configure the Workspace 182 Create an Experiment 183 Create or Attach a Compute Cluster 184 Upload the Data to Azure 184 Create an Estimator 188 Submit the Job to the Remote Cluster 188 Register the Model 189 Deployment 189 Define Your Entry Script and Dependencies 190 Automatic Schema Generation 191 Conclusion 196 References 197 Index 199


    £35.62

  • Smarter Data Science

    John Wiley & Sons Inc Smarter Data Science

    10 in stock

    Book Synopsis: Organizations can make data science a repeatable, predictable tool, which business professionals use to get more value from their data. Enterprise data and AI projects are often scattershot, underbaked, siloed, and not adaptable to predictable business changes. As a result, the vast majority fail. These expensive quagmires can be avoided, and this book explains precisely how. Data science is emerging as a hands-on tool for not just data scientists, but business professionals as well. Managers, directors, IT leaders, and analysts must expand their use of data science capabilities for the organization to stay competitive. Smarter Data Science helps them achieve their enterprise-grade data projects and AI goals. It serves as a guide to building a robust and comprehensive information architecture program that enables sustainable and scalable AI deployments. When an organization manages its data effectively, its data science program becomes a fully scala… Table of Contents: Foreword for Smarter Data Science xix Epigraph xxi Preamble xxiii Chapter 1 Climbing the AI Ladder 1 Readying Data for AI 2 Technology Focus Areas 3 Taking the Ladder Rung by Rung 4 Constantly Adapt to Retain Organizational Relevance 8 Data-Based Reasoning is Part and Parcel in the Modern Business 10 Toward the AI-Centric Organization 14 Summary 16 Chapter 2 Framing Part I: Considerations for Organizations Using AI 17 Data-Driven Decision-Making 18 Using Interrogatives to Gain Insight 19 The Trust Matrix 20 The Importance of Metrics and Human Insight 22 Democratizing Data and Data Science 23 Aye, a Prerequisite: Organizing Data Must Be a Forethought 26 Preventing Design Pitfalls 27 Facilitating the Winds of Change: How Organized Data Facilitates Reaction Time 29 Quae Quaestio (Question Everything) 30 Summary 32 Chapter 3 Framing Part II: Considerations for Working with Data and AI 35 Personalizing the Data Experience for Every User 36 Context Counts: Choosing the Right Way to Display Data 
38 Ethnography: Improving Understanding Through Specialized Data 42 Data Governance and Data Quality 43 The Value of Decomposing Data 43 Providing Structure Through Data Governance 43 Curating Data for Training 45 Additional Considerations for Creating Value 45 Ontologies: A Means for Encapsulating Knowledge 46 Fairness, Trust, and Transparency in AI Outcomes 49 Accessible, Accurate, Curated, and Organized 52 Summary 54 Chapter 4 A Look Back on Analytics: More Than One Hammer 57 Been Here Before: Reviewing the Enterprise Data Warehouse 57 Drawbacks of the Traditional Data Warehouse 64 Paradigm Shift 68 Modern Analytical Environments: The Data Lake 69 By Contrast 71 Indigenous Data 72 Attributes of Difference 73 Elements of the Data Lake 75 The New Normal: Big Data is Now Normal Data 77 Liberation from the Rigidity of a Single Data Model 78 Streaming Data 78 Suitable Tools for the Task 78 Easier Accessibility 79 Reducing Costs 79 Scalability 79 Data Management and Data Governance for AI 80 Schema-on-Read vs. 
Schema-on-Write 81 Summary 84 Chapter 5 A Look Forward on Analytics: Not Everything Can Be a Nail 87 A Need for Organization 87 The Staging Zone 90 The Raw Zone 91 The Discovery and Exploration Zone 92 The Aligned Zone 93 The Harmonized Zone 98 The Curated Zone 100 Data Topologies 100 Zone Map 103 Data Pipelines 104 Data Topography 105 Expanding, Adding, Moving, and Removing Zones 107 Enabling the Zones 108 Ingestion 108 Data Governance 111 Data Storage and Retention 112 Data Processing 114 Data Access 116 Management and Monitoring 117 Metadata 118 Summary 119 Chapter 6 Addressing Operational Disciplines on the AI Ladder 121 A Passage of Time 122 Create 128 Stability 128 Barriers 129 Complexity 129 Execute 130 Ingestion 131 Visibility 132 Compliance 132 Operate 133 Quality 134 Reliance 135 Reusability 135 The xOps Trifecta: DevOps/MLOps, DataOps, and AIOps 136 DevOps/MLOps 137 DataOps 139 AIOps 142 Summary 144 Chapter 7 Maximizing the Use of Your Data: Being Value Driven 147 Toward a Value Chain 148 Chaining Through Correlation 152 Enabling Action 154 Expanding the Means to Act 155 Curation 156 Data Governance 159 Integrated Data Management 162 Onboarding 163 Organizing 164 Cataloging 166 Metadata 167 Preparing 168 Provisioning 169 Multi-Tenancy 170 Summary 173 Chapter 8 Valuing Data with Statistical Analysis and Enabling Meaningful Access 175 Deriving Value: Managing Data as an Asset 175 An Inexact Science 180 Accessibility to Data: Not All Users are Equal 183 Providing Self-Service to Data 184 Access: The Importance of Adding Controls 186 Ranking Datasets Using a Bottom-Up Approach for Data Governance 187 How Various Industries Use Data and AI 188 Benefiting from Statistics 189 Summary 198 Chapter 9 Constructing for the Long-Term 199 The Need to Change Habits: Avoiding Hard-Coding 200 Overloading 201 Locked In 202 Ownership and Decomposition 204 Design to Avoid Change 204 Extending the Value of Data Through AI 206 Polyglot Persistence 208 Benefiting from Data 
Literacy 213 Understanding a Topic 215 Skillsets 216 It’s All Metadata 218 The Right Data, in the Right Context, with the Right Interface 219 Summary 221 Chapter 10 A Journey’s End: An IA for AI 223 Development Efforts for AI 224 Essential Elements: Cloud-Based Computing, Data, and Analytics 228 Intersections: Compute Capacity and Storage Capacity 234 Analytic Intensity 237 Interoperability Across the Elements 238 Data Pipeline Flight Paths: Preflight, Inflight, Postflight 242 Data Management for the Data Puddle, Data Pond, and Data Lake 243 Driving Action: Context, Content, and Decision-Makers 245 Keep It Simple 248 The Silo is Dead; Long Live the Silo 250 Taxonomy: Organizing Data Zones 252 Capabilities for an Open Platform 256 Summary 260 Appendix Glossary of Terms 263 Index 269


    £28.49

  • Foundations of Data Intensive Applications

    John Wiley & Sons Inc Foundations of Data Intensive Applications

    10 in stock

    Book Synopsis: PEEK UNDER THE HOOD OF BIG DATA ANALYTICS. The world of big data analytics grows ever more complex. And while many people can work superficially with specific frameworks, far fewer understand the fundamental principles of large-scale, distributed data processing systems and how they operate. In Foundations of Data Intensive Applications: Large Scale Data Analytics under the Hood, renowned big-data experts and computer scientists Drs. Supun Kamburugamuve and Saliya Ekanayake deliver a practical guide to applying the principles of big data to software development for optimal performance. The authors discuss foundational components of large-scale data systems and walk readers through the major software design decisions that define performance, application type, and usability. You'll learn how to recognize problems in your applications resulting in performance and distributed operation issues, diagnose them, and effectively eliminate them by relying on the bedrock big data principles explained within. Moving beyond individual frameworks and APIs for data processing, this book unlocks the theoretical ideas that operate under the hood of every big data processing system. 
Ideal for data scientists, data architects, dev-ops engineers, and developers, Foundations of Data Intensive Applications: Large Scale Data Analytics under the Hood shows readers how to: identify the foundations of large-scale, distributed data processing systems; make major software design decisions that optimize performance; diagnose performance problems and distributed operation issues; understand state-of-the-art research in big data; explain and use the major big data frameworks and understand what underpins them; and use big data analytics in the real world to solve practical problems. Table of Contents: Introduction xxvii Chapter 1 Data Intensive Applications 1 Anatomy of a Data-Intensive Application 1 A Histogram Example 2 Program 2 Process Management 3 Communication 4 Execution 5 Data Structures 6 Putting It Together 6 Application 6 Resource Management 6 Messaging 7 Data Structures 7 Tasks and Execution 8 Fault Tolerance 8 Remote Execution 8 Parallel Applications 9 Serial Applications 9 Lloyd’s K-Means Algorithm 9 Parallelizing Algorithms 11 Decomposition 11 Task Assignment 12 Orchestration 12 Mapping 13 K-Means Algorithm 13 Parallel and Distributed Computing 15 Memory Abstractions 16 Shared Memory 16 Distributed Memory 18 Hybrid (Shared + Distributed) Memory 20 Partitioned Global Address Space Memory 21 Application Classes and Frameworks 22 Parallel Interaction Patterns 22 Pleasingly Parallel 23 Dataflow 23 Iterative 23 Irregular 23 Data Abstractions 24 Data-Intensive Frameworks 24 Components 24 Workflows 25 An Example 25 What Makes It Difficult? 
26 Developing Applications 27 Concurrency 27 Data Partitioning 28 Debugging 28 Diverse Environments 28 Computer Networks 29 Synchronization 29 Thread Synchronization 29 Data Synchronization 30 Ordering of Events 31 Faults 31 Consensus 31 Summary 32 References 32 Chapter 2 Data and Storage 35 Storage Systems 35 Storage for Distributed Systems 36 Direct-Attached Storage 37 Storage Area Network 37 Network-Attached Storage 38 DAS or SAN or NAS? 38 Storage Abstractions 39 Block Storage 39 File Systems 40 Object Storage 41 Data Formats 41 XML 42 JSON 43 CSV 44 Apache Parquet 45 Apache Avro 47 Avro Data Definitions (Schema) 48 Code Generation 49 Without Code Generation 49 Avro File 49 Schema Evolution 49 Protocol Buffers, Flat Buffers, and Thrift 50 Data Replication 51 Synchronous and Asynchronous Replication 52 Single-Leader and Multileader Replication 52 Data Locality 53 Disadvantages of Replication 54 Data Partitioning 54 Vertical Partitioning 55 Horizontal Partitioning (Sharding) 55 Hybrid Partitioning 56 Considerations for Partitioning 57 NoSQL Databases 58 Data Models 58 Key-Value Databases 58 Document Databases 59 Wide Column Databases 59 Graph Databases 59 CAP Theorem 60 Message Queuing 61 Message Processing Guarantees 63 Durability of Messages 64 Acknowledgments 64 Storage First Brokers and Transient Brokers 65 Summary 66 References 66 Chapter 3 Computing Resources 69 A Demonstration 71 Computer Clusters 72 Anatomy of a Computer Cluster 73 Data Analytics in Clusters 74 Dedicated Clusters 76 Classic Parallel Systems 76 Big Data Systems 77 Shared Clusters 79 OpenMPI on a Slurm Cluster 79 Spark on a Yarn Cluster 80 Distributed Application Life Cycle 80 Life Cycle Steps 80 Step 1: Preparation of the Job Package 81 Step 2: Resource Acquisition 81 Step 3: Distributing the Application (Job) Artifacts 81 Step 4: Bootstrapping the Distributed Environment 82 Step 5: Monitoring 82 Step 6: Termination 83 Computing Resources 83 Data Centers 83 Physical Machines 85 Network 85 
Virtual Machines 87 Containers 87 Processor, Random Access Memory, and Cache 88 Cache 89 Multiple Processors in a Computer 90 Nonuniform Memory Access 90 Uniform Memory Access 91 Hard Disk 92 GPUs 92 Mapping Resources to Applications 92 Cluster Resource Managers 93 Kubernetes 94 Kubernetes Architecture 94 Kubernetes Application Concepts 96 Data-Intensive Applications on Kubernetes 96 Slurm 98 Yarn 99 Job Scheduling 99 Scheduling Policy 101 Objective Functions 101 Throughput and Latency 101 Priorities 102 Lowering Distance Among the Processes 102 Data Locality 102 Completion Deadline 102 Algorithms 103 First in First Out 103 Gang Scheduling 103 List Scheduling 103 Backfill Scheduling 104 Summary 104 References 104 Chapter 4 Data Structures 107 Virtual Memory 108 Paging and TLB 109 Cache 111 The Need for Data Structures 112 Cache and Memory Layout 112 Memory Fragmentation 114 Data Transfer 115 Data Transfer Between Frameworks 115 Cross-Language Data Transfer 115 Object and Text Data 116 Serialization 116 Vectors and Matrices 117 1D Vectors 118 Matrices 118 Row-Major and Column-Major Formats 119 N-Dimensional Arrays/Tensors 122 NumPy 123 Memory Representation 125 K-means with NumPy 126 Sparse Matrices 127 Table 128 Table Formats 129 Column Data Format 129 Row Data Format 130 Apache Arrow 130 Arrow Data Format 131 Primitive Types 131 Variable-Length Data 132 Arrow Serialization 133 Arrow Example 133 Pandas DataFrame 134 Column vs. 
Row Tables 136 Summary 136 References 136 Chapter 5 Programming Models 139 Introduction 139 Parallel Programming Models 140 Parallel Process Interaction 140 Problem Decomposition 140 Data Structures 140 Data Structures and Operations 141 Data Types 141 Local Operations 143 Distributed Operations 143 Array 144 Tensor 145 Indexing 145 Slicing 146 Broadcasting 146 Table 146 Graph Data 148 Message Passing Model 150 Model 151 Message Passing Frameworks 151 Message Passing Interface 151 Bulk Synchronous Parallel 153 K-Means 154 Distributed Data Model 157 Eager Model 157 Dataflow Model 158 Data Frames, Datasets, and Tables 159 Input and Output 160 Task Graphs (Dataflow Graphs) 160 Model 161 User Program to Task Graph 161 Tasks and Functions 162 Source Task 162 Compute Task 163 Implicit vs. Explicit Parallel Models 163 Remote Execution 163 Components 164 Batch Dataflow 165 Data Abstractions 165 Table Abstraction 165 Matrix/Tensors 165 Functions 166 Source 166 Compute 167 Sink 168 An Example 168 Caching State 169 Evaluation Strategy 170 Lazy Evaluation 171 Eager Evaluation 171 Iterative Computations 172 DOALL Parallel 172 DOACROSS Parallel 172 Pipeline Parallel 173 Task Graph Models for Iterative Computations 173 K-Means Algorithm 174 Streaming Dataflow 176 Data Abstractions 177 Streams 177 Distributed Operations 178 Streaming Functions 178 Sources 178 Compute 179 Sink 179 An Example 179 Windowing 180 Windowing Strategies 181 Operations on Windows 182 Handling Late Events 182 SQL 182 Queries 183 Summary 184 References 184 Chapter 6 Messaging 187 Network Services 188 TCP/IP 188 RDMA 189 Messaging for Data Analytics 189 Anatomy of a Message 190 Data Packing 190 Protocol 191 Message Types 192 Control Messages 192 External Data Sources 192 Data Transfer Messages 192 Distributed Operations 194 How Are They Used? 
194 Task Graph 194 Parallel Processes 195 Anatomy of a Distributed Operation 198 Data Abstractions 198 Distributed Operation API 198 Streaming and Batch Operations 199 Streaming Operations 199 Batch Operations 199 Distributed Operations on Arrays 200 Broadcast 200 Reduce and AllReduce 201 Gather and AllGather 202 Scatter 203 AllToAll 204 Optimized Operations 204 Broadcast 205 Reduce 206 AllReduce 206 Gather and AllGather Collective Algorithms 208 Scatter and AllToAll Collective Algorithms 208 Distributed Operations on Tables 209 Shuffle 209 Partitioning Data 211 Handling Large Data 212 Fetch-Based Algorithm (Asynchronous Algorithm) 213 Distributed Synchronization Algorithm 214 GroupBy 214 Aggregate 215 Join 216 Join Algorithms 219 Distributed Joins 221 Performance of Joins 223 More Operations 223 Advanced Topics 224 Data Packing 224 Memory Considerations 224 Message Coalescing 224 Compression 225 Stragglers 225 Nonblocking vs. Blocking Operations 225 Blocking Operations 226 Nonblocking Operations 226 Summary 227 References 227 Chapter 7 Parallel Tasks 229 CPUs 229 Cache 229 False Sharing 230 Vectorization 231 Threads and Processes 234 Concurrency and Parallelism 234 Context Switches and Scheduling 234 Mutual Exclusion 235 User-Level Threads 236 Process Affinity 236 NUMA-Aware Programming 237 Accelerators 237 Task Execution 238 Scheduling 240 Static Scheduling 240 Dynamic Scheduling 240 Loosely Synchronous and Asynchronous Execution 241 Loosely Synchronous Parallel System 242 Asynchronous Parallel System (Fully Distributed) 243 Actor Model 244 Actor 244 Asynchronous Messages 244 Actor Frameworks 245 Execution Models 245 Process Model 246 Thread Model 246 Remote Execution 246 Tasks for Data Analytics 248 SPMD and MPMD Execution 248 Batch Tasks 249 Data Partitions 249 Operations 251 Task Graph Scheduling 253 Threads, CPU Cores, and Partitions 254 Data Locality 255 Execution 257 Streaming Execution 257 State 257 Immutable Data 258 State in Driver 258 Distributed State 
259 Streaming Tasks 259 Streams and Data Partitioning 260 Partitions 260 Operations 261 Scheduling 262 Uniform Resources 263 Resource-Aware Scheduling 264 Execution 264 Dynamic Scaling 264 Back Pressure (Flow Control) 265 Rate-Based Flow Control 266 Credit-Based Flow Control 266 State 267 Summary 268 References 268 Chapter 8 Case Studies 271 Apache Hadoop 271 Programming Model 272 Architecture 274 Cluster Resource Management 275 Apache Spark 275 Programming Model 275 RDD API 276 SQL, DataFrames, and DataSets 277 Architecture 278 Resource Managers 278 Task Schedulers 279 Executors 279 Communication Operations 280 Apache Spark Streaming 280 Apache Storm 282 Programming Model 282 Architecture 284 Cluster Resource Managers 285 Communication Operations 286 Kafka Streams 286 Programming Model 286 Architecture 287 PyTorch 288 Programming Model 288 Execution 292 Cylon 295 Programming Model 296 Architecture 296 Execution 297 Communication Operations 298 Rapids cuDF 298 Programming Model 298 Architecture 299 Summary 300 References 300 Chapter 9 Fault Tolerance 303 Dependable Systems and Failures 303 Fault Tolerance is Not Free 304 Dependable Systems 305 Failures 306 Process Failures 306 Network Failures 307 Node Failures 307 Byzantine Faults 307 Failure Models 308 Failure Detection 308 Recovering from Faults 309 Recovery Methods 310 Stateless Programs 310 Batch Systems 311 Streaming Systems 311 Processing Guarantees 311 Role of Cluster Resource Managers 312 Checkpointing 313 State 313 Consistent Global State 313 Uncoordinated Checkpointing 314 Coordinated Checkpointing 315 Chandy-Lamport Algorithm 315 Batch Systems 316 When to Checkpoint? 
317 Snapshot Data 318 Streaming Systems 319 Case Study: Apache Storm 319 Message Tracking 320 Failure Recovery 321 Case Study: Apache Flink 321 Checkpointing 322 Failure Recovery 324 Batch Systems 324 Iterative Programs 324 Case Study: Apache Spark 325 RDD Recomputing 326 Checkpointing 326 Recovery from Failures 327 Summary 327 References 327 Chapter 10 Performance and Productivity 329 Performance Metrics 329 System Performance Metrics 330 Parallel Performance Metrics 330 Speedup 330 Strong Scaling 331 Weak Scaling 332 Parallel Efficiency 332 Amdahl’s Law 333 Gustafson’s Law 334 Throughput 334 Latency 335 Benchmarks 336 LINPACK Benchmark 336 NAS Parallel Benchmark 336 BigDataBench 336 TPC Benchmarks 337 HiBench 337 Performance Factors 337 Memory 337 Execution 338 Distributed Operators 338 Disk I/O 339 Garbage Collection 339 Finding Issues 342 Serial Programs 342 Profiling 342 Scaling 343 Strong Scaling 343 Weak Scaling 344 Debugging Distributed Applications 344 Programming Languages 345 C/C++ 346 Java 346 Memory Management 347 Data Structures 348 Interfacing with Python 348 Python 350 C/C++ Code integration 350 Productivity 351 Choice of Frameworks 351 Operating Environment 353 CPUs and GPUs 353 Public Clouds 355 Future of Data-Intensive Applications 358 Summary 358 References 359 Index 361


    £36.12

  • Becoming a Data Head

    John Wiley & Sons Inc Becoming a Data Head

    15 in stock

    Table of Contents: Acknowledgments xiii Foreword xxiii Introduction xxvii Part One Thinking Like a Data Head Chapter 1 What Is the Problem? 3 Questions a Data Head Should Ask 4 Why Is This Problem Important? 4 Who Does This Problem Affect? 6 What If We Don’t Have the Right Data? 6 When Is the Project Over? 7 What If We Don’t Like the Results? 7 Understanding Why Data Projects Fail 8 Customer Perception 8 Discussion 10 Working on Problems That Matter 11 Chapter Summary 11 Chapter 2 What Is Data? 13 Data vs. Information 13 An Example Dataset 14 Data Types 15 How Data Is Collected and Structured 16 Observational vs. Experimental Data 16 Structured vs. Unstructured Data 17 Basic Summary Statistics 18 Chapter Summary 19 Chapter 3 Prepare to Think Statistically 21 Ask Questions 22 There Is Variation in All Things 23 Scenario: Customer Perception (The Sequel) 24 Case Study: Kidney-Cancer Rates 26 Probabilities and Statistics 28 Probability vs. Intuition 29 Discovery with Statistics 31 Chapter Summary 33 Part Two Speaking Like a Data Head Chapter 4 Argue with the Data 37 What Would You Do? 38 Missing Data Disaster 39 Tell Me the Data Origin Story 43 Who Collected the Data? 44 How Was the Data Collected? 44 Is the Data Representative? 45 Is There Sampling Bias? 46 What Did You Do with Outliers? 46 What Data Am I Not Seeing? 47 How Did You Deal with Missing Values? 47 Can the Data Measure What You Want It to Measure? 48 Argue with Data of All Sizes 48 Chapter Summary 49 Chapter 5 Explore the Data 51 Exploratory Data Analysis and You 52 Embracing the Exploratory Mindset 52 Questions to Guide You 53 The Setup 53 Can the Data Answer the Question? 54 Set Expectations and Use Common Sense 54 Do the Values Make Intuitive Sense? 54 Watch Out: Outliers and Missing Values 58 Did You Discover Any Relationships? 
59 Understanding Correlation 59 Watch Out: Misinterpreting Correlation 60 Watch Out: Correlation Does Not Imply Causation 62 Did You Find New Opportunities in the Data? 63 Chapter Summary 63 Chapter 6 Examine the Probabilities 65 Take a Guess 66 The Rules of the Game 66 Notation 67 Conditional Probability and Independent Events 69 The Probability of Multiple Events 69 Two Things That Happen Together 69 One Thing or the Other 70 Probability Thought Exercise 72 Next Steps 73 Be Careful Assuming Independence 74 Don’t Fall for the Gambler’s Fallacy 74 All Probabilities Are Conditional 75 Don’t Swap Dependencies 76 Bayes’ Theorem 76 Ensure the Probabilities Have Meaning 79 Calibration 80 Rare Events Can, and Do, Happen 80 Chapter Summary 81 Chapter 7 Challenge the Statistics 83 Quick Lessons on Inference 83 Give Yourself Some Wiggle Room 84 More Data, More Evidence 84 Challenge the Status Quo 85 Evidence to the Contrary 86 Balance Decision Errors 88 The Process of Statistical Inference 89 The Questions You Should Ask to Challenge the Statistics 90 What Is the Context for These Statistics? 90 What Is the Sample Size? 91 What Are You Testing? 92 What Is the Null Hypothesis? 92 Assuming Equivalence 93 What Is the Significance Level? 93 How Many Tests Are You Doing? 94 Can I See the Confidence Intervals? 95 Is This Practically Significant? 96 Are You Assuming Causality? 
96 Chapter Summary 97 Part Three Understanding the Data Scientist’s Toolbox Chapter 8 Search for Hidden Groups 101 Unsupervised Learning 102 Dimensionality Reduction 102 Creating Composite Features 103 Principal Component Analysis 105 Principal Components in Athletic Ability 105 PCA Summary 108 Potential Traps 109 Clustering 110 k-Means Clustering 111 Clustering Retail Locations 111 Potential Traps 113 Chapter Summary 114 Chapter 9 Understand the Regression Model 117 Supervised Learning 117 Linear Regression: What It Does 119 Least Squares Regression: Not Just a Clever Name 120 Linear Regression: What It Gives You 123 Extending to Many Features 124 Linear Regression: What Confusion It Causes 125 Omitted Variables 125 Multicollinearity 126 Data Leakage 127 Extrapolation Failures 128 Many Relationships Aren’t Linear 128 Are You Explaining or Predicting? 128 Regression Performance 130 Other Regression Models 131 Chapter Summary 131 Chapter 10 Understand the Classification Model 133 Introduction to Classification 133 What You’ll Learn 134 Classification Problem Setup 135 Logistic Regression 135 Logistic Regression: So What? 138 Decision Trees 139 Ensemble Methods 142 Random Forests 143 Gradient Boosted Trees 143 Interpretability of Ensemble Models 145 Watch Out for Pitfalls 145 Misapplication of the Problem 146 Data Leakage 146 Not Splitting Your Data 146 Choosing the Right Decision Threshold 147 Misunderstanding Accuracy 147 Confusion Matrices 148 Chapter Summary 150 Chapter 11 Understand Text Analytics 151 Expectations of Text Analytics 151 How Text Becomes Numbers 153 A Big Bag of Words 153 N-Grams 157 Word Embeddings 158 Topic Modeling 160 Text Classification 163 Naïve Bayes 164 Sentiment Analysis 166 Practical Considerations When Working with Text 167 Big Tech Has the Upper Hand 168 Chapter Summary 169 Chapter 12 Conceptualize Deep Learning 171 Neural Networks 172 How Are Neural Networks Like the Brain? 
172 A Simple Neural Network 173 How a Neural Network Learns 174 A Slightly More Complex Neural Network 175 Applications of Deep Learning 178 The Benefits of Deep Learning 179 How Computers “See” Images 180 Convolutional Neural Networks 182 Deep Learning on Language and Sequences 183 Deep Learning in Practice 185 Do You Have Data? 185 Is Your Data Structured? 186 What Will the Network Look Like? 186 Artificial Intelligence and You 187 Big Tech Has the Upper Hand 188 Ethics in Deep Learning 189 Chapter Summary 190 Part Four Ensuring Success Chapter 13 Watch Out for Pitfalls 193 Biases and Weird Phenomena in Data 194 Survivorship Bias 194 Regression to the Mean 195 Simpson’s Paradox 195 Confirmation Bias 197 Effort Bias (aka the “Sunk Cost Fallacy”) 197 Algorithmic Bias 198 Uncategorized Bias 198 The Big List of Pitfalls 199 Statistical and Machine Learning Pitfalls 199 Project Pitfalls 200 Chapter Summary 202 Chapter 14 Know the People and Personalities 203 Seven Scenes of Communication Breakdowns 204 The Postmortem 204 Storytime 205 The Telephone Game 206 Into the Weeds 206 The Reality Check 207 The Takeover 207 The Blowhard 208 Data Personalities 208 Data Enthusiasts 209 Data Cynics 209 Data Heads 209 Chapter Summary 210 Chapter 15 What’s Next? 211 Index 215


    £26.40

  • Responsible Data Science

    John Wiley & Sons Inc Responsible Data Science

    5 in stock

    Book Synopsis: Explore the most serious prevalent ethical issues in data science with this insightful new resource. The increasing popularity of data science has resulted in numerous well-publicized cases of bias, injustice, and discrimination. The widespread deployment of black-box algorithms that are difficult or impossible to understand and explain, even for their developers, is a primary source of these unanticipated harms, making modern techniques and methods for manipulating large data sets seem sinister, even dangerous. When put in the hands of authoritarian governments, these algorithms have enabled suppression of political dissent and persecution of minorities. To prevent these harms, data scientists everywhere must come to understand how the algorithms that they build and deploy may harm certain groups or be unfair. Responsible Data Science delivers a comprehensive, practical treatment of how to implement data science solutions in an even-handed and ethical manner that minimizes the risk… Table of Contents: Introduction xix Part I Motivation for Ethical Data Science and Background Knowledge 1 Chapter 1 Responsible Data Science 3 The Optum Disaster 4 Jekyll and Hyde 5 Eugenics 7 Galton, Pearson, and Fisher 7 Ties between Eugenics and Statistics 7 Ethical Problems in Data Science Today 9 Predictive Models 10 From Explaining to Predicting 10 Predictive Modeling 11 Setting the Stage for Ethical Issues to Arise 12 Classic Statistical Models 12 Black-Box Methods 14 Important Concepts in Predictive Modeling 19 Feature Selection 19 Model-Centric vs. 
Data-Centric Models 20 Holdout Sample and Cross-Validation 20 Overfitting 21 Unsupervised Learning 22 The Ethical Challenge of Black Boxes 23 Two Opposing Forces 24 Pressure for More Powerful AI 24 Public Resistance and Anxiety 24 Summary 25 Chapter 2 Background: Modeling and the Black-Box Algorithm 27 Assessing Model Performance 27 Predicting Class Membership 28 The Rare Class Problem 28 Lift and Gains 28 Area Under the Curve 29 AUC vs. Lift (Gains) 31 Predicting Numeric Values 32 Goodness-of-Fit 32 Holdout Sets and Cross-Validation 33 Optimization and Loss Functions 34 Intrinsically Interpretable Models vs. Black-Box Models 35 Ethical Challenges with Interpretable Models 38 Black-Box Models 39 Ensembles 39 Nearest Neighbors 41 Clustering 41 Association Rules 42 Collaborative Filters 42 Artificial Neural Nets and Deep Neural Nets 43 Problems with Black-Box Predictive Models 45 Problems with Unsupervised Algorithms 47 Summary 48 Chapter 3 The Ways AI Goes Wrong, and the Legal Implications 49 AI and Intentional Consequences by Design 50 Deepfakes 50 Supporting State Surveillance and Suppression 51 Behavioral Manipulation 52 Automated Testing to Fine-Tune Targeting 53 AI and Unintended Consequences 55 Healthcare 56 Finance 57 Law Enforcement 58 Technology 60 The Legal and Regulatory Landscape around AI 61 Ignorance Is No Defense: AI in the Context of Existing Law and Policy 63 A Finger in the Dam: Data Rights, Data Privacy, and Consumer Protection Regulations 64 Trends in Emerging Law and Policy Related to AI 66 Summary 69 Part II The Ethical Data Science Process 71 Chapter 4 The Responsible Data Science Framework 73 Why We Keep Building Harmful AI 74 Misguided Need for Cutting-Edge Models 74 Excessive Focus on Predictive Performance 74 Ease of Access and the Curse of Simplicity 76 The Common Cause 76 The Face Thieves 78 An Anatomy of Modeling Harms 79 The World: Context Matters for Modeling 80 The Data: Representation Is Everything 83 The Model: Garbage In, Danger 
Out 85 Model Interpretability: Human Understanding for Superhuman Models 86 Efforts Toward a More Responsible Data Science 89 Principles Are the Focus 90 Nonmaleficence 90 Fairness 90 Transparency 91 Accountability 91 Privacy 92 Bridging the Gap Between Principles and Practice with the Responsible Data Science (RDS) Framework 92 Justification 94 Compilation 94 Preparation 95 Modeling 96 Auditing 96 Summary 97 Chapter 5 Model Interpretability: The What and the Why 99 The Sexist Résumé Screener 99 The Necessity of Model Interpretability 101 Connections Between Predictive Performance and Interpretability 103 Uniting (High) Model Performance and Model Interpretability 105 Categories of Interpretability Methods 107 Global Methods 107 Local Methods 113 Real-World Successes of Interpretability Methods 113 Facilitating Debugging and Audit 114 Leveraging the Improved Performance of Black-Box Models 116 Acquiring New Knowledge 116 Addressing Critiques of Interpretability Methods 117 Explanations Generated by Interpretability Methods Are Not Robust 118 Explanations Generated by Interpretability Methods Are Low Fidelity 120 The Forking Paths of Model Interpretability 121 The Four-Measure Baseline 122 Building Our Own Credit Scoring Model 124 Using Train-Test Splits 125 Feature Selection and Feature Engineering 125 Baseline Models 127 The Importance of Making Your Code Work for Everyone 129 Execution Variability 129 Addressing Execution Variability with Functionalized Code 130 Stochastic Variability 130 Addressing Stochastic Variability via Resampling 130 Summary 133 Part III EDS in Practice 135 Chapter 6 Beginning a Responsible Data Science Project 137 How the Responsible Data Science Framework Addresses the Common Cause 138 Datasets Used 140 Regression Datasets—Communities and Crime 140 Classification Datasets—COMPAS 140 Common Elements Across Our Analyses 141 Project Structure and Documentation 141 Project Structure for the Responsible Data Science Framework: Everything in 
Its Place 142 Documentation: The Responsible Thing to Do 145 Beginning a Responsible Data Science Project 151 Communities and Crime (Regression) 151 Justification 151 Compilation 154 Identifying Protected Classes 157 Preparation—Data Splitting and Feature Engineering 159 Datasheets 161 COMPAS (Classification) 164 Justification 164 Compilation 166 Identifying Protected Classes 168 Preparation 169 Summary 172 Chapter 7 Auditing a Responsible Data Science Project 173 Fairness and Data Science in Practice 175 The Many Different Conceptions of Fairness 175 Different Forms of Fairness Are Trade-Offs with Each Other 177 Quantifying Predictive Fairness Within a Data Science Project 179 Mitigating Bias to Improve Fairness 185 Preprocessing 185 In-processing 186 Postprocessing 186 Classification Example: COMPAS 187 Prework: Code Practices, Modeling, and Auditing 187 Justification, Compilation, and Preparation Review 189 Modeling 191 Auditing 200 Per-Group Metrics: Overall 200 Per-Group Metrics: Error 202 Fairness Metrics 204 Interpreting Our Models: Why Are They Unfair? 207 Analysis for Different Groups 209 Bias Mitigation 214 Preprocessing: Oversampling 214 Postprocessing: Optimizing Thresholds Automatically 218 Postprocessing: Optimizing Thresholds Manually 219 Summary 223 Chapter 8 Auditing for Neural Networks 225 Why Neural Networks Merit Their Own Chapter 227 Neural Networks Vary Greatly in Structure 227 Neural Networks Treat Features Differently 229 Neural Networks Repeat Themselves 231 A More Impenetrable Black Box 232 Baseline Methods 233 Representation Methods 233 Distillation Methods 234 Intrinsic Methods 235 Beginning a Responsible Neural Network Project 236 Justification 236 Moving Forward 239 Compilation 239 Tracking Experiments 241 Preparation 244 Modeling 245 Auditing 247 Per-Group Metrics: Overall 247 Per-Group Metrics: Unusual Definitions of “False Positive” 248 Fairness Metrics 249 Interpreting Our Models: Why Are They Unfair? 
252 Bias Mitigation 253 Wrap-Up 255 Auditing Neural Networks for Natural Language Processing 258 Identifying and Addressing Sources of Bias in NLP 258 The Real World 259 Data 260 Models 261 Model Interpretability 262 Summary 262 Chapter 9 Conclusion 265 How Can We Do Better? 267 The Responsible Data Science Framework 267 Doing Better As Managers 269 Doing Better As Practitioners 270 A Better Future If We Can Keep It 271 Index 273

    5 in stock

    £24.79

  • Machine Learning for Business Analytics

    John Wiley & Sons Inc Machine Learning for Business Analytics

    15 in stock

    Book Synopsis: Machine learning, also known as data mining or data analytics, is a fundamental part of data science. It is used by organizations in a wide variety of arenas to turn raw data into actionable information. Machine Learning for Business Analytics: Concepts, Techniques and Applications in RapidMiner provides a comprehensive introduction and an overview of this methodology. This best-selling textbook covers both statistical and machine learning algorithms for prediction, classification, visualization, dimension reduction, rule mining, recommendations, clustering, text mining, experimentation and network analytics. Along with hands-on exercises and real-life case studies, it also discusses managerial and ethical issues for responsible use of machine learning techniques. This is the seventh edition of Machine Learning for Business Analytics, and the first using RapidMiner software. This edition also includes: A
    Table of Contents: Foreword by Ravi Bapna xxi Preface to the RapidMiner Edition xxiii Acknowledgments xxvii PART I PRELIMINARIES CHAPTER 1 Introduction 3 1.1 What Is Business Analytics? 3 1.2 What Is Machine Learning? 5 1.3 Machine Learning, AI, and Related Terms 5 1.4 Big Data 7 1.5 Data Science 8 1.6 Why Are There So Many Different Methods? 
9 1.7 Terminology and Notation 9 1.8 Road Maps to This Book 12 1.9 Using RapidMiner Studio 14 CHAPTER 2 Overview of the Machine Learning Process 19 2.1 Introduction 19 2.2 Core Ideas in Machine Learning 20 2.3 The Steps in a Machine Learning Project 23 2.4 Preliminary Steps 25 2.5 Predictive Power and Overfitting 32 2.6 Building a Predictive Model with RapidMiner 37 2.7 Using RapidMiner for Machine Learning 45 2.8 Automating Machine Learning Solutions 47 2.9 Ethical Practice in Machine Learning 52 PART II DATA EXPLORATION AND DIMENSION REDUCTION CHAPTER 3 Data Visualization 63 3.1 Introduction 63 3.2 Data Examples 65 3.3 Basic Charts: Bar Charts, Line Charts, and Scatter Plots 66 3.4 Multidimensional Visualization 75 3.5 Specialized Visualizations 87 3.6 Summary: Major Visualizations and Operations, by Machine Learning Goal 92 CHAPTER 4 Dimension Reduction 97 4.1 Introduction 97 4.2 Curse of Dimensionality 98 4.3 Practical Considerations 98 4.4 Data Summaries 100 4.5 Correlation Analysis 103 4.6 Reducing the Number of Categories in Categorical Attributes 105 4.7 Converting a Categorical Attribute to a Numerical Attribute 107 4.8 Principal Component Analysis 107 4.9 Dimension Reduction Using Regression Models 117 4.10 Dimension Reduction Using Classification and Regression Trees 119 PART III PERFORMANCE EVALUATION CHAPTER 5 Evaluating Predictive Performance 125 5.1 Introduction 125 5.2 Evaluating Predictive Performance 126 5.3 Judging Classifier Performance 131 5.4 Judging Ranking Performance 146 5.5 Oversampling 151 PART IV PREDICTION AND CLASSIFICATION METHODS CHAPTER 6 Multiple Linear Regression 163 6.1 Introduction 163 6.2 Explanatory vs. 
Predictive Modeling 164 6.3 Estimating the Regression Equation and Prediction 166 6.4 Variable Selection in Linear Regression 171 CHAPTER 7 k-Nearest Neighbors (k-NN) 189 7.1 The k-NN Classifier (Categorical Label) 189 7.2 k-NN for a Numerical Label 200 7.3 Advantages and Shortcomings of k-NN Algorithms 202 CHAPTER 8 The Naive Bayes Classifier 209 8.1 Introduction 209 8.2 Applying the Full (Exact) Bayesian Classifier 211 8.3 Solution: Naive Bayes 213 8.4 Advantages and Shortcomings of the Naive Bayes Classifier 223 CHAPTER 9 Classification and Regression Trees 229 9.1 Introduction 229 9.2 Classification Trees 232 9.3 Evaluating the Performance of a Classification Tree 240 9.4 Avoiding Overfitting 245 9.5 Classification Rules from Trees 255 9.6 Classification Trees for More Than Two Classes 256 9.7 Regression Trees 256 9.8 Improving Prediction: Random Forests and Boosted Trees 259 9.9 Advantages and Weaknesses of a Tree 261 CHAPTER 10 Logistic Regression 269 10.1 Introduction 269 10.2 The Logistic Regression Model 271 10.3 Example: Acceptance of Personal Loan 272 10.4 Logistic Regression for Multi-class Classification 283 10.5 Example of Complete Analysis: Predicting Delayed Flights 286 CHAPTER 11 Neural Networks 305 11.1 Introduction 306 11.2 Concept and Structure of a Neural Network 306 11.3 Fitting a Network to Data 307 11.4 Required User Input 321 11.5 Exploring the Relationship Between Predictors and Target Attribute 322 11.6 Deep Learning 323 11.7 Advantages and Weaknesses of Neural Networks 334 CHAPTER 12 Discriminant Analysis 337 12.1 Introduction 337 12.2 Distance of a Record from a Class 340 12.3 Fisher’s Linear Classification Functions 341 12.4 Classification Performance of Discriminant Analysis 346 12.5 Prior Probabilities 348 12.6 Unequal Misclassification Costs 348 12.7 Classifying More Than Two Classes 349 12.8 Advantages and Weaknesses 351 CHAPTER 13 Generating, Comparing, and Combining Multiple Models 359 13.1 Automated Machine Learning (AutoML) 359 
13.2 Explaining Model Predictions 367 13.3 Ensembles 373 13.4 Summary 381 PART V INTERVENTION AND USER FEEDBACK CHAPTER 14 Interventions: Experiments, Uplift Models, and Reinforcement Learning 387 14.1 A/B Testing 387 14.2 Uplift (Persuasion) Modeling 393 14.3 Reinforcement Learning 400 14.4 Summary 405 PART VI MINING RELATIONSHIPS AMONG RECORDS CHAPTER 15 Association Rules and Collaborative Filtering 409 15.1 Association Rules 409 15.2 Collaborative Filtering 424 15.3 Summary 438 CHAPTER 16 Cluster Analysis 445 16.1 Introduction 445 16.2 Measuring Distance Between Two Records 449 16.3 Measuring Distance Between Two Clusters 455 16.4 Hierarchical (Agglomerative) Clustering 457 16.5 Non-Hierarchical Clustering: The k-Means Algorithm 466 PART VII FORECASTING TIME SERIES CHAPTER 17 Handling Time Series 479 17.1 Introduction 480 17.2 Descriptive vs. Predictive Modeling 481 17.3 Popular Forecasting Methods in Business 481 17.4 Time Series Components 482 17.5 Data Partitioning and Performance Evaluation 486 CHAPTER 18 Regression-Based Forecasting 497 18.1 A Model with Trend 498 18.2 A Model with Seasonality 504 18.3 A Model with Trend and Seasonality 508 18.4 Autocorrelation and ARIMA Models 509 CHAPTER 19 Smoothing and Deep Learning Methods for Forecasting 533 19.1 Smoothing Methods: Introduction 534 19.2 Moving Average 534 19.3 Simple Exponential Smoothing 541 19.4 Advanced Exponential Smoothing 545 19.5 Deep Learning for Forecasting 549 PART VIII DATA ANALYTICS CHAPTER 20 Social Network Analytics 563 20.1 Introduction 563 20.2 Directed vs. Undirected Networks 564 20.3 Visualizing and Analyzing Networks 567 20.4 Social Data Metrics and Taxonomy 571 20.5 Using Network Metrics in Prediction and Classification 577 20.6 Collecting Social Network Data with RapidMiner 584 20.7 Advantages and Disadvantages 584 CHAPTER 21 Text Mining 589 21.1 Introduction 589 21.2 The Tabular Representation of Text: Term–Document Matrix and “Bag-of-Words’’ 590 21.3 Bag-of-Words vs. 
Meaning Extraction at Document Level 592 21.4 Preprocessing the Text 593 21.5 Implementing Machine Learning Methods 602 21.6 Example: Online Discussions on Autos and Electronics 602 21.7 Example: Sentiment Analysis of Movie Reviews 607 21.8 Summary 614 CHAPTER 22 Responsible Data Science 617 22.1 Introduction 617 22.2 Unintentional Harm 618 22.3 Legal Considerations 620 22.4 Principles of Responsible Data Science 621 22.5 A Responsible Data Science Framework 624 22.6 Documentation Tools 628 22.7 Example: Applying the RDS Framework to the COMPAS Example 631 22.8 Summary 641 PART IX CASES CHAPTER 23 Cases 647 23.1 Charles Book Club 647 23.2 German Credit 653 23.3 Tayko Software Cataloger 658 23.4 Political Persuasion 662 23.5 Taxi Cancellations 665 23.6 Segmenting Consumers of Bath Soap 667 23.7 Direct-Mail Fundraising 670 23.8 Catalog Cross-Selling 672 23.9 Time Series Case: Forecasting Public Transportation Demand 673 23.10 Loan Approval 675 Index 685

    15 in stock

    £96.30

  • Machine Learning for Business Analytics

    John Wiley & Sons Inc Machine Learning for Business Analytics

    15 in stock

    Table of Contents: Foreword xix Preface to the Fourth Edition xxi Acknowledgments xxv PART I PRELIMINARIES CHAPTER 1 Introduction 3 CHAPTER 2 Overview of the Machine Learning Process 15 PART II DATA EXPLORATION AND DIMENSION REDUCTION CHAPTER 3 Data Visualization 59 CHAPTER 4 Dimension Reduction 91 PART III PERFORMANCE EVALUATION CHAPTER 5 Evaluating Predictive Performance 115 PART IV PREDICTION AND CLASSIFICATION METHODS CHAPTER 6 Multiple Linear Regression 151 CHAPTER 7 k-Nearest-Neighbors (k-NN) 169 CHAPTER 8 The Naive Bayes Classifier 181 CHAPTER 9 Classification and Regression Trees 197 CHAPTER 10 Logistic Regression 229 CHAPTER 11 Neural Nets 257 CHAPTER 12 Discriminant Analysis 283 CHAPTER 13 Generating, Comparing, and Combining Multiple Models 303 PART V INTERVENTION AND USER FEEDBACK CHAPTER 14 Experiments, Uplift Modeling, and Reinforcement Learning 319 PART VI MINING RELATIONSHIPS AMONG RECORDS CHAPTER 15 Association Rules and Collaborative Filtering 341 CHAPTER 16 Cluster Analysis 369 PART VII FORECASTING TIME SERIES CHAPTER 17 Handling Time Series 401 CHAPTER 18 Regression-Based Forecasting 415 CHAPTER 19 Smoothing Methods 445 PART VIII DATA ANALYTICS CHAPTER 20 Social Network Analytics 467 CHAPTER 21 Text Mining 487 CHAPTER 22 Responsible Data Science 507 PART IX CASES CHAPTER 23 Cases 537 References 575 Data Files Used in the Book 577 Index 579

    15 in stock

    £98.96

  • Operating AI

    John Wiley & Sons Inc Operating AI

    15 in stock

    Book Synopsis: A holistic and real-world approach to operationalizing artificial intelligence in your company. In Operating AI, Director of Technology and Architecture at Ericsson AB, Ulrika Jägare, delivers an eye-opening new discussion of how to introduce your organization to artificial intelligence by balancing data engineering, model development, and AI operations. You'll learn the importance of embracing an AI operational mindset to successfully operate AI and lead AI initiatives through the entire lifecycle, including key areas such as data mesh, data fabric, aspects of security, data privacy, data rights and IPR related to data and AI models. In the book, you'll also discover: how to reduce the risk of introducing bias into your artificial intelligence solutions and how to approach explainable AI (XAI); the importance of efficient and reproducible data pipelines, including how to manage your company's data; an operational perspective on the development of AI models using the MLOps (Machine Learning Operations) approach, including how to deploy, run and monitor models and ML pipelines in production using CI/CD/CT techniques that generate value in the real world; key competences and toolsets in AI development, deployment and operations; and what to consider when operating different types of AI business models. With a strong emphasis on deployment and operations of trustworthy and reliable AI solutions that operate well in the real world, and not just the lab, Operating AI is a must-read for business leaders looking for ways to operationalize an AI business model that actually makes money, from the concept phase to running in a live production environment.
    Table of Contents: Foreword xii Introduction xv Chapter 1 Balancing the AI Investment 1 Defining AI and Related Concepts 3 Operational Readiness and Why It Matters 8 Applying an Operational Mind-set from the Start 12 The Operational Challenge 15 Strategy, People, and Technology Considerations 19 Strategic Success Factors in Operating 
AI 20 People and Mind-sets 23 The Technology Perspective 28 Chapter 2 Data Engineering Focused on AI 31 Know Your Data 32 Know the Data Structure 32 Know the Data Records 34 Know the Business Data Oddities 35 Know the Data Origin 36 Know the Data Collection Scope 37 The Data Pipeline 38 Types of Data Pipeline Solutions 41 Data Quality in Data Pipelines 44 The Data Quality Approach in AI/ML 45 Scaling Data for AI 49 Key Capabilities for Scaling Data 51 Introducing a Data Mesh 53 When You Have No Data 55 The Role of a Data Fabric 56 Why a Data Fabric Matters in AI/ML 58 Key Competences and Skillsets in Data Engineering 60 Chapter 3 Embracing MLOps 71 MLOps as a Concept 72 From ML Models to ML Pipelines 76 The ML Pipeline 78 Adopt a Continuous Learning Approach 84 The Maturity of Your AI/ML Capability 86 Level 0: Model Focus and No MLOps 88 Level 1: Pipelines Rather than Models 89 Level 2: Leveraging Continuous Learning 90 The Model Training Environment 91 Enabling ML Experimentation 92 Using a Simulator for Model Training 94 Environmental Impact of Training AI Models 96 Considering the AI/ML Functional Technology Stack 97 Key Competences and Toolsets in MLOps 103 Clarifying Similarities and Differences 106 MLOps Toolsets 107 Chapter 4 Deployment with AI Operations in Mind 115 Model Serving in Practice 117 Feature Stores 118 Deploying, Serving, and Inferencing Models at Scale 121 The ML Inference Pipeline 123 Model Serving Architecture Components 125 Considerations Regarding Toolsets for Model Serving 129 The Industrialization of AI 129 The Importance of a Cultural Shift 139 Chapter 5 Operating AI Is Different from Operating Software 143 Model Monitoring 144 Ensuring Efficient ML Model Monitoring 145 Model Scoring in Production 146 Retraining in Production Using Continuous Training 151 Data Aspects Related to Model Retraining 155 Understanding Different Retraining Techniques 156 Deployment after Retraining 159 Disadvantages of Retraining Models Frequently 159 
Diagnosing and Managing Model Performance Issues in Operations 161 Issues with Data Processing 162 Issues with Data Schema Change 163 Data Loss at the Source 165 Models Are Broken Upstream 166 Monitoring Data Quality and Integrity 167 Monitoring the Model Calls 167 Monitoring the Data Schema 168 Detecting Any Missing Data 168 Validating the Feature Values 169 Monitor the Feature Processing 170 Model Monitoring for Stakeholders 171 Ensuring Stakeholder Collaboration for Model Success 173 Toolsets for Model Monitoring in Production 175 Chapter 6 AI Is All About Trust 181 Anonymizing Data 182 Data Anonymization Techniques 185 Pros and Cons of Data Anonymization 187 Explainable AI 189 Complex AI Models Are Harder to Understand 190 What Is Interpretability? 191 The Need for Interpretability in Different Phases 192 Reducing Bias in Practice 194 Rights to the Data and AI Models 199 Data Ownership 200 Who Owns What in a Trained AI Model? 202 Balancing the IP Approach for AI Models 205 The Role of AI Model Training 206 Addressing IP Ownership in AI Results 207 Legal Aspects of AI Techniques 208 Operational Governance of Data and AI 210 Chapter 7 Achieving Business Value from AI 215 The Challenge of Leveraging Value from AI 216 Productivity 216 Reliability 217 Risk 218 People 219 Top Management and AI Business Realization 219 Measuring AI Business Value 223 Measuring AI Value in Nonrevenue Terms 227 Operating Different AI Business Models 229 Operating Artificial Intelligence as a Service 230 Operating Embedded AI Solutions 236 Operating a Hybrid AI Business Model 239 Index 241

    15 in stock

    £24.79

  • Deep Learning

    John Wiley & Sons Inc Deep Learning

    15 in stock

    Book Synopsis: A concise and practical exploration of key topics and applications in data science. In Deep Learning: From Big Data to Artificial Intelligence with R, expert researcher Dr. Stéphane Tufféry delivers an insightful discussion of the applications of deep learning and big data that focuses on practical instructions on various software tools and deep learning methods relying on three major libraries: MXNet, PyTorch, and Keras-TensorFlow. In the book, numerous, up-to-date examples are combined with key topics relevant to modern data scientists, including processing optimization, neural network applications, natural language processing, and image recognition. This is a thoroughly revised and updated edition of a book originally released in French, with new examples and methods included throughout. Classroom-tested and intuitively organized, Deep Learning: From Big Data to Artificial Intelligence with R offers complimentary acces
    Table of Contents: Acknowledgements xiii Introduction xv 1 From Big Data to Deep Learning 1 1.1 Introduction 1 1.2 Examples of the Use of Big Data and Deep Learning 6 1.3 Big Data and Deep Learning for Companies and Organizations 9 1.3.1 Big Data in Finance 10 1.3.1.1 Google Trends 10 1.3.1.2 Google Trends and Stock Prices 11 1.3.1.3 The quantmod Package for Financial Analysis 11 1.3.1.4 Google Trends in R 13 1.3.1.5 Matching Data from quantmod and Google Trends 14 1.3.2 Big Data and Deep Learning in Insurance 18 1.3.3 Big Data and Deep Learning in Industry 18 1.3.4 Big Data and Deep Learning in Scientific Research and Education 20 1.3.4.1 Big Data in Physics and Astrophysics 20 1.3.4.2 Big Data in Climatology and Earth Sciences 21 1.3.4.3 Big Data in Education 21 1.4 Big Data and Deep Learning for Individuals 21 1.4.1 Big Data and Deep Learning in Healthcare 21 1.4.1.1 Connected Health and Telemedicine 21 1.4.1.2 Geolocation and Health 22 1.4.1.3 The Google Flu Trends 23 1.4.1.4 Research in Health and Medicine 26 1.4.2 
Big Data and Deep Learning for Drivers 28 1.4.3 Big Data and Deep Learning for Citizens 29 1.4.4 Big Data and Deep Learning in the Police 30 1.5 Risks in Data Processing 32 1.5.1 Insufficient Quantity of Training Data 32 1.5.2 Poor Data Quality 32 1.5.3 Non-Representative Samples 33 1.5.4 Missing Values in the Data 33 1.5.5 Spurious Correlations 34 1.5.6 Overfitting 35 1.5.7 Lack of Explainability of Models 35 1.6 Protection of Personal Data 36 1.6.1 The Need for Data Protection 36 1.6.2 Data Anonymization 38 1.6.3 The General Data Protection Regulation 41 1.7 Open Data 43 Notes 44 2 Processing of Large Volumes of Data 49 2.1 Issues 49 2.2 The Search for a Parsimonious Model 50 2.3 Algorithmic Complexity 51 2.4 Parallel Computing 51 2.5 Distributed Computing 52 2.5.1 MapReduce 53 2.5.2 Hadoop 54 2.5.3 Computing Tools for Distributed Computing 55 2.5.4 Column-Oriented Databases 56 2.5.5 Distributed Architecture and “Analytics" 57 2.5.6 Spark 58 2.6 Computer Resources 60 2.6.1 Minimum Resources 60 2.6.2 Graphics Processing Units (GPU) and Tensor Processing Units (TPU) 61 2.6.3 Solutions in the Cloud 62 2.7 R and Python Software 62 2.8 Quantum Computing 67 Notes 68 3 Reminders of Machine Learning 71 3.1 General 71 3.2 The Optimization Algorithms 74 3.3 Complexity Reduction and Penalized Regression 85 3.4 Ensemble Methods 89 3.4.1 Bagging 89 3.4.2 Random Forests 89 3.4.3 Extra-Trees 91 3.4.4 Boosting 92 3.4.5 Gradient Boosting Methods 97 3.4.6 Synthesis of the Ensemble Methods 100 3.5 Support Vector Machines 100 3.6 Recommendation Systems 105 Notes 108 4 Natural Language Processing 111 4.1 From Lexical Statistics to Natural Language Processing 111 4.2 Uses of Text Mining and Natural Language Processing 113 4.3 The Operations of Textual Analysis 114 4.3.1 Textual Data Collection 115 4.3.2 Identification of the Language 115 4.3.3 Tokenization 116 4.3.4 Part-of-Speech Tagging 117 4.3.5 Named Entity Recognition 119 4.3.6 Coreference Resolution 124 4.3.7 Lemmatization 124 
4.3.8 Stemming 129 4.3.9 Simplifications 129 4.3.10 Removal of Stop Words 130 4.4 Vector Representation and Word Embedding 132 4.4.1 Vector Representation 132 4.4.2 Analysis on the Document-Term Matrix 133 4.4.3 TF-IDF Weighting 142 4.4.4 Latent Semantic Analysis 144 4.4.5 Latent Dirichlet Allocation 152 4.4.6 Word Frequency Analysis 160 4.4.7 Word2Vec Embedding 162 4.4.8 GloVe Embedding 174 4.4.9 FastText Embedding 176 4.5 Sentiment Analysis 180 Notes 184 5 Social Network Analysis 187 5.1 Social Networks 187 5.2 Characteristics of Graphs 188 5.3 Characterization of Social Networks 189 5.4 Measures of Influence in a Graph 190 5.5 Graphs with R 191 5.6 Community Detection 200 5.6.1 The Modularity of a Graph 201 5.6.2 Community Detection by Divisive Hierarchical Clustering 202 5.6.3 Community Detection by Agglomerative Hierarchical Clustering 203 5.6.4 Other Methods 204 5.6.5 Community Detection with R 205 5.7 Research and Analysis on Social Networks 208 5.8 The Business Model of Social Networks 209 5.9 Digital Advertising 211 5.10 Social Network Analysis with R 212 5.10.1 Collecting Tweets 213 5.10.2 Formatting the Corpus 215 5.10.3 Stemming and Lemmatization 216 5.10.4 Example 217 5.10.5 Clustering of Terms and Documents 225 5.10.6 Opinion Scoring 230 5.10.7 Graph of Terms with Their Connotation 231 Notes 234 6 Handwriting Recognition 237 6.1 Data 237 6.2 Issues 238 6.3 Data Processing 238 6.4 Linear and Quadratic Discriminant Analysis 243 6.5 Multinomial Logistic Regression 245 6.6 Random Forests 246 6.7 Extra-Trees 247 6.8 Gradient Boosting 249 6.9 Support Vector Machines 253 6.10 Single Hidden Layer Perceptron 258 6.11 H2O Neural Network 262 6.12 Synthesis of “Classical” Methods 267 Notes 268 7 Deep Learning 269 7.1 The Principles of Deep Learning 269 7.2 Overview of Deep Neural Networks 272 7.3 Recall on Neural Networks and Their Training 274 7.4 Difficulties of Gradient Backpropagation 284 7.5 The Structure of a Convolutional Neural Network 286 7.6 The 
Convolution Mechanism 288 7.7 The Convolution Parameters 290 7.8 Batch Normalization 292 7.9 Pooling 293 7.10 Dilated Convolution 295 7.11 Dropout and DropConnect 295 7.12 The Architecture of a Convolutional Neural Network 297 7.13 Principles of Deep Network Learning for Computer Vision 299 7.14 Adaptive Learning Algorithms 301 7.15 Progress in Image Recognition 304 7.16 Recurrent Neural Networks 312 7.17 Capsule Networks 317 7.18 Autoencoders 318 7.19 Generative Models 322 7.19.1 Generative Adversarial Networks 323 7.19.2 Variational Autoencoders 324 7.20 Other Applications of Deep Learning 326 7.20.1 Object Detection 326 7.20.2 Autonomous Vehicles 333 7.20.3 Analysis of Brain Activity 334 7.20.4 Analysis of the Style of a Pictorial Work 336 7.20.5 Go and Chess Games 338 7.20.6 Other Games 340 Notes 341 8 Deep Learning for Computer Vision 347 8.1 Deep Learning Libraries 347 8.2 MXNet 349 8.2.1 General Information about MXNet 349 8.2.2 Creating a Convolutional Network with MXNet 350 8.2.3 Model Management with MXNet 361 8.2.4 CIFAR-10 Image Recognition with MXNet 362 8.3 Keras and TensorFlow 367 8.3.1 General Information about Keras 370 8.3.2 Application of Keras to the MNIST Database 371 8.3.3 Application of Pre-Trained Models 375 8.3.4 Explain the Prediction of a Computer Vision Model 379 8.3.5 Application of Keras to CIFAR-10 Images 382 8.3.6 Classifying Cats and Dogs 393 8.4 Configuring a Machine’s GPU for Deep Learning 409 8.4.1 Checking the Compatibility of the Graphics Card 410 8.4.2 NVIDIA Driver Installation 410 8.4.3 Installation of Microsoft Visual Studio 411 8.4.4 NVIDIA CUDA Toolkit Installation 411 8.4.5 Installation of cuDNN 412 8.5 Computing in the Cloud 412 8.6 PyTorch 419 8.6.1 The Python PyTorch Package 419 8.6.2 The R torch Package 425 Notes 431 9 Deep Learning for Natural Language Processing 433 9.1 Neural Network Methods for Text Analysis 433 9.2 Text Generation Using a Recurrent Neural Network LSTM 434 9.3 Text Classification Using a LSTM or 
GRU Recurrent Neural Network 440 9.4 Text Classification Using a H2O Model 452 9.5 Application of Convolutional Neural Networks 456 9.6 Spam Detection Using a Recurrent Neural Network LSTM 460 9.7 Transformer Models, BERT, and Its Successors 461 Notes 479 10 Artificial Intelligence 481 10.1 The Beginnings of Artificial Intelligence 481 10.2 Human Intelligence and Artificial Intelligence 486 10.3 The Different Forms of Artificial Intelligence 488 10.4 Ethical and Societal Issues of Artificial Intelligence 493 10.5 Fears and Hopes of Artificial Intelligence 496 10.6 Some Dates of Artificial Intelligence 499 Notes 502 Conclusion 505 Note 506 Annotated Bibliography 507 On Big Data and High Dimensional Statistics 507 On Deep Learning 509 On Artificial Intelligence 511 On the Use of R and Python in Data Science and on Big Data 512 Index 515

    15 in stock

    £63.00

  • Fuzzy Computing in Data Science

    John Wiley & Sons Inc Fuzzy Computing in Data Science

    15 in stock

    Book Synopsis: This book comprehensively explains how to use various fuzzy-based models to solve real-time industrial challenges. The book provides information about fundamental aspects of the field and explores the myriad applications of fuzzy logic techniques and methods. It presents basic conceptual considerations and case studies of applications of fuzzy computation. It covers the fundamental concepts and techniques for system modeling, information processing, intelligent system design, decision analysis, statistical analysis, pattern recognition, automated learning, system control, and identification. The book also discusses the combination of fuzzy computation techniques with other computational intelligence approaches such as neural and evolutionary computation. Audience: Researchers and students in computer science, artificial intelligence, machine learning, big data analytics, and information and communication technology.
    Table of Contents: Preface xvii Acknowledgement xxi 1 Band Reduction of HSI Segmentation Using FCM 1 V. Saravana Kumar, S. Anantha Sivaprakasam, E.R. Naganathan, Sunil Bhutada and M. 
Kavitha 1.1 Introduction 2 1.2 Existing Method 3 1.2.1 K-Means Clustering Method 3 1.2.2 Fuzzy C-Means 3 1.2.3 Davies Bouldin Index 4 1.2.4 Data Set Description of HSI 4 1.3 Proposed Method 5 1.3.1 Hyperspectral Image Segmentation Using Enhanced Estimation of Centroid 5 1.3.2 Band Reduction Using K-Means Algorithm 6 1.3.3 Band Reduction Using Fuzzy C-Means 7 1.4 Experimental Results 8 1.4.1 DB Index Graph 8 1.4.2 K-Means–Based PSC (EEOC) 9 1.4.3 Fuzzy C-Means–Based PSC (EEOC) 10 1.5 Analysis of Results 12 1.6 Conclusions 16 References 17 2 A Fuzzy Approach to Face Mask Detection 21 Vatsal Mishra, Tavish Awasthi, Subham Kashyap, Minerva Brahma, Monideepa Roy and Sujoy Datta 2.1 Introduction 22 2.2 Existing Work 23 2.3 The Proposed Framework 26 2.4 Set-Up and Libraries Used 26 2.5 Implementation 27 2.6 Results and Analysis 29 2.7 Conclusion and Future Work 33 References 34 3 Application of Fuzzy Logic to the Healthcare Industry 37 Biswajeet Sahu, Lokanath Sarangi, Abhinadita Ghosh and Hemanta Kumar Palo 3.1 Introduction 38 3.2 Background 41 3.3 Fuzzy Logic 42 3.4 Fuzzy Logic in Healthcare 45 3.5 Conclusions 49 References 50 4 A Bibliometric Approach and Systematic Exploration of Global Research Activity on Fuzzy Logic in Scopus Database 55 Sugyanta Priyadarshini and Nisrutha Dulla 4.1 Introduction 56 4.2 Data Extraction and Interpretation 58 4.3 Results and Discussion 59 4.3.1 Per Year Publication and Citation Count 59 4.3.2 Prominent Affiliations Contributing Toward Fuzzy Logic 60 4.3.3 Top Journals Emerging in Fuzzy Logic in Major Subject Areas 61 4.3.4 Major Contributing Countries Toward Fuzzy Research Articles 63 4.3.5 Prominent Authors Contribution Toward the Fuzzy Logic Analysis 66 4.3.6 Coauthorship of Authors 67 4.3.7 Cocitation Analysis of Cited Authors 68 4.3.8 Cooccurrence of Author Keywords 68 4.4 Bibliographic Coupling of Documents, Sources, Authors, and Countries 70 4.4.1 Bibliographic Coupling of Documents 70 4.4.2 Bibliographic Coupling of Sources 71 
4.4.3 Bibliographic Coupling of Authors 72 4.4.4 Bibliographic Coupling of Countries 73 4.5 Conclusion 74 References 76 5 Fuzzy Decision Making in Predictive Analytics and Resource Scheduling 79 Rekha A. Kulkarni, Suhas H. Patil and Bithika Bishesh 5.1 Introduction 80 5.2 History of Fuzzy Logic and Its Applications 81 5.3 Approximate Reasoning 82 5.4 Fuzzy Sets vs Classical Sets 83 5.5 Fuzzy Inference System 84 5.5.1 Characteristics of FIS 85 5.5.2 Working of FIS 85 5.5.3 Methods of FIS 86 5.6 Fuzzy Decision Trees 86 5.6.1 Characteristics of Decision Trees 87 5.6.2 Construction of Fuzzy Decision Trees 87 5.7 Fuzzy Logic as Applied to Resource Scheduling in a Cloud Environment 88 5.8 Conclusion 90 References 91 6 Application of Fuzzy Logic and Machine Learning Concept in Sales Data Forecasting Decision Analytics Using ARIMA Model 93 S. Mala and V. Umadevi 6.1 Introduction 94 6.1.1 Aim and Scope 94 6.1.2 R-Tool 94 6.1.3 Application of Fuzzy Logic 94 6.1.4 Dataset 95 6.2 Model Study 96 6.2.1 Introduction to Machine Learning Method 96 6.2.2 Time Series Analysis 96 6.2.3 Components of a Time Series 97 6.2.4 Concepts of Stationary 99 6.2.5 Model Parsimony 100 6.3 Methodology 100 6.3.1 Exploratory Data Analysis 100 6.3.1.1 Seed Types—Analysis 101 6.3.1.2 Comparison of Location and Seeds 101 6.3.1.3 Comparison of Season (Month) and Seeds 103 6.3.2 Forecasting 103 6.3.2.1 Auto Regressive Integrated Moving Average (ARIMA) 103 6.3.2.2 Data Visualization 106 6.3.2.3 Implementation Model 108 6.4 Result Analysis 108 6.5 Conclusion 110 References 110 7 Modified m-Polar Fuzzy Set ELECTRE-I Approach 113 Madan Jagtap, Prasad Karande and Pravin Patil 7.1 Introduction 114 7.1.1 Objectives 114 7.2 Implementation of m-Polar Fuzzy ELECTRE-I Integrated Shannon’s Entropy Weight Calculations 115 7.2.1 The m-Polar Fuzzy ELECTRE-I Integrated Shannon’s Entropy Weight Calculation Method 115 7.3 Application to Industrial Problems 118 7.3.1 Cutting Fluid Selection Problem 118 7.3.2 Results 
Obtained From m-Polar Fuzzy ELECTRE-I for Cutting Fluid Selection Problem 122 7.3.3 FMS Selection Problem 125 7.3.4 Results Obtained From m-Polar Fuzzy ELECTRE-I for FMS Selection 130 7.4 Conclusions 143 References 143 8 Fuzzy Decision Making: Concept and Models 147 Bithika Bishesh 8.1 Introduction 148 8.2 Classical Set 149 8.3 Fuzzy Set 150 8.4 Properties of Fuzzy Set 151 8.5 Types of Decision Making 153 8.5.1 Individual Decision Making 153 8.5.2 Multiperson Decision Making 157 8.5.3 Multistage Decision Making 158 8.5.4 Multicriteria Decision Making 160 8.6 Methods of Multiattribute Decision Making (MADM) 162 8.6.1 Weighted Sum Method (WSM) 162 8.6.2 Weighted Product Method (WPM) 162 8.6.3 Weighted Aggregates Sum Product Assessment (WASPAS) 163 8.6.4 Technique for Order Preference by Similarity to Ideal Solutions (TOPSIS) 166 8.7 Applications of Fuzzy Logic 167 8.8 Conclusion 169 References 169 9 Use of Fuzzy Logic for Psychological Support to Migrant Workers of Southern Odisha (India) 173 Sanjaya Kumar Sahoo and Sukanta Chandra Swain 9.1 Introduction 174 9.2 Objectives and Methodology 175 9.2.1 Objectives 175 9.2.2 Methodology 176 9.3 Effect of COVID-19 on the Psychology and Emotion of Repatriated Migrants 176 9.3.1 Psychological Variables Identified 176 9.3.2 Fuzzy Logic for Solace to Migrants 176 9.4 Findings 178 9.5 Way Out for Strengthening the Psychological Strength of the Migrant Workers through Technological Aid 178 9.6 Conclusion 179 References 180 10 Fuzzy-Based Edge AI Approach: Smart Transformation of Healthcare for a Better Tomorrow 181 B. RaviKrishna, Sirisha Potluri, J. 
Rethna Virgil Jeny, Guna Sekhar Sajja and Katta Subba Rao 10.1 Significance of Machine Learning in Healthcare 182 10.2 Cloud-Based Artificial Intelligent Secure Models 183 10.3 Applications and Usage of Machine Learning in Healthcare 183 10.3.1 Detecting Diseases and Diagnosis 183 10.3.2 Drug Detection and Manufacturing 183 10.3.3 Medical Imaging Analysis and Diagnosis 184 10.3.4 Personalized/Adapted Medicine 185 10.3.5 Behavioral Modification 185 10.3.6 Maintenance of Smart Health Data 185 10.3.7 Clinical Trial and Study 185 10.3.8 Crowdsourced Information Discovery 185 10.3.9 Enhanced Radiotherapy 186 10.3.10 Outbreak/Epidemic Prediction 186 10.4 Edge AI: For Smart Transformation of Healthcare 186 10.4.1 Role of Edge in Reshaping Healthcare 186 10.4.2 How AI Powers the Edge 187 10.5 Edge AI-Modernizing Human Machine Interface 188 10.5.1 Rural Medicine 188 10.5.2 Autonomous Monitoring of Hospital Rooms—A Case Study 188 10.6 Significance of Fuzzy in Healthcare 189 10.6.1 Fuzzy Logic—Outline 189 10.6.2 Fuzzy Logic-Based Smart Healthcare 190 10.6.3 Medical Diagnosis Using Fuzzy Logic for Decision Support Systems 191 10.6.4 Applications of Fuzzy Logic in Healthcare 193 10.7 Conclusion and Discussions 193 References 194 11 Video Conferencing (VC) Software Selection Using Fuzzy TOPSIS 197 Rekha Gupta 11.1 Introduction 197 11.2 Video Conferencing Software and Its Major Features 199 11.2.1 Video Conferencing/Meeting Software (VC/MS) for Higher Education Institutes 199 11.3 Fuzzy TOPSIS 203 11.3.1 Extension of TOPSIS Algorithm: Fuzzy TOPSIS 203 11.4 Sample Numerical Illustration 207 11.5 Conclusions 213 References 213 12 Estimation of Nonperforming Assets of Indian Commercial Banks Using Fuzzy AHP and Goal Programming 215 Kandarp Vidyasagar and Rajiv Kr. 
Dwivedi 12.1 Introduction 216 12.1.1 Basic Concepts of Fuzzy AHP and Goal Programming 217 12.2 Research Model 221 12.2.1 Average Growth Rate Calculation 227 12.3 Result and Discussion 233 12.4 Conclusion 234 References 234 13 Evaluation of Ergonomic Design for the Visual Display Terminal Operator at Static Work Under FMCDM Environment 237 Bipradas Bairagi 13.1 Introduction 238 13.2 Proposed Algorithm 240 13.3 An Illustrative Example on Ergonomic Design Evaluation 245 13.4 Conclusions 249 References 249 14 Optimization of Energy Generated from Ocean Wave Energy Using Fuzzy Logic 253 S. B. Goyal, Pradeep Bedi, Jugnesh Kumar and Prasenjit Chatterjee 14.1 Introduction 254 14.2 Control Approach in Wave Energy Systems 255 14.3 Related Work 257 14.4 Mathematical Modeling for Energy Conversion from Ocean Waves 259 14.5 Proposed Methodology 260 14.5.1 Wave Parameters 261 14.5.2 Fuzzy-Optimizer 262 14.6 Conclusion 264 References 264 15 The m-Polar Fuzzy TOPSIS Method for NTM Selection 267 Madan Jagtap and Prasad Karande 15.1 Introduction 268 15.2 Literature Review 268 15.3 Methodology 270 15.3.1 Steps of the mFS TOPSIS 270 15.4 Case Study 272 15.4.1 Effect of Analytical Hierarchy Process (AHP) Weight Calculation on the mFS TOPSIS Method 273 15.4.2 Effect of Shannon’s Entropy Weight Calculation on the m-Polar Fuzzy Set TOPSIS Method 277 15.5 Results and Discussions 281 15.5.1 Result Validation 281 15.6 Conclusions and Future Scope 283 References 284 16 Comparative Analysis on Material Handling Device Selection Using Hybrid FMCDM Methodology 287 Bipradas Bairagi 16.1 Introduction 288 16.2 MCDM Techniques 289 16.2.1 FAHP 289 16.2.2 Entropy Method as Weights (Influence) Evaluation Technique 290 16.3 The Proposed Hybrid and Super Hybrid FMCDM Approaches 291 16.3.1 TOPSIS 291 16.3.2 FMOORA Method 292 16.3.3 FVIKOR 292 16.3.4 Fuzzy Grey Theory (FGT) 293 16.3.5 COPRAS-G 293 16.3.6 Super Hybrid Algorithm 294 16.4 Illustrative Example 295 16.5 Results and Discussions 298 16.5.1 
FTOPSIS 298 16.5.2 FMOORA 298 16.5.3 FVIKOR 298 16.5.4 Fuzzy Grey Theory (FGT) 299 16.5.5 COPRAS-G 299 16.5.6 Super Hybrid Approach (SHA) 299 16.6 Conclusions 302 References 302 17 Fuzzy MCDM on CCPM for Decision Making: A Case Study 305 Bimal K. Jena, Biswajit Das, Amarendra Baral and Sushanta Tripathy 17.1 Introduction 306 17.2 Literature Review 307 17.3 Objective of Research 308 17.4 Cluster Analysis 308 17.4.1 Hierarchical Clustering 309 17.4.2 Partitional Clustering 309 17.5 Clustering 310 17.6 Methodology 314 17.7 TOPSIS Method 316 17.8 Fuzzy TOPSIS Method 318 17.9 Conclusion 325 17.10 Scope of Future Study 326 References 326 Index 329

    15 in stock

    £133.20

  • Teach Yourself Visually Power BI

    John Wiley & Sons Inc Teach Yourself Visually Power BI

    15 in stock

    Book SynopsisTable of ContentsChapter 1 Getting Started with Power BI What Is Power BI? 4 Understanding the Different Components of Power BI 6 Understanding Power BI as Part of the Power Platform 7 Install Power BI Desktop 8 Start and Pin Power BI Desktop 10 Explore the Power BI Workspace 12 Chapter 2 Connecting Power BI to Your Data Grasp How Power Query Editor Works with Power BI Desktop 18 Connect Power BI Desktop to a Local File 20 Save, Close, and Open Power BI Reports 22 Start Working with the Sample Dataset 24 Connect to a Power BI Dataset 28 Connect to a SharePoint List 30 Connect to a SQL Server Database 34 Chapter 3 Cleaning and Shaping Data Remove Duplicate Values 40 Replace Values in a Column 42 Split a Column Using a Delimiter 44 Group Data 46 Add a Calculated Column 48 Add an Index Column 50 Chapter 4 Modeling Data in Model View Create Dimension Tables 54 Create Relationships Between Tables 58 Create a Star Schema 62 Create a Hierarchical Schema 64 Using the Properties Pane 70 Chapter 5 Creating Basic Visualizations Create a Bar Chart 74 Apply Filters to Visuals 76 Format the Y-Axis of a Bar Chart 78 Format the X-Axis of a Bar Chart 80 Add and Format the Data Category of a Bar Chart 82 Move a Bar Chart’s Legend and Add Gridlines 84 Add a Zoom Slider and Update Bar Colors 86 Add Data Labels to a Bar Chart 88 Add an Image to the Plot Area Background 90 Create a Line Chart or Area Chart 92 Format the Axes of a Line or Area Chart 94 Add a Legend to a Line or Area Chart 96 Move the Legend and Add Gridlines to a Line or Area Chart 98 Add a Zoom Slider and Steps to a Line or Area Chart 100 Add Data Markers and Labels to a Line or Area Chart 102 Format the Data Labels of a Line or Area Chart 104 Chapter 6 Creating Advanced Data Visualizations Create and Format a Gauge Chart 108 Create and Format a KPI Visual 112 Create a Matrix Visual 116 Format a Matrix Visual 118 Format the Values and Column Headers of a Matrix Visual 120 Format the Row Headers of a 
Matrix Visual 122 Format the Row Subtotals and Grand Totals of a Matrix Visual 124 Format the Specific Column and Cell Elements of a Matrix Visual 126 Create a Waterfall Chart 128 Format a Waterfall Chart 130 Format the X-Axis and Legend of a Waterfall Chart 132 Add and Format Breakdowns in a Waterfall Chart 134 Create, Format, and Label a Funnel Chart 136 Create a Pie Chart or Donut Chart 140 Format a Pie Chart or Donut Chart 142 Create a Treemap Chart 144 Format a Treemap Chart 146 Chapter 7 Showing Geographic Data on Maps Create a Proportional Symbol Map 150 Create a Choropleth Map 152 Add Conditional Formatting to a Choropleth Map 154 Enable Power BI’s Preview Features 156 Create an Isarithmic Map 158 Create a Skyscraper Map 160 Chapter 8 Using Calculated Columns and DAX Understanding DAX and Why You Should Use It 164 Add All Numbers in a Column 166 Perform Division 168 Check a Condition 170 Count the Number of Cells in a Column 172 Return the Average of All Numbers in a Column 174 Join Two Text Strings into One Text String 176 Apply Conditional Formatting in Tables 178 Chapter 9 Using Analytics and Machine Learning Identify Outliers 184 Find Groups of Similar Data by Clustering 186 Create a Dataflow 188 Apply Binary Prediction with AutoML 192 Chapter 10 Creating Interactive Reports Planning to Create a Report 198 Start a Report and Add a Title 200 Add Visuals to a Report 202 Add Slicers to a Report 206 Control Which Visuals and Slicers Interact 208 Enable and Control Drill-Through Actions 210 Split a Page into Sections 214 Add Bookmarks and Navigation to a Report 218 Chapter 11 Publishing Reports and Dashboards Set Up a Workspace 224 Ask Questions About the Data 226 Publish a Report to the Power BI Service 228 Set Up Row-Level Security 230 Add Tiles to a Dashboard 232 Share a Dashboard 234 Schedule Data Refreshes 236 Publish a Report to the Web 238 Index 240

    £18.39

  • Data Analytics in the AWS Cloud

    John Wiley & Sons Inc Data Analytics in the AWS Cloud

    15 in stock

    Book Synopsis: A comprehensive and accessible roadmap to performing data analytics in the AWS cloud. In Data Analytics in the AWS Cloud: Building a Data Platform for BI and Predictive Analytics on AWS, accomplished software engineer and data architect Joe Minichino delivers an expert blueprint for storing, processing, and analyzing data on the Amazon Web Services cloud platform. In the book, you'll explore every relevant aspect of data analytics, from data engineering to analysis, business intelligence, DevOps, and MLOps, as you discover how to integrate machine learning predictions with analytics engines and visualization tools. You'll also find: real-world use cases of AWS architectures that demystify the applications of data analytics; accessible introductions to data acquisition, importation, storage, visualization, and reporting; expert insights into serverless data engineering and how to use it to reduce overhead and costs, improve stability, and simplify m Table of Contents: Introduction xxiii Chapter 1 AWS Data Lakes and Analytics Technology Overview 1 Why AWS? 1 What Does a Data Lake Look Like in AWS? 
2 Analytics on AWS 3 Skills Required to Build and Maintain an AWS Analytics Pipeline 3 Chapter 2 The Path to Analytics: Setting Up a Data and Analytics Team 5 The Data Vision 6 Support 6 DA Team Roles 7 Early Stage Roles 7 Team Lead 8 Data Architect 8 Data Engineer 8 Data Analyst 9 Maturity Stage Roles 9 Data Scientist 9 Cloud Engineer 10 Business Intelligence (BI) Developer 10 Machine Learning Engineer 10 Business Analyst 11 Niche Roles 11 Analytics Flow at a Process Level 12 Workflow Methodology 12 The DA Team Mantra: “Automate Everything” 14 Analytics Models in the Wild: Centralized, Distributed, Center of Excellence 15 Centralized 15 Distributed 16 Center of Excellence 16 Summary 17 Chapter 3 Working on AWS 19 Accessing AWS 20 Everything Is a Resource 21 S3: An Important Exception 21 IAM: Policies, Roles, and Users 22 Policies 22 Identity-Based Policies 24 Resource-Based Policies 25 Roles 25 Users and User Groups 25 Summarizing IAM 26 Working with the Web Console 26 The AWS Command-Line Interface 29 Installing AWS cli 29 Linux Installation 30 macOS Installation 30 Windows 31 Configuring AWS cli 31 A Note on Region 33 Setting Individual Parameters 33 Using Profiles and Configuration Files 33 Final Notes on Configuration 36 Using the AWS cli 36 Using Skeletons and File Inputs 39 Cleaning Up! 43 Infrastructure-as-Code: CloudFormation and Terraform 44 CloudFormation 44 CloudFormation Stacks 46 CloudFormation Template Anatomy 47 CloudFormation Changesets 52 Getting Stack Information 55 Cleaning Up Again 57 CloudFormation Conclusions 58 Terraform 58 Coding Style 58 Modularity 59 Limitations 59 Terraform vs. CloudFormation 60 Infrastructure-as-Code: CDK, Pulumi, Cloudcraft, and Other Solutions 60 AWS CDK 60 Pulumi 62 Cloudcraft 62 Infrastructure Management Conclusions 63 Chapter 4 Serverless Computing and Data Engineering 65 Serverless vs. 
Fully Managed 65 AWS Serverless Technologies 66 AWS Lambda 67 Pricing Model 67 Laser Focus on Code 68 The Lambda Paradigm Shift 69 Virtually Infinite Scalability 70 Geographical Distribution 70 A Lambda Hello World 71 Lambda Configuration 74 Runtime 74 Container-Based Lambdas 75 Architectures 75 Memory 75 Networking 76 Execution Role 76 Environment Variables 76 AWS EventBridge 77 AWS Fargate 77 AWS DynamoDB 77 AWS SNS 77 Amazon SQS 78 AWS CloudWatch 78 Amazon QuickSight 78 AWS Step Functions 78 Amazon API Gateway 79 Amazon Cognito 79 AWS Serverless Application Model (SAM) 79 Ephemeral Infrastructure 80 AWS SAM Installation 80 Configuration 80 Creating Your First AWS SAM Project 81 Application Structure 83 SAM Resource Types 85 SAM Lambda Template 86 !! Recursive Lambda Invocation !! 88 Function Metadata 88 Outputs 89 Implicitly Generated Resources 89 Other Template Sections 90 Lambda Code 90 Building Your First SAM Application 93 Testing the AWS SAM Application Locally 96 Deployment 99 Cleaning Up 104 Summary 104 Chapter 5 Data Ingestion 105 AWS Data Lake Architecture 106 Serverless Data Lake Architecture Structure 106 Ingestion 106 Storage and Processing 108 Cataloging, Governance, and Search 108 Security and Monitoring 109 Consumption 109 Sample Processing Architecture: Cataloging Images into DynamoDB 109 Use Case Description 109 SAM Application Creation 110 S3-Triggered Lambda 111 Adding DynamoDB 119 Lambda Execution Context 121 Inserting into DynamoDB 121 Cleaning Up 123 Serverless Ingestion 124 AWS Fargate 124 AWS Lambda 124 Example Architecture: Fargate-Based Periodic Batch Import 125 The Basic Importer 125 ECS CLI 128 AWS Copilot cli 128 Clean Up 136 AWS Kinesis Ingestion 136 Example Architecture: Two-Pronged Delivery 137 Fully Managed Ingestion with AppFlow 146 Operational Data Ingestion with Database Migration Service 151 DMS Concepts 151 DMS Instance 151 DMS Endpoints 152 DMS Tasks 152 Summary of the Workflow 152 Common Use of DMS 153 Example 
Architecture: DMS to S3 154 DMS Instance 154 DMS Endpoints 156 DMS Task 162 Summary 167 Chapter 6 Processing Data 169 Phases of Data Preparation 170 What Is ETL? Why Should I Care? 170 ETL Job vs. Streaming Job 171 Overview of ETL in AWS 172 ETL with AWS Glue 172 ETL with Lambda Functions 172 ETL with Hadoop/EMR 173 Other Ways to Perform ETL 173 ETL Job Design Concepts 173 Source Identification 174 Destination Identification 174 Mappings 174 Validation 174 Filter 175 Join, Denormalization, Relationalization 175 AWS Glue for ETL 176 Really, It’s Just Spark 176 Visual 176 Spark Script Editor 177 Python Shell Script Editor 177 Jupyter Notebook 177 Connectors 177 Creating Connections 178 Creating Connections with the Web Console 178 Creating Connections with the AWS cli 179 Creating ETL Jobs with AWS Glue Visual Editor 184 ETL Example: Format Switch from Raw (JSON) to Cleaned (Parquet) 184 Job Bookmarks 187 Transformations 188 Apply Mapping 189 Filter 189 Other Available Transforms 190 Run the Edited Job 191 Visual Editor with Source and Target Conclusions 192 Creating ETL Jobs with AWS Glue Visual Editor (without Source and Target) 192 Creating ETL Jobs with the Spark Script Editor 192 Developing ETL Jobs with AWS Glue Notebooks 193 What Is a Notebook? 
194 Notebook Structure 194 Step 1: Load Code into a DynamicFrame 196 Step 2: Apply Field Mapping 197 Step 3: Apply the Filter 197 Step 4: Write to S3 in Parquet Format 198 Example: Joining and Denormalizing Data from Two S3 Locations 199 Conclusions for Manually Authored Jobs with Notebooks 203 Creating ETL Jobs with AWS Glue Interactive Sessions 204 It’s Magic 205 Development Workflow 206 Streaming Jobs 207 Differences with a Standard ETL Job 208 Streaming Sources 208 Example: Process Kinesis Streams with a Streaming Job 208 Streaming ETL Jobs Conclusions 217 Summary 217 Chapter 7 Cataloging, Governance, and Search 219 Cataloging with AWS Glue 219 AWS Glue and the AWS Glue Data Catalog 219 Glue Databases and Tables 220 Databases 220 The Idea of Schema-on-Read 221 Tables 222 Create Table Manually 223 Creating a Table from an Existing Schema 225 Creating a Table with a Crawler 225 Summary on Databases and Tables 226 Crawlers 226 Updating or Not Updating? 230 Running the Crawler 231 Creating a Crawler from the AWS CLI 231 Retrieving Table Information from the CLI 233 Classifiers 235 Classifier Example 236 Crawlers and Classifiers Summary 237 Search with Amazon Athena: The Heart of Analytics in AWS 238 A Bit of History 238 Interface Overview 238 Creating Tables Manually 239 Athena Data Types 240 Complex Types 241 Running a Query 242 Connecting with JDBC and ODBC 243 Query Stats 243 Recent Queries and Saved Queries 243 The Power of Partitions 244 Athena Pricing Model 244 Automatic Naming 245 Athena Query Output 246 Athena Peculiarities (SQL and Not) 246 Computed Fields Gotcha and WITH Statement Workaround 246 Lowercase! 
247 Query Explain 248 Deduplicating Records 249 Working with JSON, Flattening, and Unnesting 250 Athena Views 251 Create Table as Select (CTAS) 252 Saving Queries and Reusing Saved Queries 253 Running Parameterized Queries 254 Athena Federated Queries 254 Athena Lambda Connectors 255 Note on Connection Errors 256 Performing Federated Queries 257 Creating a View from a Federated Query 258 Governing: Athena Workgroups, Lake Formation, and More 258 Athena Workgroups 259 Fine-Grained Athena Access with IAM 262 Recap of Athena-Based Governance 264 AWS Lake Formation 265 Registering a Location in Lake Formation 266 Creating a Database in Lake Formation 268 Assigning Permissions in Lake Formation 269 LF-Tags and Permissions in Lake Formation 271 Data Filters 277 Governance Conclusions 279 Summary 280 Chapter 8 Data Consumption: BI, Visualization, and Reporting 283 QuickSight 283 Signing Up for QuickSight 284 Standard Plan 284 Enterprise Plan 284 Users and User Groups 285 Managing Users and Groups 285 Managing QuickSight 286 Users and Groups 287 Your Subscriptions 287 SPICE Capacity 287 Account Settings 287 Security and Permissions 287 VPC Connections 288 Mobile Settings 289 Domains and Embedding 289 Single Sign-On 289 Data Sources and Datasets 289 Creating an Athena Data Source 291 Creating Other Data Sources 292 Creating a Data Source from the AWS cli 292 Creating a Dataset from a Table 294 Creating a Dataset from a SQL Query 295 Duplicating Datasets 296 Note on Creating Datasets 297 QuickSight Favorites, Recent, and Folders 297 SPICE 298 Manage SPICE Capacity 298 Refresh Schedule 299 QuickSight Data Editor 299 QuickSight Data Types 302 Change Data Types 302 Calculated Fields 303 Joining Data 305 Excluding Fields 309 Filtering Data 309 Removing Data 310 Geospatial Hierarchies and Adding Fields to Hierarchies 310 Unsupported Format Dates 311 Visualizing Data: QuickSight Analysis 312 Adding a Title and a Description to Your Analysis 313 Renaming the Sheet 314 Your 
First Visual with AutoGraph 314 Field Wells 314 Visuals Types 315 Saving and Autosaving 316 A First Example: Pie Chart 316 Renaming a Visual 317 Filtering Data 318 Adding Drill-Downs 320 Parameters 321 Actions 324 Insights 328 ML-Powered Insights 330 Sharing an Analysis 335 Dashboards 335 Dashboard Layouts and Themes 335 Publishing a Dashboard 336 Embedding Visuals and Dashboards 337 Data Consumption: Not Only Dashboards 337 Summary 338 Chapter 9 Machine Learning at Scale 339 Machine Learning and Artificial Intelligence 339 What Are ML/AI Use Cases? 340 Types of ML Models 340 Overview of ML/AI AWS Solutions 341 Amazon SageMaker 341 SageMaker Domains 342 Adding a User to the Domain 344 SageMaker Studio 344 SageMaker Example Notebook 346 Step 1: Prerequisites and Preprocessing 346 Step 2: Data Ingestion 347 Step 3: Data Inspection 348 Step 4: Data Conversion 349 Step 5: Upload Training Data 349 Step 6: Train the Model 349 Step 7: Set Up Hosting and Deploy the Model 351 Step 8: Validate the Model 352 Step 9: Use the Model 353 Inference 353 Real Time 354 Asynchronous 354 Serverless 354 Batch Transform 354 Data Wrangler 356 SageMaker Canvas 357 Summary 358 Appendix Example Data Architectures in AWS 359 Modern Data Lake Architecture 360 ETL in a Lake House 361 Consuming Data in the Lake House 361 The Modern Data Lake Architecture 362 Batch Processing 362 Stream Processing 363 Architecture Design Recommendations 364 Automate Everything 365 Build on Events 365 Performance = Cost Savings 365 AWS Glue Catalog and Athena-Centric Workflow 365 Design Flexible 365 Pick Your Battles 365 Parquet 366 Summary 366 Index 367

    £35.62

  • A Practical Guide to Data Mining for Business and Industry

    John Wiley & Sons Inc A Practical Guide to Data Mining for Business and Industry

    15 in stock

    Book Synopsis: Presents data mining processes and commonly used methods for descriptive and exploratory statistics using SAS and JMP. Trade Review: “A Practical Guide to Data Mining for Business and Industry gives practical tools on how information can be extracted from masses of data. The book is very well written, in a conversational tone that makes it enjoyable to read. The authors are excellent communicators. If you are interested in learning about data mining, learning to do a particular task in data mining, looking for a textbook to use in a data mining or analytics course, or have a problem or data analytic task you are working on, this book would be an excellent place to start.” (Mathematical Association of America, 23 August 2014) Table of Contents: Glossary of terms xii Part I Data Mining Concept 1 1 Introduction 3 1.1 Aims of the Book 3 1.2 Data Mining Context 5 1.2.1 Domain Knowledge 6 1.2.2 Words to Remember 7 1.2.3 Associated Concepts 7 1.3 Global Appeal 8 1.4 Example Datasets Used in This Book 8 1.5 Recipe Structure 11 1.6 Further Reading and Resources 13 2 Data Mining Definition 14 2.1 Types of Data Mining Questions 15 2.1.1 Population and Sample 15 2.1.2 Data Preparation 16 2.1.3 Supervised and Unsupervised Methods 16 2.1.4 Knowledge-Discovery Techniques 18 2.2 Data Mining Process 19 2.3 Business Task: Clarification of the Business Question behind the Problem 20 2.4 Data: Provision and Processing of the Required Data 21 2.4.1 Fixing the Analysis Period 22 2.4.2 Basic Unit of Interest 23 2.4.3 Target Variables 24 2.4.4 Input Variables/Explanatory Variables 24 2.5 Modelling: Analysis of the Data 25 2.6 Evaluation and Validation during the Analysis Stage 25 2.7 Application of Data Mining Results and Learning from the Experience 28 Part II Data Mining Practicalities 31 3 All about data 33 3.1 Some Basics 34 3.1.1 Data, Information, Knowledge and Wisdom 35 3.1.2 Sources and Quality of Data 36 3.1.3 Measurement Level and Types of Data 37 3.1.4 
Measures of Magnitude and Dispersion 39 3.1.5 Data Distributions 41 3.2 Data Partition: Random Samples for Training, Testing and Validation 41 3.3 Types of Business Information Systems 44 3.3.1 Operational Systems Supporting Business Processes 44 3.3.2 Analysis-Based Information Systems 45 3.3.3 Importance of Information 45 3.4 Data Warehouses 47 3.4.1 Topic Orientation 47 3.4.2 Logical Integration and Homogenisation 48 3.4.3 Reference Period 48 3.4.4 Low Volatility 48 3.4.5 Using the Data Warehouse 49 3.5 Three Components of a Data Warehouse: DBMS, DB and DBCS 50 3.5.1 Database Management System (DBMS) 51 3.5.2 Database (DB) 51 3.5.3 Database Communication Systems (DBCS) 51 3.6 Data Marts 52 3.6.1 Regularly Filled Data Marts 53 3.6.2 Comparison between Data Marts and Data Warehouses 53 3.7 A Typical Example from the Online Marketing Area 54 3.8 Unique Data Marts 54 3.8.1 Permanent Data Marts 54 3.8.2 Data Marts Resulting from Complex Analysis 56 3.9 Data Mart: Do’s and Don’ts 58 3.9.1 Do’s and Don’ts for Processes 58 3.9.2 Do’s and Don’ts for Handling 58 3.9.3 Do’s and Don’ts for Coding/Programming 59 4 Data Preparation 60 4.1 Necessity of Data Preparation 61 4.2 From Small and Long to Short and Wide 61 4.3 Transformation of Variables 65 4.4 Missing Data and Imputation Strategies 66 4.5 Outliers 69 4.6 Dealing with the Vagaries of Data 70 4.6.1 Distributions 70 4.6.2 Tests for Normality 70 4.6.3 Data with Totally Different Scales 70 4.7 Adjusting the Data Distributions 71 4.7.1 Standardisation and Normalisation 71 4.7.2 Ranking 71 4.7.3 Box–Cox Transformation 71 4.8 Binning 72 4.8.1 Bucket Method 73 4.8.2 Analytical Binning for Nominal Variables 73 4.8.3 Quantiles 73 4.8.4 Binning in Practice 74 4.9 Timing Considerations 77 4.10 Operational Issues 77 5 Analytics 78 5.1 Introduction 79 5.2 Basis of Statistical Tests 80 5.2.1 Hypothesis Tests and P Values 80 5.2.2 Tolerance Intervals 82 5.2.3 Standard Errors and Confidence Intervals 83 5.3 Sampling 83 5.3.1 Methods 
83 5.3.2 Sample Sizes 84 5.3.3 Sample Quality and Stability 84 5.4 Basic Statistics for Pre-analytics 85 5.4.1 Frequencies 85 5.4.2 Comparative Tests 88 5.4.3 Cross Tabulation and Contingency Tables 89 5.4.4 Correlations 90 5.4.5 Association Measures for Nominal Variables 91 5.4.6 Examples of Output from Comparative and Cross Tabulation Tests 92 5.5 Feature Selection/Reduction of Variables 96 5.5.1 Feature Reduction Using Domain Knowledge 96 5.5.2 Feature Selection Using Chi-Square 97 5.5.3 Principal Components Analysis and Factor Analysis 97 5.5.4 Canonical Correlation, PLS and SEM 98 5.5.5 Decision Trees 98 5.5.6 Random Forests 98 5.6 Time Series Analysis 99 6 Methods 102 6.1 Methods Overview 104 6.2 Supervised Learning 105 6.2.1 Introduction and Process Steps 105 6.2.2 Business Task 105 6.2.3 Provision and Processing of the Required Data 106 6.2.4 Analysis of the Data 107 6.2.5 Evaluation and Validation of the Results (during the Analysis) 108 6.2.6 Application of the Results 108 6.3 Multiple Linear Regression for use when Target is Continuous 109 6.3.1 Rationale of Multiple Linear Regression Modelling 109 6.3.2 Regression Coefficients 110 6.3.3 Assessment of the Quality of the Model 111 6.3.4 Example of Linear Regression in Practice 113 6.4 Regression when the Target is not Continuous 119 6.4.1 Logistic Regression 119 6.4.2 Example of Logistic Regression in Practice 121 6.4.3 Discriminant Analysis 126 6.4.4 Log-Linear Models and Poisson Regression 128 6.5 Decision Trees 129 6.5.1 Overview 129 6.5.2 Selection Procedures of the Relevant Input Variables 134 6.5.3 Splitting Criteria 134 6.5.4 Number of Splits (Branches of the Tree) 135 6.5.5 Symmetry/Asymmetry 135 6.5.6 Pruning 135 6.6 Neural Networks 137 6.7 Which Method Produces the Best Model? 
A Comparison of Regression, Decision Trees and Neural Networks 141 6.8 Unsupervised Learning 142 6.8.1 Introduction and Process Steps 142 6.8.2 Business Task 143 6.8.3 Provision and Processing of the Required Data 143 6.8.4 Analysis of the Data 145 6.8.5 Evaluation and Validation of the Results (during the Analysis) 147 6.8.6 Application of the Results 148 6.9 Cluster Analysis 148 6.9.1 Introduction 148 6.9.2 Hierarchical Cluster Analysis 149 6.9.3 K-Means Method of Cluster Analysis 150 6.9.4 Example of Cluster Analysis in Practice 151 6.10 Kohonen Networks and Self-Organising Maps 151 6.10.1 Description 151 6.10.2 Example of SOMs in Practice 152 6.11 Group Purchase Methods: Association and Sequence Analysis 155 6.11.1 Introduction 155 6.11.2 Analysis of the Data 157 6.11.3 Group Purchase Methods 158 6.11.4 Examples of Group Purchase Methods in Practice 158 7 Validation and Application 161 7.1 Introduction to Methods for Validation 161 7.2 Lift and Gain Charts 162 7.3 Model Stability 164 7.4 Sensitivity Analysis 167 7.5 Threshold Analytics and Confusion Matrix 169 7.6 ROC Curves 170 7.7 Cross-Validation and Robustness 171 7.8 Model Complexity 172 Part III Data Mining in Action 173 8 Marketing: Prediction 175 8.1 Recipe 1: Response Optimisation: to Find and Address the Right Number of Customers 176 8.2 Recipe 2: To Find the x% of Customers with the Highest Affinity to an Offer 186 8.3 Recipe 3: To Find the Right Number of Customers to Ignore 187 8.4 Recipe 4: To Find the x% of Customers with the Lowest Affinity to an Offer 190 8.5 Recipe 5: To Find the x% of Customers with the Highest Affinity to Buy 191 8.6 Recipe 6: To Find the x% of Customers with the Lowest Affinity to Buy 192 8.7 Recipe 7: To Find the x% of Customers with the Highest Affinity to a Single Purchase 193 8.8 Recipe 8: To Find the x% of Customers with the Highest Affinity to Sign a Long-Term Contract in Communication Areas 194 8.9 Recipe 9: To Find the x% of Customers with the Highest Affinity to 
Sign a Long-Term Contract in Insurance Areas 196 9 Intra-Customer Analysis 198 9.1 Recipe 10: To Find the Optimal Amount of Single Communication to Activate One Customer 199 9.2 Recipe 11: To Find the Optimal Communication Mix to Activate One Customer 200 9.3 Recipe 12: To Find and Describe Homogeneous Groups of Products 206 9.4 Recipe 13: To Find and Describe Groups of Customers with Homogeneous Usage 210 9.5 Recipe 14: To Predict the Order Size of Single Products or Product Groups 216 9.6 Recipe 15: Product Set Combination 217 9.7 Recipe 16: To Predict the Future Customer Lifetime Value of a Customer 219 10 Learning from a Small Testing Sample and Prediction 225 10.1 Recipe 17: To Predict Demographic Signs (Like Sex, Age, Education and Income) 225 10.2 Recipe 18: To Predict the Potential Customers of a Brand New Product or Service in Your Databases 236 10.3 Recipe 19: To Understand Operational Features and General Business Forecasting 241 11 Miscellaneous 244 11.1 Recipe 20: To Find Customers Who Will Potentially Churn 244 11.2 Recipe 21: Indirect Churn Based on a Discontinued Contract 249 11.3 Recipe 22: Social Media Target Group Descriptions 250 11.4 Recipe 23: Web Monitoring 254 11.5 Recipe 24: To Predict Who is Likely to Click on a Special Banner 258 12 Software and Tools: A Quick Guide 261 12.1 List of Requirements When Choosing a Data Mining Tool 261 12.2 Introduction to the Idea of Fully Automated Modelling (FAM) 265 12.2.1 Predictive Behavioural Targeting 265 12.2.2 Fully Automatic Predictive Targeting and Modelling Real-Time Online Behaviour 266 12.3 FAM Function 266 12.4 FAM Architecture 267 12.5 FAM Data Flows and Databases 268 12.6 FAM Modelling Aspects 269 12.7 FAM Challenges and Critical Success Factors 270 12.8 FAM Summary 270 13 Overviews 271 13.1 To Make Use of Official Statistics 272 13.2 How to Use Simple Maths to Make an Impression 272 13.2.1 Approximations 272 13.2.2 Absolute and Relative Values 273 13.2.3 % Change 273 13.2.4 Values in 
Context 273 13.2.5 Confidence Intervals 274 13.2.6 Rounding 274 13.2.7 Tables 274 13.2.8 Figures 274 13.3 Differences between Statistical Analysis and Data Mining 275 13.3.1 Assumptions 275 13.3.2 Values Missing Because ‘Nothing Happened’ 275 13.3.3 Sample Sizes 276 13.3.4 Goodness-of-Fit Tests 276 13.3.5 Model Complexity 277 13.4 How to Use Data Mining in Different Industries 277 13.5 Future Views 283 Bibliography 285 Index 296

    £53.06

  • Disk-Based Algorithms for Big Data

    Taylor & Francis Ltd Disk-Based Algorithms for Big Data

    1 in stock

    Book Synopsis: Disk-Based Algorithms for Big Data is a product of recent advances in the areas of big data, data analytics, and the underlying file systems and data management algorithms used to support the storage and analysis of massive data collections. The book discusses hard disks and their impact on data management, since hard disk drives continue to be common in large data clusters. It also explores ways to store and retrieve data through primary and secondary indices. This includes a review of different in-memory sorting and searching algorithms that build a foundation for more sophisticated on-disk approaches like mergesort, B-trees, and extendible hashing. Following this introduction, the book transitions to more recent topics, including advanced storage technologies like solid-state drives and holographic storage; peer-to-peer (P2P) communication; large file systems and query languages like Hadoop/HDFS, Hive, Cassandra, and Presto; and NoSQL databases like Neo4j for graph structur Table of Contents: Foreword. Physical Disk Storage. File Management. Sorting. Searching. Disk-Based Sorting. Disk-Based Searching. Storage Technology. Large File Systems. NoSQL Storage. Appendix

    £56.99

  • Data Quality

    John Wiley & Sons Inc Data Quality

    15 in stock

    Book Synopsis: Discover how to achieve business goals by relying on high-quality, robust data. In Data Quality: Empowering Businesses with Analytics and AI, a veteran data and analytics professional delivers a practical and hands-on discussion on how to accelerate business results using high-quality data. In the book, you'll learn techniques to define and assess data quality, discover how to ensure that your firm's data collection practices avoid common pitfalls and deficiencies, improve the level of data quality in the business, and guarantee that the resulting data is useful for powering high-level analytics and AI applications. The author shows you how to: profile for data quality, including the appropriate techniques, criteria, and KPIs; identify the root causes of data quality issues in the business, apart from discussing the 16 common root causes that degrade data quality in the organization; formulate the reference architecture for data quality, in Table of Contents: Foreword by Bill Inmon Preface About the Book Quality Principles Applied in This Book Organization of the Book Who Should Read This Book? 
References Acknowledgments Define Phase Chapter 1: Introduction Introduction Data, Analytics, AI, and Business Performance Data as a Business Asset or Liability Data Governance, Data Management, and Data Quality Leadership Commitment to Data Quality Key Takeaways Conclusion References Chapter 2: Business Data Introduction Data in Business Telemetry Data Purpose of Data in Business Business Data Views Key Characteristics of Business Data Critical Data Elements (CDE) Key Takeaways Conclusion References Chapter 3: Data Quality in Business Introduction Data Quality Dimensions Context in Data Quality Consequences and Costs of Poor Data Quality Data Depreciation and Its Factors Data in IT Systems Data Quality and Trusted Information Key Takeaways Conclusion References Analyze Phase Chapter 4: Causes for Poor Data Quality Introduction Data Quality RCA Techniques Typical Causes of Poor Data Quality Key Takeaways Conclusion References Chapter 5: Data Lifecycle and Lineage Introduction Business-Enabled DLC Stages IT Business-Enabled DLC Stages Data Lineage Key Takeaways Conclusion References Chapter 6: Profiling for Data Quality Introduction Criteria for Data Profiling Data Profiling Techniques for Measures of Centrality Data Profiling Techniques for Measures of Variation Integrating Centrality and Variation KPIs Key Takeaways Conclusion References Realize Phase Chapter 7: Reference Architecture for Data Quality Introduction Options to Remediate Data Quality DataOps Data Product Data Fabric and Data Mesh Data Enrichment Key Takeaways Conclusion References Chapter 8: Best Practices to Realize Data Quality Introduction Overview of Best Practices BP 1: Identify the Business KPIs and the Ownership of These KPIs and the Pertinent Data BP 2: Build and Improve the Data Culture and Literacy in the Organization BP 3: Define the Current and Desired state of Data Quality BP 4: Follow the Minimalistic Approach to Data Capture BP 5: Select and Define the Data Attributes for Data Quality 
BP 6: Capture and Manage Critical Data with Data Standards in MDM Systems Key Takeaways Conclusion References Chapter 9: Best Practices to Realize Data Quality Introduction BP 7: Automate the Integration of Critical Data Elements BP 8: Define the SoR and Securely Capture Transactional Data in the SoR/OLTP System BP 9: Build and Manage Robust Data Integration Capabilities BP 10: Distribute Data Sourcing and Insight Consumption Key Takeaways Conclusion References Sustain Phase Chapter 10: Data Governance Introduction Data Governance Principles Data Governance Design Components Implementing the Data Governance Program Data Observability Data Compliance – ISO 27001 and SOC2 Key Takeaways Conclusion References Chapter 11: Protecting Data Introduction Data Classification Data Safety Data Security Key Takeaways Conclusion References Chapter 12: Data Ethics Introduction Data Ethics Importance of Data Ethics Principles of Data Ethics Model Drift in Data Ethics Data Privacy Managing Data Ethically Key Takeaways Conclusion References Appendix 1: Abbreviations and Acronyms Appendix 2: Glossary Appendix 3: Data Literacy Competencies About the Author Index

    15 in stock

    £24.79

  • Data Science Essentials For Dummies

    John Wiley & Sons Data Science Essentials For Dummies

    15 in stock

    £11.69

  • Big Data on Campus

    Johns Hopkins University Press Big Data on Campus

    1 in stock

    Book Synopsis
    How data-informed decision making can make colleges and universities more effective institutions. The continuing importance of data analytics is not lost on higher education leaders, who face a multitude of challenges, including increasing operating costs, dwindling state support, limits to tuition increases, and increased competition from the for-profit sector. To navigate these challenges, savvy leaders must leverage data to make sound decisions. In Big Data on Campus, leading data analytics experts and higher ed leaders show the role that analytics can play in the better administration of colleges and universities. Aimed at senior administrative leaders, practitioners of institutional research, technology professionals, and graduate students in higher education, the book opens with a conceptual discussion of the roles that data analytics can play in higher education administration. Subsequent chapters address recent developments in technology, the rapid accumulation of data assets

    Table of Contents
    Foreword, by Christine M. Keller
    Acknowledgments
    Part I. Technology, Digitization, Big Data, and Analytics Maturity as the Enabling Conditions for Data-Informed Decision Making
    Chapter 1. Data Analytics and the Imperatives for Data-Informed Decision Making in Higher Education, by Karen L. Webber and Henry Y. Zheng
    Chapter 2. Big Data and the Transformation of Decision Making in Higher Education, by Braden J. Hosch
    Chapter 3. Predictive Analytics and Its Uses in Higher Education, by Henry Y. Zheng and Ying Zhou
    Part II. The Ethical, Cultural, and Managerial Imperatives of Data-Informed Decision Making in Higher Education
    Chapter 4. Limitations in Data Analytics: Potential Misuse and Misunderstanding in Data Reports and Visualizations, by Karen L. Webber and Jillian N. Morn
    Chapter 5. Guiding Your Organization's Data Strategy: The Roles of University Senior Leaders and Trustees in Strategic Analytics, by Gail B. Marsh and Rachit Thariani
    Chapter 6. Data Governance, Data Stewardship, and the Building of an Analytics Organizational Culture, by Rana Glasgal and Valentina Nestor
    Part III. The Application of Analytics in Higher Education Decision Making: Case Studies
    Chapter 7. Data Analytics and Decision Making in Admissions and Enrollment Management, by Tom Gutman and Brian P. Hinote
    Chapter 8. Predictive Analytics, Academic Advising, Early Alerts, and Student Success, by Timothy M. Renick
    Chapter 9. Constituent Relationship Management and Student Engagement Lifecycle, by Cathy A. O'Bryan, Chris Tompkins, and Carrie Hancock Marcinkevage
    Chapter 10. Learning Analytics for Learning Assessment: Complexities in Efficacy, Implementation, and Broad Use, by Carrie Klein, Jaime Lester, Huzefa Rangwala, and Aditya Johri
    Chapter 11. Using Data Analytics to Support Institutional Financial and Operational Efficiency, by Lindsay K. Wayt, Susan M. Menditto, J. Michael Gower, and Charles Tegen
    Part IV. Concluding Comments
    Chapter 12. Data-Informed Decision Making and the Pursuit of Analytics Maturity in Higher Education, by Karen L. Webber and Henry Y. Zheng
    Contributors
    Index

    £33.25

  • How Colleges Use Data

    Johns Hopkins University Press How Colleges Use Data

    7 in stock

    Book Synopsis
    What does a culture of evidence really look like in higher education? The use of big data and the rapid acceleration of storage and analytics tools have led to a revolution of data use in higher education. Institutions have moved from relying largely on historical trends and descriptive data to the more widespread adoption of predictive and prescriptive analytics. Despite this rapid evolution of data technology and analytics tools, universities and colleges still face a number of obstacles in their data use. In How Colleges Use Data, Jonathan S. Gagliardi presents college and university leaders with an important resource to help cultivate, implement, and sustain a culture of evidence through the ethical and responsible use and adoption of data and analytics. Gagliardi provides a broad context for data use among colleges, including key concepts and use cases related to data and analytics. He also addresses the different dimensions of data use and highlights the promise and perils of the

    Table of Contents
    Preface
    Acknowledgments
    Chapter 1. The Evidence Imperative
    Chapter 2. Demystifying Data and Analytics
    Chapter 3. Defining an Institutional Aspiration Using Data
    Chapter 4. Equity and Student Success
    Chapter 5. Strategic Finance and Resource Optimization
    Chapter 6. Academic Quality and Renewal
    Chapter 7. Creating a Data Governance System
    Chapter 8. The Promise and Peril of Data and Analytics
    Chapter 9. Implementation and Planning
    Chapter 10. Looking Ahead
    Notes
    Index

    £21.60

  • Because Data Can't Speak for Itself

    Johns Hopkins University Press Because Data Can't Speak for Itself

    7 in stock

    £18.05

  • Pro Data Backup and Recovery (Expert's Voice in Data Management)

    Apress Pro Data Backup and Recovery (Expert's Voice in Data Management)

    15 in stock

    Table of Contents
    Introduction to Backup and Recovery
    Backup Software
    Physical Backup Media
    Virtual Backup Media
    New Media Technologies
    Software Architectures: CommVault
    Software Architectures: NetBackup
    Application Backup Strategies
    Putting It All Together: Sample Backup Environments
    Monitoring and Reporting
    Summary

    £47.49

  • Graph-Based Clustering and Data Visualization Algorithms

    Springer Graph-Based Clustering and Data Visualization Algorithms

    15 in stock

    Book Synopsis
    This work presents a data visualization technique that combines graph-based topology representation and dimensionality reduction methods to visualize the intrinsic data structure in a low-dimensional vector space.

    Table of Contents
    Vector Quantisation and Topology-Based Graph Representation
    Graph-Based Clustering Algorithms
    Graph-Based Visualisation of High-Dimensional Data

    £52.24

  • 21 Recipes for Mining Twitter

    O'Reilly Media 21 Recipes for Mining Twitter

    1 in stock

    Book Synopsis
    Millions of public Twitter streams harbor a wealth of data, and once you mine them, you can gain some valuable insights. This short and concise book offers a collection of recipes to help you extract nuggets of Twitter information using easy-to-learn Python tools.

    £19.19

  • Data Mining

    O'Reilly Media Data Mining

    1 in stock

    Book Synopsis
    This non-technical guide shows you how to extract significant business value from big data with Ask-Measure-Learn, a system that helps you ask the right questions, measure the right data, and then learn from the results.

    £19.19

  • Beautiful Visualization: Looking At Data Through

    O'Reilly Media Beautiful Visualization: Looking At Data Through

    2 in stock

    Book Synopsis
    With contributions from more than two dozen experts, this book demonstrates why visualizations are beautiful not only for their aesthetic design, but also for elegant layers of detail that efficiently generate insight and new understanding.

    £35.99

  • HBase

    O'Reilly Media HBase

    1 in stock

    Book Synopsis
    If your organization is looking for a storage solution to accommodate a virtually endless amount of data, this book will show you how Apache HBase can fulfill your needs.

    £23.99

  • Encyclopedia of Database Systems

    Springer-Verlag New York Inc. Encyclopedia of Database Systems

    1 in stock

    Table of Contents
    .NET Remoting.- Absolute Time.- Abstract Versus Concrete Temporal Query Languages.- Abstraction.- Access Control.- Access Control Administration Policies.- Access Control Policy Languages.- Access Path.- ACID Properties.- Active and Real-Time Data Warehousing.- Active Database Coupling Modes.- Active Database Execution Model.- Active Database Knowledge Model.- Active Database Management System Architecture.- Active Database Rulebase.- Active Database, Active Database (Management) System.- Active Storage.- Active XML.- Activity.- Activity Diagrams.- Actors/Agents/Roles.- Adaptive Interfaces.- Adaptive Middleware for Message Queuing Systems.- Adaptive Query Processing.- Adaptive Stream Processing.- ADBMS.- Administration Model for RBAC.- Administration Wizards.- Advanced Information Retrieval Measures.- Aggregation: Expressiveness and Containment.- Aggregation-Based Structured Text Retrieval.- Air Indexes for Spatial Databases.- AJAX.- Allen's
Relations.- AMOSQL.- AMS Sketch.- Anchor Text.- Annotation.- Annotation-based Image Retrieval.- Anomaly Detection on Streams.- Anonymity.- ANSI/INCITS RBAC Standard.- Answering Queries Using Views.- Anti-monotone Constraints.- Applicability Period.- Application Benchmark.- Application Recovery.- Application Server.- Application-Level Tuning.- Applications of Emerging Patterns for Microarray Gene Expression Data Analysis.- Applications of Sensor Network Data Management.- Approximate Queries in Peer-to-Peer Systems.- Approximate Query Processing.- Approximate Reasoning.- Approximation of Frequent Itemsets.- Apriori Property and Breadth-First Search Algorithms.- Architecture-Conscious Database System.- Archiving Experimental Data.- Armstrong Axioms.- Array Databases.- Association Rule Mining on Streams.- Association Rules.- Asymmetric Encryption.- Atelic Data.- Atomic Event.- Atomicity.- Audio.- Audio Classification.- Audio Content Analysis.- Audio Metadata.- Audio Representation.- Audio Segmentation.- Auditing and Forensic Analysis.- Authentication.- Automatic Image Annotation.- Autonomous Replication.- Average Precision.- Average Precision at n.- Average Precision Histogram.- Average R-Precision.- B+-Tree.- Backup and Restore.- Bag Semantics.- Bagging.- Bayesian Classification.- Benchmark Frameworks.- Benchmarks for Big Data Analytics.- Big Data Platforms for Data Analytics.- Big Stream Systems.- Biological Metadata Management.- Biological Networks.- Biological Resource Discovery.- Biological Sequences.- Biomedical Data/Content Acquisition, Curation.- Biomedical Image Data Types and Processing.- Biomedical Scientific Textual Data Types and Processing.- Biostatistics and Data Analysis.- Bi-Temporal Indexing.- Bitemporal Interval.- Bitemporal Relation.- Bitmap Index.- Bitmap-based Index Structures.- Blind Signatures.- Bloom Filters.- BM25.- Boolean Model.- Boosting.- Bootstrap.- Boyce-Codd Normal Form.- BP-Completeness.- Bpref.- Browsing.-
Browsing in Digital Libraries.- B-Tree Locking.- Buffer Management.- Buffer Manager.- Buffer Pool.- Business Intelligence.- Business Process Execution Language.- Business Process Management.- Business Process Modeling Notation.- Business Process Reengineering.- Cache-Conscious Query Processing.- Calendar.- Calendric System.- CAP Theorem.- Cardinal Direction Relationships.- Cartesian Product.- Cataloging in Digital Libraries.- Causal Consistency.- Certain (and Possible) Answers.- Change Detection on Streams.- Channel-Based Publish/Subscribe.- Chart.- Chase.- Checksum and Cyclic Redundancy Check Mechanism.- Choreography.- Chronon.- Citation.- Classification.- Classification by Association Rule Analysis.- Classification in Streams.- Client-Server Architecture.- Clinical Data Acquisition, Storage and Management.- Clinical Data and Information Models.- Clinical Data Quality and Validation.- Clinical Decision Support.- Clinical Document Architecture.- Clinical Event.- Clinical Knowledge Repository.- Clinical Observation.- Clinical Ontologies.- Clinical Order.- Closed Itemset Mining and Non-redundant Association Rule Mining.- Closest-Pair Query.- Cloud Computing.- Cloud Intelligence.- Cluster and Distance Measure.- Clustering for Post Hoc Information Retrieval.- Clustering on Streams.- Clustering Overview and Applications.- Clustering Validity.- Clustering with Constraints.- Collaborative Filtering.- Column Segmentation.- Column Stores.- Common Warehouse Metamodel.- Comparative Visualization.- Compensating Transactions.- Complex Event.- Complex Event Processing.- Composed Services and WS-BPEL.- Composite Event.- Composition.- Comprehensions.- Compression of Mobile Location Data.- Computational Media Aesthetics.- Computationally Complete Relational Query Languages.- Computerized Physician Order Entry.- Conceptual Modeling Foundations.- Conceptual Schema Design.- Concurrency Control - Traditional Approaches.- Concurrency Control for Replicated Databases.- Concurrency 
Control Manager.- Conditional Tables.- Conjunctive Query.- Connection.- Consistency Models For Replicated Data.- Consistent Query Answering.- Constraint Databases.- Constraint Query Languages.- Constraint-Driven Database Repair.- Content-and-Structure Query.- Content-Based Publish/Subscribe.- Content-Based Video Retrieval.- Content-Only Query.- Context.- Contextualization in Structured Text Retrieval.- Continuous Data Protection.- Continuous Monitoring of Spatial Queries.- Continuous Multimedia Data Retrieval.- Continuous Queries in Sensor Networks.- Continuous Query.- ConTract.- Control Data.- Convertible Constraints.- Coordination.- Copyright Issues in Databases.- CORBA.- Correctness Criteria Beyond Serializability.- Cost and quality trade-offs in crowdsourcing.- Cost Estimation.- Count-Min Sketch.- Coupling and De-coupling.- Covering Index.- Crash Recovery.- Cross-Language Mining and Retrieval.- Cross-Modal Multimedia Information Retrieval.- Cross-Validation.- Crowd Database Operators.- Crowd Database Systems.- Crowd Mining and Analysis.- Crowdsourcing Geographic Information Systems.- Cube.- Cube Implementations.- Current Semantics.- Curse of Dimensionality.- Daplex.- Data Acquisition and Dissemination in Sensor Networks.- Data Aggregation in Sensor Networks.- Data Broadcasting, Caching and Replication in Mobile Computing.- Data Cleaning.- Data Compression in Sensor Networks.- Data Conflicts.- Data Definition.- Data Definition Language (DDL).- Data Dictionary.- Data Encryption.- Data Estimation in Sensor Networks.- Data Exchange.- Data Fusion.- Data Fusion in Sensor Networks.- Data Generation.- Data Governance.- Data Integration Architectures and Methodology for the Life Sciences.- Data Integration in Web Data Extraction System.- Data Management for VANETs.- Data Management Fundamentals: Database Management System.- Data Management in Data Centers.- Data Manipulation.- Data Manipulation Language (DML).- Data Mart.- Data Migration Management.- Data Mining.- Data 
Partitioning.- Data Privacy and Patient Consent.- Data Profiling.- Data Provenance.- Data Quality Assessment.- Data Quality Dimensions.- Data Quality Models.- Data Rank/Swapping.- Data Reduction.- Data Replication.- Data Sampling.- Data Scrubbing.- Data Sketch/Synopsis.- Data Skew.- Data Storage and Indexing in Sensor Networks.- Data Stream.- Data Stream Management Architectures and Prototypes.- Data Types in Scientific Data Management.- Data Uncertainty Management in Sensor Networks.- Data Visualization.- Data Warehouse.- Data Warehouse Life-Cycle and Design.- Data Warehouse Maintenance, Evolution and Versioning.- Data Warehouse Metadata.- Data Warehouse Security.- Data Warehousing for Clinical Research.- Data Warehousing in Cloud Environments.- Data Warehousing on Non-Conventional Data.- Data Warehousing Systems: Foundations and Architectures.- Data, Text, and Web Mining in Healthcare.- Database.- Database Adapter and Connector.- Database Administrator (DBA).- Database Appliances.- Database Benchmarks.- Database Clustering Methods.- Database Clusters.- Database Dependencies.- Database Design.- Database Languages for Sensor Networks.- Database Machine.- Database Management System.- Database Middleware.- Database Repair.- Database Reverse Engineering.- Database Schema.- Database Security.- Database System.- Database Techniques to Improve Scientific Simulations.- Database Trigger.- Database Tuning using Combinatorial Search.- Database Tuning using Online Algorithms.- Database Tuning using Trade-off Elimination.- Database Use in Science Applications.- Datalog.- DBMS Component.- DBMS Interface.- DCE.- DCOM.- Decay Models.- Decision Rule Mining in Rough Set Theory.- Decision Tree Classification.- Decision Trees.- Declarative Networking.- Deductive Data Mining using Granular Computing.- Deduplication.- Deduplication in Data Cleaning.- Deep Instantiation.- Deep-Web Search.- Dense Index.- Dense Pixel Displays.- Density-based Clustering.- Description Logics.- Design for 
Data Quality.- Dewey Decimal System.- Diagram.- Difference.- Differential Privacy.- Digital Archives and Preservation.- Digital Curation.- Digital Elevation Models.- Digital Libraries.- Digital Rights Management.- Digital Signatures.- Dimension.- Dimension Reduction Techniques for Clustering.- Dimensionality Reduction.- Dimensionality Reduction Techniques For Nearest Neighbor Computations.- Dimension-Extended Topological Relationships.- Direct Attached Storage.- Direct Manipulation.- Disaster Recovery.- Disclosure Risk.- Discounted Cumulated Gain.- Discovery.- Discrete Wavelet Transform and Wavelet Synopses.- Discretionary Access Control.- Disk.- Disk Power Saving.- Distortion Techniques.- Distributed Architecture.- Distributed Concurrency Control.- Distributed Data Streams.- Distributed Database Design.- Distributed Database Systems.- Distributed DBMS.- Distributed Deadlock Management.- Distributed File Systems.- Distributed Hash Table.- Distributed Join.- Distributed Machine Learning.- Distributed Query Optimization.- Distributed Query Processing.- Distributed Recovery.- Distributed Spatial Databases.- Distributed Transaction Management.- Divergence from Randomness Models.- D-measure.- Document.- Document Clustering.- Document Databases.- Document Field.- Document Length Normalization.- Document Links and Hyperlinks.- Document Representations (Inclusive Native and Relational).- Dublin Core.- Dynamic Graphics.- Dynamic Web Pages.- eAccessibility.- ECA Rule Action.- ECA Rule Condition.- ECA Rules.- e-Commerce Transactions.- Effectiveness Involving Multiple Queries.- Ehrenfeucht-Fraïssé Games.- Elasticity.- Electronic Dictionary.- Electronic Encyclopedia.- Electronic Health Record.- Electronic Ink Indexing.- Electronic Newspapers.- Eleven Point Precision-recall Curve.- Emergent Semantics.- Emerging Pattern Based Classification.- Emerging Patterns.- Energy Efficiency in Data Centers.- Ensemble.- Enterprise Application Integration.- Enterprise Content Management.- 
Enterprise Service Bus.- Enterprise Terminology Services.- Entity Relationship Model.- Entity Resolution.- Entity Retrieval.- Equality-Generating Dependencies.- ERR- Expected Reciprocal Rank.- ERR-IA Intent-aware ERR.- Escrow Transactions.- European Law in Databases.- Evaluation Metrics for Structured Text Retrieval.- Evaluation of Relational Operators.- Event.- Event and Pattern Detection over Streams.- Event Causality.- Event Channel.- Event Cloud.- Event Detection.- Event Driven Architecture.- Event Flow.- Event in Active Databases.- Event in Temporal Databases.- Event Lineage.- Event Pattern Detection.- Event Prediction.- Event Processing Agent.- Event Processing Network.- Event Sink.- Event Source.- Event Specification.- Event Stream.- Event Transformation.- Event-Driven Business Process Management.- Eventual Consistency.- Evidence Based Medicine.- Executable Knowledge.- Execution Skew.- Explicit Event.- Exploratory Data Analysis.- Expressive Power of Query Languages.- Extended Entity-Relationship Model.- Extended Transaction Models and the ACTA Framework.- Extendible Hashing.- Extraction, Transformation, and Loading.- Faceted Search.- Fault-Tolerance and High Availability in Data Stream Management Systems.- Feature Extraction for Content-Based Image Retrieval.- Feature Selection for Clustering.- Feature-Based 3D Object Retrieval.- Field-Based Information Retrieval Models.- Field-Based Spatial Modeling.- First-Order Logic: Semantics.- First-Order Logic: Syntax.- Fixed Time Span.- Flex Transactions.- FM Synopsis.- F-Measure.- Focused Web Crawling.- FOL Modeling of Integrity Constraints (Dependencies).- Forever.- Form.- Fourth Normal Form.- FQL.- Fractal.- Frequency Moments.- Frequent Graph Patterns.- Frequent Items on Streams.- Frequent Itemset Mining with Constraints.- Frequent Itemsets and Association Rules.- Frequent Partial Orders.- Fully-Automatic Web Data Extraction.- Functional Data Model.- Functional Dependencies for Semi-Structured Data.- Functional 
Dependency.- Functional Query Language.- Fuzzy Models.- Fuzzy Relation.- Fuzzy Set.- Fuzzy Set Approach.- Fuzzy/Linguistic IF-THEN Rules and Linguistic Descriptions.- Gazetteers.- Gene Expression Arrays.- Generalization of ACID Properties.- Generalized Search Tree.- Genetic Algorithms.- Geographic Information System.- Geographical Information Retrieval.- Geography Markup Language.- Geometric Stream Mining.- GEO-RBAC Model.- Georeferencing.- Geosocial Networks.- Geospatial Metadata.- Geo-Targeted Web Search.- GMAP.- Grammar Inference.- Graph.- Graph Data Management in Scientific Applications.- Graph Database.- Graph Management in the Life Sciences.- Graph Mining.- Graph Mining on Streams.- Graph OLAP.- Graphical Models for Uncertain Data Management.- Grid and Workflows.- Grid File (and Family).- GUIs for Web Data Extraction.- Hash Functions.- Hash Join.- Hash-based Indexing.- Healthcare Metrics.- Hierarchical Clustering.- Hierarchical Data Model.- Hierarchical Data Summarization.- Hierarchical Heavy Hitter Mining on Streams.- Hierarchy.- High Dimensional Indexing.- Histogram.- Histograms on Streams.- History in Temporal Databases.- Homomorphic Encryption.- Horizontally Partitioned Data.- Human Factors Modeling in Crowdsourcing.- Human-centered Computing: Application to Multimedia.- Human-Computer Interaction.- Hypertexts.- I/O Model of Computation.- Icon.- Iconic Displays.- Image.- Image Content Modeling.- Image Database.- Image Management for Biological Data.- Image Metadata.- Image Querying.- Image Representation.- Image Retrieval and Relevance Feedback.- Image Segmentation.- Image Similarity.- Implementation of Database Operators (Joins, Group by, etc.).- Implication of Constraints.- Implications of Genomics for Clinical Informatics.- Implicit Event.- Incomplete Information.- Inconsistent Databases.- Incremental Computation of Queries.- Incremental Crawling.- Incremental Maintenance of Views with Aggregates.- Index Creation and File Structures.- Index Join.- Index
Structures for Biological Sequences.- Index Tuning.- Indexed Sequential Access Method.- Indexing and Similarity Search.- Indexing Compressed Text.- Indexing Historical Spatio-Temporal Data.- Indexing in pub/sub systems.- Indexing Metric Spaces.- Indexing of Data Warehouses.- Indexing of the Current and Near-Future Positions of Moving Objects.- Indexing Techniques for Multimedia Data Retrieval.- Indexing the Web.- Indexing Uncertain Data.- Indexing Units of Structured Text Retrieval.- Indexing with Crowds.- Individually Identifiable Data.- Inference Control in Statistical Databases.- Information Extraction.- Information Filtering.- Information Foraging.- Information Integration.- Information Integration Techniques for Scientific Data.- Information Lifecycle Management.- Information Loss Measures.- Information Navigation.- Information Quality.- Information Quality and Decision Making.- Information Quality Assessment.- Information Quality Policy and Strategy.- Information Quality: Managing Information as a Product.- Information Retrieval.- Information Retrieval Models.- Information Retrieval Operations.- Infrastructure As-A-Service (IaaS).- Initiative for the Evaluation of XML Retrieval.- Initiator.- In-Network Query Processing.- Integrated DB and IR Approaches.- Integration of Rules and Ontologies.- Intelligent Storage Systems.- Interactive Analytics in Social Media.- Interface.- Interface Engines in Healthcare.- Interoperability in Data Warehouses.- Interoperation of NLP-based Systems with Clinical Databases.- Inter-Operator Parallelism.- Inter-Query Parallelism.- Intra-operator Parallelism.- Intra-Query Parallelism.- Intrusion Detection Technology.- Inverse Document Frequency.- Inverted Files.- IP Storage.- Iterator.- Java Database Connectivity.- Java Enterprise Edition.- Java Metadata Facility.- Join.- Join Dependency.- Join Index.- Join Order.- k-Anonymity.- Karp-Luby Sampling.- KDD Pipeline.- Key.- K-Means and K-Medoids.- Knowledge Base.- Knowledge Base 
Extraction.- Language Models.- Languages for Web Data Extraction.- Learning Distance Measures.- Lexical Analysis of Textual Data.- Licensing and Contracting Issues in Databases.- Lifespan.- Lightweight Ontologies.- Linear Hashing.- Linear Regression.- Linked Open Data.- Linking and Brushing.- Load Balancing in Peer-to-Peer Overlay Networks.- Load Shedding.- LOC METS.- Locality.- Locality of Queries.- Location Based Recommendation.- Location Management in Mobile Environments.- Location Update Management.- Location-Based Services.- Locking Granularity and Lock Types.- Logging and Recovery.- Logging/Recovery Subsystem.- Logical and Physical Data Independence.- Logical Database Design: from Conceptual to Logical Schema.- Logical Document Structure.- Logical Foundations of Web Data Extraction.- Logical Models of Information Retrieval.- Logical Unit Number.- Logical Unit Number Mapping.- Logical Volume Manager.- Log-Linear Regression.- Loop.- Loose Coupling.- Machine Learning in Computational Biology.- Main Memory.- Main Memory DBMS.- Maintenance of Materialized Views with Outer-Joins.- Maintenance of Recursive Views.- Managing Compressed Structured Text.- Managing Data Integration Uncertainty.- Managing Probabilistic Entity Extraction.- Mandatory Access Control.- MANET Databases.- MAP.- Map Matching.- MapReduce.- Markup Language.- MashUp.- Massive Array of Idle Disks.- Matrix Masking.- Max-Pattern Mining.- Mean Reciprocal Rank.- Measure.- Mediation.- Membership Query.- Memory Hierarchy.- Memory Locality.- Merkle Trees.- Message Authentication Codes.- Message Queuing Systems.- Meta Data Repository.- Meta Object Facility.- Metadata.- Metadata Interchange Specification.- Metadata Registry, ISO/IEC 11179.- Metamodel.- Metasearch Engines.- Metric Space.- Microaggregation.- Microbenchmark.- Microdata.- Microdata Rounding.- Middleware Support for Database Replication and Caching.- Middleware Support for Precise Failure Semantics.- Mining of Chemical Data.- Mobile Database.- 
Mobile Interfaces.- Mobile resource search.- Mobile Sensor Network Data Management.- Model Management.- Model-based Querying in Sensor Networks.- Monotone Constraints.- Monte Carlo Methods for Uncertain Data.- Moving Object.- Moving Objects Databases and Tracking.- MRR.- Multi-Data Center Consistency Properties.- Multi-Data Center Replication Protocols.- Multidimensional Data Formats.- Multidimensional Modeling.- Multidimensional Scaling.- Multi-Level Modeling.- Multi-Level Recovery and the ARIES Algorithm.- Multilevel Secure Database Management System.- Multilevel Transactions and Object-Model Transactions.- Multimedia Data.- Multimedia Data Buffering.- Multimedia Data Indexing.- Multimedia Data Querying.- Multimedia Data Storage.- Multimedia Databases.- Multimedia Information Retrieval Model.- Multimedia Metadata.- Multimedia Presentation Databases.- Multimedia Resource Scheduling.- Multimedia Retrieval Evaluation.- Multimedia Tagging.- Multimodal Interfaces.- Multi-Pathing.- Multiple Representation Modeling.- Multi-Query Optimization.- Multi-Resolution Terrain Modeling.- Multi-Step Query Processing.- Multitenancy.- Multi-Tier Architecture.- Multi-tier Storage Systems.- Multivalued Dependency.- Multivariate Visualization Methods.- Multi-version Serializability and Concurrency Control.- Naive Tables.- Narrowed Extended XPath I.- Natural Interaction.- Near-duplicate Retrieval.- Nearest Neighbor Classification.- Nearest Neighbor Query.- Nearest Neighbor Query in Spatio-temporal Databases.- Nested Loop Join.- Nested Transaction Models.- Network Attached Secure Device.- Network Attached Storage.- Network Data Model.- Neural Networks.- N-Gram Models.- Noise Addition.- Nonparametric Data Reduction Techniques.- Non-Perturbative Masking Methods.- Non-relational Streams.- Nonsequenced Semantics.- Normal Form ORA-SS Schema Diagrams.- Normal Forms and Normalization.- NoSQL Stores.- Now in Temporal Databases.- Null Values.- OASIS.- Object Constraint Language.- Object Data 
Models.- Object Identity.- Object Recognition.- Object Relationship Attribute Data Model for Semi-structured Data.- Object Storage Protocol.- Object-Role Modeling.- OLAM.- OLAP Personalization and Recommendation.- One-Copy-Serializability.- One-Pass Algorithm.- On-Line Analytical Processing.- Online Recovery in Parallel Database Systems.- Ontologies and Life Science Data Management.- Ontology.- Ontology Elicitation.- Ontology Engineering.- Ontology Visual Querying.- Ontology-Based Data Access and Integration.- Open Database Connectivity.- Open Information Extraction.- Open Nested Transaction Models.- Operator-Level Parallelism.- Opinion Mining.- Optimistic Replication and Resolution.- Optimization and Tuning in Data Warehouses.- OQL.- Orchestration.- Order Dependency.- OR-Join.- OR-Split.- OSQL.- Outlier Detection.- Overlay Network.- OWL: Web Ontology Language.- P/FDM.- Parallel and Distributed Data Warehouses.- Parallel Coordinates.- Parallel Data Placement.- Parallel Database Management.- Parallel Hash Join, Parallel Merge Join, Parallel Nested Loops Join.- Parallel Query Execution Algorithms.- Parallel Query Optimization.- Parallel Query Processing.- Parameterized Complexity of Queries.- Parametric Data Reduction Techniques.- Partial Replication.- Path Query.- Pattern-Growth Methods.- Peer Data Management System.- Peer to Peer Overlay Networks: Structure, Routing and Maintenance.- Peer-To-Peer Content Distribution.- Peer-to-Peer Data Integration.- Peer-to-Peer Publish-Subscribe Systems.- Peer-to-Peer Storage.- Peer-to-Peer System.- Peer-to-Peer Web Search.- Performance Analysis of Transaction Processing Systems.- Performance Monitoring Tools.- Period-Stamped Temporal Models.- Personalized Web Search.- Petri Nets.- Physical Clock.- Physical Database Design for Relational Databases.- Physical Layer Tuning.- Pipeline.- Pipelining.- Platform As-A-Service (PaaS).- Point-in-Time Copy.- Point-Stamped Temporal Models.-
Polytransactions.- Positive Relational Algebra.- Possible Answers.- PRAM.- Precision.- Precision and Recall.- Precision at n.- Precision-Oriented Effectiveness Measures.- Predictive Analytics.- Preference Queries.- Preference Specification.- Prescriptive Analytics.- Presenting Structured Text Retrieval Results.- Primary Index.- Principal Component Analysis.- Privacy.- Privacy Metrics.- Privacy Policies and Preferences.- Privacy through Accountability.- Privacy-Enhancing Technologies.- Privacy-Preserving Data Mining.- Privacy-Preserving DBMSs.- Private Information Retrieval.- Probabilistic Databases.- Probabilistic Entity Resolution.- Probabilistic Retrieval Models and Binary Independence Retrieval (BIR) Model.- Probabilistic Skylines.- Probabilistic Spatial Queries.- Probabilistic Temporal Databases.- Probability Ranking Principle.- Probability Smoothing.- Process Life Cycle.- Process Mining.- Process Modeling.- Process Optimization.- Process Structure of a DBMS.- Processing Overlaps in Structured Text Retrieval.- Processing Structural Constraints.- Processor Cache.- Profiles and Context for Structured Text Retrieval.- Projection.- Propagation-based Structured Text Retrieval.- Protection from Insider Threats.- Provenance.- Provenance and Reproducibility.- Provenance in Databases.- Provenance in Scientific Databases.- Provenance in Workflows.- Provenance Management.- Provenance Standards.- Provenance Storage.- Provenance: Privacy and Security.- Pseudonymity.- Publish/Subscribe.- Publish/Subscribe over Streams.- Punctuations.- Q-measure.- Quadtrees (and Family).- Qualitative Temporal Reasoning.- Quality and Trust of Information Content and Credentialing.- Quality of Data Warehouses.- Quantiles on Streams.- Quantitative Association Rules.- QUEL.- Query by Humming.- Query Containment.- Query Evaluation Techniques for Multidimensional Data.- Query Expansion for Information Retrieval.- Query Expansion Models.- Query Language.- Query Languages and Evaluation Techniques 
for Biological Sequence Data.- Query Languages for the Life Sciences.- Query Load Balancing in Parallel Database Systems.- Query Optimization.- Query Optimization (in Relational Databases).- Query Optimization in Sensor Networks.- Query Plan.- Query Point Movement Techniques for Content-Based Image Retrieval.- Query Processing.- Query Processing (in Relational Databases).- Query Processing and Optimization in Object Relational Databases.- Query Processing in data integration systems.- Query Processing in Data Warehouses.- Query Processing in Deductive Databases.- Query Processing over Uncertain Data.- Query Processor.- Query Rewriting.- Query Rewriting Using Views.- Query Translation.- Quorum Systems.- Randomization Methods to Ensure Data Privacy.- Range Query.- Rank-aware Query Processing.- Ranked XML Processing.- Ranking Functions.- Ranking Views.- Rank-Join.- Rank-Join Indices.- Raster Data Management and Multi-Dimensional Arrays.- RDF Stores.- RDF Technology.- Real and Synthetic Test Datasets.- Real-Time Transaction Processing.- Recall.- Receiver Operating Characteristic.- Recommender Systems.- Record Linkage.- Record Matching.- Redundant Arrays of Independent Disks.- Reference Knowledge.- Region Algebra.- Regulatory Compliance in Data Management.- Relational Algebra.- Relational Calculus.- Relational Model.- Relationships in Structured Text Retrieval.- Relative Time.- Relevance.- Relevance Feedback.- Relevance Feedback for Content-Based Information Retrieval.- Relevance Feedback for Text Retrieval.- Replica Control.- Replica Freshness.- Replicated Data Types.- Replicated Database Concurrency Control.- Replication.- Replication Based on Group Communication.- Replication for Availability and Fault-Tolerance.- Replication for High Availability.- Replication for Paxos.- Replication for Scalability.- Replication in Multi-Tier Architectures.- Replication with Snapshot Isolation.- Reputation and Trust.- Request Broker.- Residuated Lattice.- Resource Allocation 
Problems in Spatial Databases.- Resource Description Framework.- Resource Description Framework (RDF) Schema (RDFS).- Resource Identifier.- Result Display.- Retrospective Event Processing.- Reverse Nearest Neighbor Query.- Reverse Top-k Queries.- Rewriting Queries using Views.- RMI.- Road Networks.- Rocchio's Formula.- Role Based Access Control.- R-Precision.- R-Tree (and Family).- Rule-based Classification.- Safety and Domain Independence.- Sagas.- Sampling Techniques for Statistical Databases.- SAN File System.- Scalable Decision Tree Construction.- Scheduler.- Scheduling Strategies for Data Stream Processing.- Schema Evolution.- Schema Mapping.- Schema Mapping Composition.- Schema Matching.- Schema Tuning.- Schema Versioning.- Scheme/Ontology Extraction.- Scientific Databases.- Scientific Visualization.- Scientific Workflows.- Score Aggregation.- Screen Scraper.- SCSI Target.- SDC Score.- Search Engine Metrics.- Searching Digital Libraries.- Second Normal Form (2NF).- Secondary Index.- Secure Data Outsourcing.- Secure Database Development.- Secure Multiparty Computation Methods.- Secure Transaction Processing.- Security Services.- Segmentation and Stratification.- Selection.- Selectivity Estimation.- Self-Maintenance of Views.- Self-Management Technology in Databases.- Semantic Atomicity.- Semantic Crowd Sourcing.- Semantic Data Integration for Life Science Entities.- Semantic Data Model.- Semantic Matching.- Semantic Modeling and Knowledge Representation for Multimedia Data.- Semantic Modeling for Geographic Information Systems.- Semantic Overlay Networks.- Semantic Social Web.- Semantic Streams.- Semantic Web.- Semantic Web Query Languages.- Semantic Web Services.- Semantics-based Concurrency Control.- Semijoin.- Semijoin Program.- Semi-Structured Data.- Semi-Structured Data Model.- Semi-Structured Database Design.- Semi-Structured Query Languages.- Semi-Supervised Learning.- Sensor Networks.- Sequenced Semantics.- 
Sequential Patterns.- Serializability.- Serializable Snapshot Isolation.- Service Component Architecture (SCA).- Service Oriented Architecture.- Session.- Shared-Disk Architecture.- Shared-Memory Architecture.- Shared-Nothing Architecture.- Side-Effect-Free View Updates.- Signature Files.- Similarity and Ranking Operations.- Simplicial Complex.- Singular Value Decomposition.- Skyline Queries and Pareto Optimality.- Snapshot Equivalence.- Snapshot Isolation.- Snippet.- Snowflake Schema.- SOAP.- Social Applications.- Social influence.- Social Media Analysis.- Social Media Analytics.- Social Media Harvesting.- Social network analysis.- Social Networks.- Software As-A-Service (SaaS).- Software Transactional Memory.- Software-Defined Storage.- Solid State Drive (SSD).- Sort-Merge Join.- Space-Filling Curves.- Space-Filling Curves for Query Processing.- SPARQL.- Sparse Index.- Spatial and Spatio-Temporal Data Models and Languages.- Spatial and Temporal Data Warehouses .- Spatial Anonymity.- Spatial Data Analysis.- Spatial Data Mining.- Spatial Data Types.- Spatial Datawarehousing.- Spatial Indexing Techniques.- Spatial Join.- Spatial Keyword Search.- Spatial Matching Problems.- Spatial Network Databases.- Spatial Operations and Map Operations.- Spatial Queries in the Cloud.- Spatio-Temporal Data Mining.- Spatio-Temporal Data Types.- Spatio-Temporal Data Warehouses.- Spatiotemporal Interpolation Algorithms.- Spatio-Temporal Selectivity Estimation.- Spatio-Temporal Trajectories.- Specialization and Generalization.- Specificity.- Spectral Clustering.- Split.- Split Transactions.- SQL.- SQL Analytics on Big Data.- SQL Isolation Levels.- SQL-Based Temporal Query Languages.- Stable Distribution.- Stack-based Query Language.- Staged DBMS.- Standard Effectiveness Measures.- Star Index.- Star Schema.- State-based Publish/Subscribe.- Statistical Data Management.- Statistical Disclosure Limitation For Data Access.- Steganography.- Stemming.- Stop-&-go Operator.- Stoplists.- Storage 
Access Models.- Storage Area Network.- Storage Consolidation.- Storage Devices.- Storage Grid.- Storage Management.- Storage Management Initiative-Specification.- Storage Manager.- Storage Network Architectures.- Storage Networking Industry Association.- Storage of Large Scale Multidimensional Data.- Storage Power Management.- Storage Protection.- Storage Protocols.- Storage Resource Management.- Storage Security.- Storage Virtualization.- Stored Procedure.- Stream Mining.- Stream Models.- Stream Processing.- Stream processing on modern hardware.- Stream Reasoning.- Stream Sampling.- Stream Similarity Mining.- Streaming Analytics.- Streaming Applications.- Stream-Oriented Query Languages and Operators.- Strong Consistency Models for Replicated Data.- Structural Indexing.- Structure Analytics in Social Media.- Structure Weight.- Structured Data in Peer-to-Peer Systems.- Structured Document Retrieval.- Structured Text Retrieval Models.- Subject Spaces.- Subspace Clustering Techniques.- Success at n.- Succinct Constraints.- Suffix Tree.- Summarizability.- Summarization.- Support Vector Machine.- Supporting Transaction Time Databases.- Symbolic Representation.- Symmetric Encryption.- Synopsis Structure.- Synthetic Microdata.- System R (R*) Optimizer.- Table.- Tabular Data.- Taxonomy: Biomedical Health Informatics.- tBench.- Telic Distinction in Temporal Databases.- Telos.- Temporal Access Control.- Temporal Aggregation.- Temporal Algebras.- Temporal Analytics in Social Media.- Temporal Benchmarks.- Temporal Coalescing.- Temporal Compatibility.- Temporal Conceptual Models.- Temporal Constraints.- Temporal Data Mining.- Temporal Data Models.- Temporal Database.- Temporal Datawarehousing.- Temporal Dependencies.- Temporal Element.- Temporal Expression.- Temporal Generalization.- Temporal Granularity.- Temporal Homogeneity.- Temporal Indeterminacy.- Temporal Integrity Constraints.- Temporal Joins.- Temporal Logic in Database Query Languages.- Temporal Logical Models.- 
Temporal Object-Oriented Databases.- Temporal Periodicity.- Temporal Projection.- Temporal PSM.- Temporal Query Languages.- Temporal Query Processing.- Temporal Relational Calculus.- Temporal Specialization.- Temporal Strata.- Temporal Support in the SQL Standard.- Temporal Vacuuming.- Temporal Visual Languages.- Temporal XML.- Term Proximity.- Term Statistics for Structured Text Retrieval.- Term Weighting.- Test Collection.- Text Analytics.- Text Analytics in Social Media.- Text Categorization.- Text Clustering.- Text Compression.- Text Generation.- Text Index Compression.- Text Indexing and Retrieval.- Text Indexing Techniques.- Text Mining.- Text Mining of Biological Resources.- Text Representation.- Text Segmentation.- Text Semantic Representation.- Text Stream Processing.- Text Streaming Model.- Text Summarization.- Text Visualization.- TF*IDF.- Thematic Map.- Third Normal Form.- Three-Dimensional GIS and Geological Applications.- Three-Phase Commit.- Tight Coupling.- Time Aggregated Graphs.- Time and Information Retrieval.- Time Domain.- Time in Philosophical Logic.- Time Instant.- Time Interval.- Time Period.- Time Series Query.- Time Span.- Time-Line Clock.- Timeslice Operator.- Topic Detection and Tracking.- Topic Maps.- Topic-based Publish/Subscribe.- Top-k Queries.- Top-K Selection Queries on Multimedia Datasets.- Topological Data Models.- Topological Relationships.- Trajectory.- Transaction.- Transaction Chopping.- Transaction Management.- Transaction Manager.- Transaction Models - the Read/Write Approach.- Transaction Time.- Transactional Middleware.- Transactional Processes.- Transactional Stream Processing.- Transaction-Time Indexing.- Tree-based Indexing.- Treemaps.- Triangular Norms.- Triangulated Irregular Network.- Trie.- Trip Planning Queries.- Trust and Reputation in Peer-to-Peer Systems.- Trust in Blogosphere.- Trusted Hardware.- TSQL2.- Tuning Concurrency Control.- Tuple-Generating Dependencies.- Two-Dimensional Shape Retrieval.- Two-Phase 
Commit.- Two-Phase Commit Protocol.- Two-Phase Locking.- Two-Poisson model.- Type-based Publish/Subscribe.- U-measure.- Uncertain Data Lineage.- Uncertain Data Mining.- Uncertain Data Models.- Uncertain Data Streams.- Uncertain Data Summarization.- Uncertain Graph Data Management.- Uncertain Spatial Data Management.- Uncertain Top-k Queries.- Uncertainty in Events.- Uncertainty Management in Scientific Database Systems.- Unicode.- Unified Modeling Language.- Union.- Unobservability.- Updates and Transactions in Peer-to-Peer Systems.- Updates through Views.- Usability.- User-Defined Time.- Valid Time.- Valid-Time Indexing.- Value Equivalence.- Variable Time Span.- Vector-Space Model.- Vertically Partitioned Data.- Video.- Video Content Analysis.- Video Content Modeling.- Video Content Structure.- Video Metadata.- Video Querying.- Video Representation.- Video Scene and Event Detection.- Video Segmentation.- Video Sequence Indexing.- Video Shot Detection.- Video Summarization.- View Adaptation.- View Definition.- View Maintenance.- View Maintenance Aspects.- View-based Data Integration.- Views.- Virtual Partitioning.- Visual Analytics.- Visual Association Rules.- Visual Classification.- Visual Clustering.- Visual Content Analysis.- Visual Data Mining.- Visual Formalisms.- Visual Interaction.- Visual Interfaces.- Visual Interfaces for Geographic Data.- Visual interfaces for streaming data.- Visual Metaphor.- Visual On-Line Analytical Processing (OLAP).- Visual Perception.- Visual Query Language.- Visual Representation.- Visualization for Information Retrieval.- Visualization Pipeline.- Visualizing Categorical Data.- Visualizing Clustering Results.- Visualizing Hierarchical Data.- Visualizing Network Data.- Visualizing Quantitative Data.- Volume.- Voronoi Diagrams.- W3C.- WAN Data Replication.- Wavelets on Streams.- Weak Consistency Models for Replicated Data.- Weak Equivalence.- Web 2.0/3.0.- Web Advertising.- Web Characteristics and Evolution.- Web Crawler 
Architecture.- Web Data Extraction System.- Web ETL.- Web Harvesting.- Web Information Extraction.- WEB Information Retrieval Models.- Web Mashups.- Web Page Quality Metrics.- Web Question Answering.- Web Search Query Rewriting.- Web Search Relevance Feedback.- Web Search Relevance Ranking.- Web Search Result Caching and Prefetching.- Web Search Result De-duplication and Clustering.- Web Services.- Web Services and the Semantic Web for Life Science Data.- Web Spam Detection.- Web Transactions.- Web Views.- What-If Analysis.- WIMP Interfaces.- Window operator in RDBMS.- Window-based Query Processing.- Windows.- Workflow Constructs.- Workflow Evolution.- Workflow Join.- Workflow Management.- Workflow Management and Workflow Management System.- Workflow Management Coalition.- Workflow Model.- Workflow Model Analysis.- Workflow Patterns.- Workflow Schema.- Workflow Transactions.- Wrapper Induction.- Wrapper Maintenance.- Wrapper Stability.- Write Once Read Many.- XML.- XML Access Control.- XML Attribute.- XML Benchmarks.- XML Compression.- XML Document.- XML Element.- XML Indexing.- XML Information Integration.- XML Integrity Constraints.- XML Metadata Interchange.- XML Metadata Interchange Specification (XMI).- XML Parsing, SAX/DOM.- XML Process Definition Language.- XML Programming.- XML Publish/Subscribe.- XML Publishing.- XML Retrieval.- XML Schema.- XML Selectivity Estimation.- XML Storage.- XML Stream Processing.- XML Tree Pattern, XML Twig Query.- XML Tuple Algebra.- XML Typechecking.- XML Types.- XML Updates.- XML Views.- XPath/XQuery.- XQuery Full-Text.- XQuery Processors.- XSL/XSLT.- Zero-One Laws.- Zooming Techniques.- α-nDCG.-

    £4,324.60

  • A User's Guide to Business Analytics

    Taylor & Francis Inc A User's Guide to Business Analytics

    1 in stock

    Book Synopsis
    A User's Guide to Business Analytics provides a comprehensive discussion of statistical methods useful to the business analyst. Methods are developed from a fairly basic level to accommodate readers who have limited training in the theory of statistics. A substantial number of case studies and numerical illustrations using the R software package are provided for the benefit of motivated beginners who want to get a head start in analytics, as well as for experts on the job who will benefit from using this text as a reference book. The book comprises 12 chapters. The first chapter focuses on business analytics, along with its emergence and application, and sets up a context for the whole book. The next three chapters introduce R and provide a comprehensive discussion of descriptive analytics, including numerical data summarization and visual analytics. Chapters five through seven discuss set theory, definitions and counting rules, probability, random

    Table of Contents
    What Is Analytics? Introducing R—An Analytics Software. Reporting Data. Statistical Graphics and Visual Analytics. Probability. Random Variables and Probability Distributions. Continuous Random Variables. Statistical Inference. Regression for Predictive Model Building. Decision Trees. Data Mining and Multivariate Methods. Modeling Time Series Data for Forecasting.

    £128.25

  • IoT Solutions in Microsoft's Azure IoT Suite

    APress IoT Solutions in Microsoft's Azure IoT Suite

    1 in stock

    Book Synopsis
    Collect and analyze sensor and usage data from Internet of Things applications with Microsoft Azure IoT Suite. Internet connectivity to everyday devices such as light bulbs, thermostats, and even voice-command devices such as Google Home and Amazon.com's Alexa is exploding. These connected devices and their respective applications generate large amounts of data that can be mined to enhance user-friendliness and make predictions about what a user might be likely to do next. Microsoft's Azure IoT Suite is a cloud-based platform that is ideal for collecting data from connected devices. In this book, you'll learn about data acquisition and analysis, including real-time analysis. Real-world examples are provided to teach you to detect anomalous patterns in your data that might lead to business advantage. We live in a time when the amount of data being generated and stored is growing at an exponential rate. Understanding and getting real-time insight into these dat

    Table of Contents
    Introduction.- Part I: Getting Started.- 1. The World of Big Data and IoT.- 2. Generating Data with Devices.- Part II: Data on the Move.- 3. Azure IoT Hub.- 4. Ingesting Data with Azure IoT Hub.- 5. Azure Stream Analytics.- 6. Real-Time Data Streaming.- 7. Azure Data Factory.- 8. Integrating Data Between Data Stores Using Azure Data Factory.- Part III: Data at Rest.- 9. Azure Data Lake Store.- 10. Azure Data Lake Analytics.- 11. U-SQL.- 12. Azure HDInsight.- 13. Real-time Insights and Reporting on Big Data.- 14. Azure Machine Learning.- Part IV: More on Cortana Intelligence.- 15. Azure Data Catalog.- 16. Azure Event Hubs

    £58.49

  • Beginning Apache Spark 2

    APress Beginning Apache Spark 2

    2 in stock

    Book Synopsis
    Develop applications for the big data landscape with Spark and Hadoop. This book also explains the role of Spark in developing scalable machine learning and analytics applications with Cloud technologies. Beginning Apache Spark 2 gives you an introduction to Apache Spark and shows you how to work with it. Along the way, you'll discover resilient distributed datasets (RDDs); use Spark SQL for structured data; and learn stream processing and build real-time applications with Spark Structured Streaming. Furthermore, you'll learn the fundamentals of Spark ML for machine learning and much more. After you read this book, you will have the fundamentals to become proficient in using Apache Spark and know when and how to apply it to your big data applications.

    What You Will Learn
    Understand Spark unified data processing platform How

    £29.69

  • Data versus Democracy

    APress Data versus Democracy

    1 in stock

    Book Synopsis
    Human attention is in the highest demand it has ever been. The drastic increase in available information has compelled individuals to find a way to sift through the media that is literally at their fingertips. Content recommendation systems have emerged as the technological solution to this social and informational problem, but they've also created a bigger crisis in confirming our biases by showing us only, and exactly, what they predict we want to see. Data versus Democracy investigates and explores how, in the era of social media, human cognition, algorithmic recommendation systems, and human psychology are all working together to reinforce (and exaggerate) human bias. The dangerous confluence of these factors is driving media narratives, influencing opinions, and possibly changing election results. In this book, algorithmic recommendations, clickbait, familiarity bias, propaganda,

    Trade Review
    “A very well written book that has an engaging style of writing, doesn’t become dry or bogged down in the details, but still showcases the depth of knowledge that Shaffer has on the subject. … It’s accessible and it provides a satisfying read to those looking for deep analysis of this emerging problem faced by the world.” (The Robotics Law Journal, Vol. 5 (2), September - October, 2019)

    Table of Contents
    Part I: The Propaganda Problem.- Chapter 1: Pay Attention: How Information Abundance Affects the Way We Consume Media.- Chapter 2: Cog in the System: How the Limits of Our Brains Leave Us Vulnerable to Cognitive Hacking.- Chapter 3: Swimming Upstream: How Content Recommendation Engines Impact Information and Manipulate Our Attention.- Part II: Case Studies.- Chapter 4: Domestic Disturbance: Ferguson, GamerGate, and the Rise of the American Alt-Right.- Chapter 5: Democracy Hacked, Part 1: Russian Interference and the New Cold War.- Chapter 6: Democracy Hacked, Part 2: Rumors, Bots, and Genocide in the Global South.- Chapter 7: Conclusion: Where Do We Go from Here?

    £26.99

  • Mastering Snowflake Solutions

    APress Mastering Snowflake Solutions

    1 in stock

    Book Synopsis
    Design for large-scale, high-performance queries using Snowflake's query processing engine to empower data consumers with timely, comprehensive, and secure access to data. This book also helps you protect your most valuable data assets using built-in security features such as end-to-end encryption for data at rest and in transit. It demonstrates key features in Snowflake and shows how to exploit those features to deliver a personalized experience to your customers. It also shows how to ingest the high volumes of both structured and unstructured data that are needed for game-changing business intelligence analysis. Mastering Snowflake Solutions starts with a refresher on Snowflake's unique architecture before getting into the advanced concepts that make Snowflake the market-leading product it is today. Progressing through each chapter, you will learn how to leverage storage, query processing, cloning, data sharing, and continuous data protection features. This approach allows for greater

    Table of Contents
    1. Snowflake Architecture.- 2. Data Movement.- 3. Cloning.- 4. Managing Security and User Access Control.- 5. Protecting Data in Snowflake.- 6. Business Continuity and Disaster Recovery.- 7. Data Sharing and the Data Cloud.- 8. Programming.- 9. Advanced Performance Tuning.- 10. Developing Applications in Snowflake

    £46.74

  • Building the Snowflake Data Cloud

    APress Building the Snowflake Data Cloud

    1 in stock

    Book Synopsis
    Implement the Snowflake Data Cloud using best practices and reap the benefits of scalability and low cost from the industry-leading, cloud-based data warehousing platform. This book provides a detailed how-to explanation and assumes familiarity with Snowflake core concepts and principles. It is a project-oriented book with a hands-on approach to designing, developing, and implementing your Data Cloud with security at the center. As you work through the examples, you will develop the skill, knowledge, and expertise to expand your capability by incorporating additional Snowflake features, tools, and techniques. Your Snowflake Data Cloud will be fit for purpose, extensible, and at the forefront of Direct Share, Data Exchange, and Snowflake Marketplace. Building the Snowflake Data Cloud helps you transform your organization to monetize the value locked up within your data. As the digital economy takes hold, with data volume, veloci

    Table of Contents
    Part I. Context.- 1. The Snowflake Data Cloud.- 2. Breaking Data Siloes.- Part II. Concepts.- 3. Architecture.- 4. Account Security.- 5. Role Based Access Control (RBAC).- 6. Account Usage Store.- Part III. Tools.- 7. Ingesting Data.- 8. Data Pipelines.- 9. Data Presentation.- 10. Semi Structured and Unstructured Data.- Part IV. Management.- 11. Query Optimizer Basics.- 12. Data Management.- 13. Data Modelling.- 14. Snowflake Data Cloud By Example

    £46.74

  • Leveling Up with SQL

    APress Leveling Up with SQL

    1 in stock

    Book Synopsis
    Intermediate-Advanced user level

    Table of Contents
    Chapter 1: Getting Ready.- Chapter 2: Working with Table Design.- Chapter 3: Table Relationships and Working With Joins.- Chapter 4: Working with Calculated Data.- Chapter 5: Aggregating Data.- Chapter 6: Creating and Using Views and Friends.- Chapter 7: Working With Subqueries and Common Table Expressions.- Chapter 8: Working With Window Functions.- Chapter 9: More on Common Table Expressions.- Chapter 10: More Techniques with SQL: Triggers, Pivot Tables, and Variables.- Appendix A.

    £35.99

  • Big Data for Chimps

    O'Reilly Media Big Data for Chimps

    3 in stock

    Book Synopsis
    Finding patterns in massive event streams can be difficult, but learning how to find them doesn't have to be. This unique hands-on guide shows you how to solve this and many other problems in large-scale data processing with simple, fun, and elegant tools that leverage Apache Hadoop.

    £23.99

  • Learning to Love Data Science

    O'Reilly Media Learning to Love Data Science

    1 in stock

    Book Synopsis
    Today, big data is taken seriously, and data science is considered downright sexy. With this anthology of reports from award-winning journalist Mike Barlow, you'll appreciate how data science is fundamentally altering our world, for better and for worse.

    £15.99

  • MongoDB The Definitive Guide 3e

    O'Reilly Media MongoDB The Definitive Guide 3e

    4 in stock

    Book Synopsis
    Manage your data with a system designed to support modern application development. Updated for MongoDB 4.2, the third edition of this authoritative and accessible guide shows you the advantages of using document-oriented databases.

    £39.74

  • Agile Data Science 2.0

    O'Reilly Media Agile Data Science 2.0

    3 in stock

    Book Synopsis
    With the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to build data applications with Python, Apache Spark, Kafka, and other tools.

    £35.99

© 2025 Book Curl
