Data mining Books
University of California Press Data Mining for the Social Sciences
Book Synopsis
We live in a world of big data: the amount of information collected on human behavior each day is staggering, and exponentially greater than at any time in the past. Providing an introduction to data mining, the authors discuss how data mining substantially differs from conventional statistical modeling familiar to most social scientists.
Table of Contents
PART 1. CONCEPTS
1. What Is Data Mining?; 2. Contrasts with the Conventional Statistical Approach; 3. Some General Strategies Used in Data Mining; 4. Important Stages in a Data Mining Project
PART 2. WORKED EXAMPLES
5. Preparing Training and Test Datasets; 6. Variable Selection Tools; 7. Creating New Variables Using Binning and Trees; 8. Extracting Variables; 9. Classifiers; 10. Classification Trees; 11. Neural Networks; 12. Clustering; 13. Latent Class Analysis and Mixture Models; 14. Association Rules
Conclusion; Bibliography; Notes; Index
£28.90
Princeton University Press The Silicon Jungle
Book Synopsis
What happens when a naive intern is granted unfettered access to people's most private thoughts and actions? Stephen Thorpe lands a coveted internship at Ubatoo, an Internet empire that provides its users with popular online services, from a search engine and e-mail, to social networking. When Stephen's boss asks him to work on a project with the A
Trade Review
Co-Winner of the 2012 Mary Shelley Award for Outstanding Fictional Work, Media Ecology Association
"Baluja's clever, cynical debut explores the frightening possibilities of data mining... A nod to Upton Sinclair's muckraking The Jungle, which scared its readers into regulating the meat-packing industry, this lively if depressing novel suggests that computer snooping is too seductive to control, despite the consequences."--Publishers Weekly
"[F]righteningly convincing... The read is quick, the questions will linger, and the ideas are so intriguing... Baluja simplifies the abstract world of tech-speak for the rest of us while aiming to do for the Internet what Upton Sinclair's The Jungle did for the meat industry: make readers reconsider its safety. For fans of intelligent thrillers."--Stephen Morrow, Library Journal
"In the era of the ubiquitous web company, The Silicon Jungle provides ample food for thought."--Zena Iovino, New Scientist
"[T]his cautionary tale is fascinating for its exploration of technology as a conduit for crime."--Michele Leber, Booklist Online
"The book's central message is fascinating. A company like Google, Baluja points out, has far more information on U.S. citizens than does the FBI and far fewer restrictions on how to use it. It's a chilling message in a fun package."--Kathleen Offenholley, Mathematics Teacher
Table of Contents
Preface; Endings; Anklets; Anthropologists in the Midst; Mollycoddle; Touchpoints; Checking In; Working 9 to 4; Predicting the Future and 38 Needles; Contact; Two Geeks in a Pod; An Understatement; Euphoria and Diet Pills; To Better Days; Marathon; The Life and Soul of an Intern; Candid Cameras; Episodes; Liberal Food and Even More Liberal Activism; Subjects; Newsworthy; Patience; Hypergrowth; Little Pink Houses; Truth, Lies, and Algorithms; Negotiations and Herding Cats; The JENNY Discovery; I Dream of JENNY; A Five-Step Program: Hallucinations and Archetypes; Over-Deliver; A Life Changed in Four Phone Calls; Giving Thanks; A Drive through the Country; Control; A Tale of Two Tenures; Prelude to Pie; The Yuri Effect; Apple Pie; Thoughts Like Butterflies; Core-Relations; Collide; Control, Revisited; Fables of the Deconstruction; Control, Foregone; Foundations; One Way; Sebastin's Friends; A Tinker by Any Other Name; When It Rains; I Am a Heartbeat; What I Did This Summer; A Permanent Position; For Adam; Faith; Counting by Two; Disconnect; Sahim; Epilogue: Beginnings; Acknowledgments; Know More; Privacy Policy of a Few Organizations; References
£25.20
Springer-Verlag New York Inc. Data Mining Techniques for the Life Sciences
Book Synopsis
This third edition details new and updated methods and protocols on important databases and data mining tools. Chapters guide readers through archives of macromolecular sequences and three-dimensional structures, databases of protein-protein interactions, and methods for predicting conformational disorder, mutant thermodynamic stability, aggregation, and drug response. Quality of structural data and their release, soft mechanics applications in biology, and protein flexibility are considered, too, together with pan-genome analyses, rational drug combination screening and Omics Deep Mining. Written in the format of the highly successful Methods in Molecular Biology series, each chapter includes an introduction to the topic, lists necessary materials, and includes step-by-step, readily reproducible protocols. Authoritative and cutting-edge, Data Mining Techniques for the Life Sciences, Third Edition aims to be a practical guide for researchers to help further their study in this field.
Table of Contents
Part I: DATABASES
1 EBI data resources (Rolf Apweiler and Amonida Zadissa)
2 IMEx databases: displaying molecular interactions into a single, standards-compliant dataset (Pablo Porras, Sandra Orchard and Luana Licata)
3 Protein Three-dimensional Structure Databases (Vaishali P. Waman, Christine Orengo, Gerard J. Kleywegt and Arthur M. Lesk)
Part II: PREDICTION METHODS
4 Predicting protein conformational disorder and disordered binding sites (Ketty Tamburrini, Giulia Pesce, Juliet Nilsson, Frank Gondelaud, Andrey V. Kajava, Jean-Guy Berrin and Sonia Longhi)
5 Profiles of natural and designed protein-like sequences effectively bridge protein sequence gaps: Implications in distant homology detection (Gayatri Kumar, Narayanaswamy Srinivasa and Sankaran Sandhya)
6 Turning failures into applications: the problem of protein ΔΔG prediction (Rita Casadio, Castrense Savojardo, Piero Fariselli, Emidio Capriotti and Pier Luigi Martelli)
7 Dissecting the genome for drug response prediction (Gerardo Pepe, Chiara Carrino, Luca Parca and Manuela Helmer-Citterich)
8 Prediction of the effect of pH on the aggregation and conditional folding of intrinsically disordered proteins with SolupHred and DispHred (Valentín Iglesias, Carlos Pintado-Grima, Jaime Santos, Marc Fornt and Salvador Ventura)
9 Extracting the dynamic motion of proteins using Normal Mode Analysis (Jacob A. Bauer and Vladena Bauerová)
Part III: DATA QUALITY
10 Pre- and Post-Publication Verification for Reproducible Data Mining in Macromolecular Crystallography (John R Helliwell)
11 Soft Statistical Mechanics for Biology (Mariano Bizzarri and Alessandro Giuliani)
12 Uses and abuses of the atomic displacement parameters in structural biology (Oliviero Carugo)
13 Optimizing the Parametrization of Homologue Classification in the Pan-Genome Computation for a Bacterial Species: Case Study Streptococcus pyogenes (Erwin Tantoso, Birgit Eisenhaber and Frank Eisenhaber)
Part IV: BIG DATA
14 Computational pipeline for rational drug combination screening in patient-derived cells (Paschalis Athanasiadis, Aleksandr Ianevski, Sigrid Skånland and Tero Aittokallio)
15 Deep Mining from Omics Data (Abeer Alzubaidi and Jonathan Tepper)
£151.99
John Wiley & Sons Inc Data Mining Algorithms
Book Synopsis
Data Mining Algorithms is a practical, technically-oriented guide to data mining algorithms that covers the most important algorithms for building classification, regression, and clustering models, as well as techniques used for attribute selection and transformation, model quality evaluation, and creating model ensembles. The author presents many of the important topics and methodologies widely used in data mining, whilst demonstrating the internal operation and usage of data mining algorithms using examples in R.
Table of Contents
Acknowledgements; Preface; References
Part I Preliminaries
1 Tasks; 1.1 Introduction; 1.2 Inductive learning tasks; 1.3 Classification; 1.4 Regression; 1.5 Clustering; 1.6 Practical issues; 1.7 Conclusion; 1.8 Further readings; References
2 Basic statistics; 2.1 Introduction; 2.2 Notational conventions; 2.3 Basic statistics as modeling; 2.4 Distribution description; 2.5 Relationship detection; 2.6 Visualization; 2.7 Conclusion; 2.8 Further readings; References
Part II Classification
3 Decision trees; 3.1 Introduction; 3.2 Decision tree model; 3.3 Growing; 3.4 Pruning; 3.5 Prediction; 3.6 Weighted instances; 3.7 Missing value handling; 3.8 Conclusion; 3.9 Further readings; References
4 Naïve Bayes classifier; 4.1 Introduction; 4.2 Bayes rule; 4.3 Classification by Bayesian inference; 4.4 Practical issues; 4.5 Conclusion; 4.6 Further readings; References
5 Linear classification; 5.1 Introduction; 5.2 Linear representation; 5.3 Parameter estimation; 5.4 Discrete attributes; 5.5 Conclusion; 5.6 Further readings; References
6 Misclassification costs; 6.1 Introduction; 6.2 Cost representation; 6.3 Incorporating misclassification costs; 6.4 Effects of cost incorporation; 6.5 Experimental procedure; 6.6 Conclusion; 6.7 Further readings; References
7 Classification model evaluation; 7.1 Introduction; 7.2 Performance measures; 7.3 Evaluation procedures; 7.4 Conclusion; 7.5 Further readings; References
Part III Regression
8 Linear regression; 8.1 Introduction; 8.2 Linear representation; 8.3 Parameter estimation; 8.4 Discrete attributes; 8.5 Advantages of linear models; 8.6 Beyond linearity; 8.7 Conclusion; 8.8 Further readings; References
9 Regression trees; 9.1 Introduction; 9.2 Regression tree model; 9.3 Growing; 9.4 Pruning; 9.5 Prediction; 9.6 Weighted instances; 9.7 Missing value handling; 9.8 Piecewise linear regression; 9.9 Conclusion; 9.10 Further readings; References
10 Regression model evaluation; 10.1 Introduction; 10.2 Performance measures; 10.3 Evaluation procedures; 10.4 Conclusion; 10.5 Further readings; References
Part IV Clustering
11 (Dis)similarity measures; 11.1 Introduction; 11.2 Measuring dissimilarity and similarity; 11.3 Difference-based dissimilarity; 11.4 Correlation-based similarity; 11.5 Missing attribute values; 11.6 Conclusion; 11.7 Further readings; References
12 k-Centers clustering; 12.1 Introduction; 12.2 Algorithm scheme; 12.3 k-Means; 12.4 Beyond means; 12.5 Beyond (fixed) k; 12.6 Explicit cluster modeling; 12.7 Conclusion; 12.8 Further readings; References
13 Hierarchical clustering; 13.1 Introduction; 13.2 Cluster hierarchies; 13.3 Agglomerative clustering; 13.4 Divisive clustering; 13.5 Hierarchical clustering visualization; 13.6 Hierarchical clustering prediction; 13.7 Conclusion; 13.8 Further readings; References
14 Clustering model evaluation; 14.1 Introduction; 14.2 Per-cluster quality measures; 14.3 Overall quality measures; 14.4 External quality measures; 14.5 Using quality measures; 14.6 Conclusion; 14.7 Further readings; References
Part V Getting Better Models
15 Model ensembles; 15.1 Introduction; 15.2 Model committees; 15.3 Base models; 15.4 Model aggregation; 15.5 Specific ensemble modeling algorithms; 15.6 Quality of ensemble predictions; 15.7 Conclusion; 15.8 Further readings; References
16 Kernel methods; 16.1 Introduction; 16.2 Support vector machines; 16.3 Support vector regression; 16.4 Kernel trick; 16.5 Kernel functions; 16.6 Kernel prediction; 16.7 Kernel-based algorithms; 16.8 Conclusion; 16.9 Further readings; References
17 Attribute transformation; 17.1 Introduction; 17.2 Attribute transformation task; 17.3 Simple transformations; 17.4 Multiclass encoding; 17.5 Conclusion; 17.6 Further readings; References
18 Discretization; 18.1 Introduction; 18.2 Discretization task; 18.3 Unsupervised discretization; 18.4 Supervised discretization; 18.5 Effects of discretization; 18.6 Conclusion; 18.7 Further readings; References
19 Attribute selection; 19.1 Introduction; 19.2 Attribute selection task; 19.3 Attribute subset search; 19.4 Attribute selection filters; 19.5 Attribute selection wrappers; 19.6 Effects of attribute selection; 19.7 Conclusion; 19.8 Further readings; References
20 Case studies; 20.1 Introduction; 20.2 Census income; 20.3 Communities and crime; 20.4 Cover type; 20.5 Conclusion; 20.6 Further readings; References
Closing
A Notation; A.1 Attribute values; A.2 Data subsets; A.3 Probabilities
B R packages; B.1 CRAN packages; B.2 DMR packages; B.3 Installing packages; References
C Datasets
Index
£59.80
John Wiley & Sons Inc Data Mining and Business Analytics with R
Book Synopsis
Collecting, analyzing, and extracting valuable information from a large amount of data requires easily accessible, robust, computational and analytical tools. Data Mining and Business Analytics with R utilizes the open source software R for the analysis, exploration, and simplification of large high-dimensional data sets.
Trade Review
"I first taught a Ph.D. level course in business applications of data mining 10 years ago. I regularly search the web, looking for business-oriented data mining books, and this is the first one I have found that is suitable for an MS in business analytics. I plan to use it. Anyone who teaches such a class and is inclined toward R should consider this text." (Journal of the American Statistical Association, 1 January 2014)
Table of Contents
Preface; Acknowledgments
1. Introduction; Reference
2. Processing the Information and Getting to Know Your Data; 2.1 Example 1: 2006 Birth Data; 2.2 Example 2: Alumni Donations; 2.3 Example 3: Orange Juice; References
3. Standard Linear Regression; 3.1 Estimation in R; 3.2 Example 1: Fuel Efficiency of Automobiles; 3.3 Example 2: Toyota Used-Car Prices; Appendix 3.A The Effects of Model Overfitting on the Average Mean Square Error of the Regression Prediction; References
4. Local Polynomial Regression: a Nonparametric Regression Approach; 4.1 Model Selection; 4.2 Application to Density Estimation and the Smoothing of Histograms; 4.3 Extension to the Multiple Regression Model; 4.4 Examples and Software; References
5. Importance of Parsimony in Statistical Modeling; 5.1 How Do We Guard Against False Discovery; References
6. Penalty-Based Variable Selection in Regression Models with Many Parameters (LASSO); 6.1 Example 1: Prostate Cancer; 6.2 Example 2: Orange Juice; References
7. Logistic Regression; 7.1 Building a Linear Model for Binary Response Data; 7.2 Interpretation of the Regression Coefficients in a Logistic Regression Model; 7.3 Statistical Inference; 7.4 Classification of New Cases; 7.5 Estimation in R; 7.6 Example 1: Death Penalty Data; 7.7 Example 2: Delayed Airplanes; 7.8 Example 3: Loan Acceptance; 7.9 Example 4: German Credit Data; References
8. Binary Classification, Probabilities, and Evaluating Classification Performance; 8.1 Binary Classification; 8.2 Using Probabilities to Make Decisions; 8.3 Sensitivity and Specificity; 8.4 Example: German Credit Data
9. Classification Using a Nearest Neighbor Analysis; 9.1 The k-Nearest Neighbor Algorithm; 9.2 Example 1: Forensic Glass; 9.3 Example 2: German Credit Data; Reference
10. The Naïve Bayesian Analysis: a Model for Predicting a Categorical Response from Mostly Categorical Predictor Variables; 10.1 Example: Delayed Airplanes; Reference
11. Multinomial Logistic Regression; 11.1 Computer Software; 11.2 Example 1: Forensic Glass; 11.3 Example 2: Forensic Glass Revisited; Appendix 11.A Specification of a Simple Triplet Matrix; References
12. More on Classification and a Discussion on Discriminant Analysis; 12.1 Fisher’s Linear Discriminant Function; 12.2 Example 1: German Credit Data; 12.3 Example 2: Fisher Iris Data; 12.4 Example 3: Forensic Glass Data; 12.5 Example 4: MBA Admission Data; Reference
13. Decision Trees; 13.1 Example 1: Prostate Cancer; 13.2 Example 2: Motorcycle Acceleration; 13.3 Example 3: Fisher Iris Data Revisited
14. Further Discussion on Regression and Classification Trees, Computer Software, and Other Useful Classification Methods; 14.1 R Packages for Tree Construction; 14.2 Chi-Square Automatic Interaction Detection (CHAID); 14.3 Ensemble Methods: Bagging, Boosting, and Random Forests; 14.4 Support Vector Machines (SVM); 14.5 Neural Networks; 14.6 The R Package Rattle: A Useful Graphical User Interface for Data Mining; References
15. Clustering; 15.1 k-Means Clustering; 15.2 Another Way to Look at Clustering: Applying the Expectation-Maximization (EM) Algorithm to Mixtures of Normal Distributions; 15.3 Hierarchical Clustering Procedures; References
16. Market Basket Analysis: Association Rules and Lift; 16.1 Example 1: Online Radio; 16.2 Example 2: Predicting Income; References
17. Dimension Reduction: Factor Models and Principal Components; 17.1 Example 1: European Protein Consumption; 17.2 Example 2: Monthly US Unemployment Rates
18. Reducing the Dimension in Regressions with Multicollinear Inputs: Principal Components Regression and Partial Least Squares; 18.1 Three Examples; References
19. Text as Data: Text Mining and Sentiment Analysis; 19.1 Inverse Multinomial Logistic Regression; 19.2 Example 1: Restaurant Reviews; 19.3 Example 2: Political Sentiment; Appendix 19.A Relationship Between the Gentzkow Shapiro Estimate of “Slant” and Partial Least Squares; References
20. Network Data; 20.1 Example 1: Marriage and Power in Fifteenth Century Florence; 20.2 Example 2: Connections in a Friendship Network; References
Appendix A: Exercises; Exercise 1; Exercise 2; Exercise 3; Exercise 4; Exercise 5; Exercise 6; Exercise 7
Appendix B: References
Index
£98.06
John Wiley & Sons Inc Effective CRM using Predictive Analytics
Book Synopsis
A step-by-step guide to data mining applications in CRM. Following a handbook approach, this book bridges the gap between analytics and their use in everyday marketing, providing guidance on solving real business problems using data mining techniques. The book is organized into three parts.
Table of Contents
Preface; Acknowledgments
1 An overview of data mining: The applications, the methodology, the algorithms, and the data; 1.1 The applications; 1.2 The methodology; 1.3 The algorithms; 1.3.1 Supervised models; 1.3.1.1 Classification models; 1.3.1.2 Estimation (regression) models; 1.3.1.3 Feature selection (field screening); 1.3.2 Unsupervised models; 1.3.2.1 Cluster models; 1.3.2.2 Association (affinity) and sequence models; 1.3.2.3 Dimensionality reduction models; 1.3.2.4 Record screening models; 1.4 The data; 1.4.1 The mining datamart; 1.4.2 The required data per industry; 1.4.3 The customer “signature”: from the mining datamart to the enriched, marketing reference table; 1.5 Summary
Part I The Methodology
2 Classification modeling methodology; 2.1 An overview of the methodology for classification modeling; 2.2 Business understanding and design of the process; 2.2.1 Definition of the business objective; 2.2.2 Definition of the mining approach and of the data model; 2.2.3 Design of the modeling process; 2.2.3.1 Defining the modeling population; 2.2.3.2 Determining the modeling (analysis) level; 2.2.3.3 Definition of the target event and population; 2.2.3.4 Deciding on time frames; 2.3 Data understanding, preparation, and enrichment; 2.3.1 Investigation of data sources; 2.3.2 Selecting the data sources to be used; 2.3.3 Data integration and aggregation; 2.3.4 Data exploration, validation, and cleaning; 2.3.5 Data transformations and enrichment; 2.3.6 Applying a validation technique; 2.3.6.1 Split or Holdout validation; 2.3.6.2 Cross or n‐fold validation; 2.3.6.3 Bootstrap validation; 2.3.7 Dealing with imbalanced and rare outcomes; 2.3.7.1 Balancing; 2.3.7.2 Applying class weights; 2.4 Classification modeling; 2.4.1 Trying different models and parameter settings; 2.4.2 Combining models; 2.4.2.1 Bagging; 2.4.2.2 Boosting; 2.4.2.3 Random Forests; 2.5 Model evaluation; 2.5.1 Thorough evaluation of the model accuracy; 2.5.1.1 Accuracy measures and confusion matrices; 2.5.1.2 Gains, Response, and Lift charts; 2.5.1.3 ROC curve; 2.5.1.4 Profit/ROI charts; 2.5.2 Evaluating a deployed model with test–control groups; 2.6 Model deployment; 2.6.1 Scoring customers to roll the marketing campaign; 2.6.1.1 Building propensity segments; 2.6.2 Designing a deployment procedure and disseminating the results; 2.7 Using classification models in direct marketing campaigns; 2.8 Acquisition modeling; 2.8.1.1 Pilot campaign; 2.8.1.2 Profiling of high‐value customers; 2.9 Cross‐selling modeling; 2.9.1.1 Pilot campaign; 2.9.1.2 Product uptake; 2.9.1.3 Profiling of owners; 2.10 Offer optimization with next best product campaigns; 2.11 Deep‐selling modeling; 2.11.1.1 Pilot campaign; 2.11.1.2 Usage increase; 2.11.1.3 Profiling of customers with heavy product usage; 2.12 Up‐selling modeling; 2.12.1.1 Pilot campaign; 2.12.1.2 Product upgrade; 2.12.1.3 Profiling of “premium” product owners; 2.13 Voluntary churn modeling; 2.14 Summary of what we’ve learned so far: it’s not about the tool or the modeling algorithm. It’s about the methodology and the design of the process
3 Behavioral segmentation methodology; 3.1 An introduction to customer segmentation; 3.2 An overview of the behavioral segmentation methodology; 3.3 Business understanding and design of the segmentation process; 3.3.1 Definition of the business objective; 3.3.2 Design of the modeling process; 3.3.2.1 Selecting the segmentation population; 3.3.2.2 Selection of the appropriate segmentation criteria; 3.3.2.3 Determining the segmentation level; 3.3.2.4 Selecting the observation window; 3.4 Data understanding, preparation, and enrichment; 3.4.1 Investigation of data sources; 3.4.2 Selecting the data to be used; 3.4.3 Data integration and aggregation; 3.4.4 Data exploration, validation, and cleaning; 3.4.5 Data transformations and enrichment; 3.4.6 Input set reduction; 3.5 Identification of the segments with cluster modeling; 3.6 Evaluation and profiling of the revealed segments; 3.6.1 “Technical” evaluation of the clustering solution; 3.6.2 Profiling of the revealed segments; 3.6.3 Using marketing research information to evaluate the clusters and enrich their profiles; 3.6.4 Selecting the optimal cluster solution and labeling the segments; 3.7 Deployment of the segmentation solution, design and delivery of differentiated strategies; 3.7.1 Building the customer scoring model for updating the segments; 3.7.1.1 Building a Decision Tree for scoring: fine‐tuning the segments; 3.7.2 Distribution of the segmentation information; 3.7.3 Design and delivery of differentiated strategies; 3.8 Summary
Part II The Algorithms
4 Classification algorithms; 4.1 Data mining algorithms for classification; 4.2 An overview of Decision Trees; 4.3 The main steps of Decision Tree algorithms; 4.3.1 Handling of predictors by Decision Tree models; 4.3.2 Using terminating criteria to prevent trivial tree growing; 4.3.3 Tree pruning; 4.4 CART, C5.0/C4.5, and CHAID and their attribute selection measures; 4.4.1 The Gini index used by CART; 4.4.2 The Information Gain Ratio index used by C5.0/C4.5; 4.4.3 The chi‐square test used by CHAID; 4.5 Bayesian networks; 4.6 Naive Bayesian networks; 4.7 Bayesian belief networks; 4.8 Support vector machines; 4.8.1 Linearly separable data; 4.8.2 Linearly inseparable data; 4.9 Summary
5 Segmentation algorithms; 5.1 Segmenting customers with data mining algorithms; 5.2 Principal components analysis; 5.2.1 How many components to extract?; 5.2.1.1 The eigenvalue (or latent root) criterion; 5.2.1.2 The percentage of variance criterion; 5.2.1.3 The scree test criterion; 5.2.1.4 The interpretability and business meaning of the components; 5.2.2 What is the meaning of each component?; 5.2.3 Moving along with the component scores; 5.3 Clustering algorithms; 5.3.1 Clustering with K‐means; 5.3.2 Clustering with TwoStep; 5.4 Summary
Part III The Case Studies
6 A voluntary churn propensity model for credit card holders; 6.1 The business objective; 6.2 The mining approach; 6.2.1 Designing the churn propensity model process; 6.2.1.1 Selecting the data sources and the predictors; 6.2.1.2 Modeling population and level of data; 6.2.1.3 Target population and churn definition; 6.2.1.4 Time periods and historical information required; 6.3 The data dictionary; 6.4 The data preparation procedure; 6.4.1 From cards to customers: aggregating card‐level data; 6.4.2 Enriching customer data; 6.4.3 Defining the modeling population and the target field; 6.5 Derived fields: the final data dictionary; 6.6 The modeling procedure; 6.6.1 Applying a Split (Holdout) validation: splitting the modelling dataset for evaluation purposes; 6.6.2 Balancing the distribution of the target field; 6.6.3 Setting the role of the fields in the model; 6.6.4 Training the churn model; 6.7 Understanding and evaluating the models; 6.8 Model deployment: using churn propensities to target the retention campaign; 6.9 The voluntary churn model revisited using RapidMiner; 6.9.1 Loading the data and setting the roles of the attributes; 6.9.2 Applying a Split (Holdout) validation and adjusting the imbalance of the target field’s distribution; 6.9.3 Developing a Naïve Bayes model for identifying potential churners; 6.9.4 Evaluating the performance of the model and deploying it to calculate churn propensities; 6.10 Developing the churn model with Data Mining for Excel; 6.10.1 Building the model using the Classify Wizard; 6.10.2 Selecting the classification algorithm and its parameters; 6.10.3 Applying a Split (Holdout) validation; 6.10.4 Browsing the Decision Tree model; 6.10.5 Validation of the model performance; 6.10.6 Model deployment; 6.11 Summary
7 Value segmentation and cross‐selling in retail; 7.1 The business background and objective; 7.2 An outline of the data preparation procedure; 7.3 The data dictionary; 7.4 The data preparation procedure; 7.4.1 Pivoting and aggregating transactional data at a customer level; 7.4.2 Enriching customer data and building the customer signature; 7.5 The data dictionary of the modeling file; 7.6 Value segmentation; 7.6.1 Grouping customers according to their value; 7.6.2 Value segments: exploration and marketing usage; 7.7 The recency, frequency, and monetary (RFM) analysis; 7.7.1 RFM basics; 7.8 The RFM cell segmentation procedure; 7.9 Setting up a cross‐selling model; 7.10 The mining approach; 7.10.1 Designing the cross‐selling model process; 7.10.1.1 The data and the predictors; 7.10.1.2 Modeling population and level of data; 7.10.1.3 Target population and definition of target attribute; 7.10.1.4 Time periods and historical information required; 7.11 The modeling procedure; 7.11.1 Preparing the test campaign and loading the campaign responses for modeling; 7.11.2 Applying a Split (Holdout) validation: splitting the modelling dataset for evaluation purposes; 7.11.3 Setting the roles of the attributes; 7.11.4 Training the cross‐sell model; 7.12 Browsing the model results and assessing the predictive accuracy of the classifiers; 7.13 Deploying the model and preparing the cross‐selling campaign list; 7.14 The retail case study using RapidMiner; 7.14.1 Value segmentation and RFM cells analysis; 7.14.2 Developing the cross‐selling model; 7.14.3 Applying a Split (Holdout) validation; 7.14.4 Developing a Decision Tree model with Bagging; 7.14.5 Evaluating the performance of the model; 7.14.6 Deploying the model and scoring customers; 7.15 Building the cross‐selling model with Data Mining for Excel; 7.15.1 Using the Classify Wizard to develop the model; 7.15.2 Selecting a classification algorithm and setting the parameters; 7.15.3 Applying a Split (Holdout) validation; 7.15.4 Browsing the Decision Tree model; 7.15.5 Validation of the model performance; 7.15.6 Model deployment; 7.16 Summary
8 Segmentation application in telecommunications; 8.1 Mobile telephony: the business background and objective; 8.2 The segmentation procedure; 8.2.1 Selecting the segmentation population: the mobile telephony core segments; 8.2.2 Deciding the segmentation level; 8.2.3 Selecting the segmentation dimensions; 8.2.4 Time frames and historical information analyzed; 8.3 The data preparation procedure; 8.4 The data dictionary and the segmentation fields; 8.5 The modeling procedure; 8.5.1 Preparing data for clustering: combining fields into data components; 8.5.2 Identifying the segments with a cluster model; 8.5.3 Profiling and understanding the clusters; 8.5.4 Segmentation deployment; 8.6 Segmentation using RapidMiner and K‐means cluster; 8.6.1 Clustering with the K‐means algorithm; 8.7 Summary
Bibliography; Index
£43.65
John Wiley & Sons Inc Statistical Data Analytics
Book Synopsis
Solutions Manual to accompany Statistical Data Analytics: Foundations for Data Mining, Informatics, and Knowledge Discovery. A comprehensive introduction to statistical methods for data mining and knowledge discovery. Extensive solutions using actual data (with sample R programming code) are provided, illustrating diverse informatic sources in genomics, biomedicine, ecological remote sensing, astronomy, socioeconomics, marketing, advertising and finance, among many others.
Table of Contents
Preface; 1 Data analytics and data mining; 2 Basic probability and statistical distributions; 3 Data manipulation; 4 Data visualization and statistical graphics; 5 Statistical inference; 6 Techniques for supervised learning: simple linear regression; 7 Techniques for supervised learning: multiple linear regression; 8 Supervised learning: generalized linear models; 9 Supervised learning: classification; 10 Techniques for unsupervised learning: dimension reduction; 11 Techniques for unsupervised learning: clustering and association; References
£16.95
John Wiley & Sons Inc Text Mining in Practice with R
Book Synopsis: A reliable, cost-effective approach to extracting priceless business information from all sources of text. Excavating actionable business insights from data is a complex undertaking, and that complexity is magnified by an order of magnitude when the focus is on documents and other text information.
Table of Contents: Foreword Chapter 1: What is Text Mining? 1.1 What is it? 1.1.1 What is text mining in practice? 1.1.2 Where does text mining fit? 1.2 Why we care about text mining? 1.2.1 What are the consequences of ignoring text? 1.2.2 What are the benefits of text mining? 1.2.3 Setting Expectations: When text mining should (and should not) be used. 1.3 A basic workflow. How the process works. 1.4 What tools do I need to get started with this? 1.5 A Simple Example 1.6 A Real World Use Case 1.7 Summary Chapter 2: Basics of text mining 2.1 What is Text Mining in a practical sense? 2.2 Types of Text Mining: Bag of Words. 2.2.1 Types of Text Mining: Syntactic Parsing. 2.3 The text mining process in context 2.4 String Manipulation: Number of Characters & Substitutions 2.4.1 String Manipulations: Paste, Character Splits & Extractions 2.5 Keyword Scanning 2.6 String Packages stringr & stringi 2.7 Preprocessing Steps for Bag of Words Text Mining 2.8 Spell Check 2.9 Frequent Terms & Associations 2.9 Delta Assist Wrap Up 2.10 Summary Chapter 3: Common Text Mining Visualizations 3.1 A tale of two (or three) cultures 3.2 Simple Exploration: Term Frequency, Associations & Word Networks 3.2.1 Term Frequency 3.2.2 Word Associations 3.2.3 Word Networks 3.3 Simple Word Clusters: Hierarchical Dendrograms 3.4 Word Clouds: Overused but Effective 3.4.1 One Corpus Word Clouds 3.4.2 Comparing and Contrasting Corpora in Word Clouds 3.4.3 Polarized Tag Plot 3.5 Summary Chapter 4: Sentiment Scoring 4.1 What is Sentiment Analysis? 4.2 Sentiment Scoring: Parlor Trick or Insightful? 4.3 Polarity: Simple Sentiment Scoring 4.3.1 Subjectivity Lexicons 4.3.2 Qdap’s Scoring for positive and negative word choice 4.3.3 Revisiting Word Clouds…Sentiment Word Clouds 4.4 Emoticons :) Dealing with these perplexing clues 4.4.1 Symbol-Based Emoticons Native to R 4.4.2 Punctuation Based Emoticons 4.4.3 Emoji 4.5 R’s Archived Sentiment Scoring Library 4.5 Sentiment the tidytext way 4.6 Airbnb.com Boston Wrap Up 4.7 Summary
Chapter 5: Hidden Structures: Clustering, String Distance, Text Vectors & Topic Modeling 5.1 What is clustering? 5.1.1 K Means Clustering 5.1.2 Spherical K Means Clustering 5.1.3 K Medoid Clustering 5.1.4 Evaluating the cluster approaches 5.2 Calculating & Exploring String Distance 5.2.1 What is string distance? 5.2.2 Fuzzy Matching-amatch, ain 5.2.3 Similarity Distances- stringdist, stringdistmatrix 5.3 LDA Topic Modeling Explained 5.3.2 Topic Modeling Case Study 5.3.2 LDA & LDAvis 5.4 Text to Vectors using “text2vec” 5.4.1 text2vec 5.5 Summary Chapter 6: Document Classification: Finding Clickbait from Headlines 6.1 What is document classification? 6.2 Clickbait Case Study 6.2.2 Session & Data Set Up 6.2.3 GLMNET Training 6.2.4 GLMNET Test Predictions 6.2.5 Test Set Evaluation 6.2.6 Finding the most impactful words 6.2.7 Case study Wrap Up: Model Accuracy & Improving Performance Recommendations 6.3 Summary Chapter 7: Predictive Modeling: Using text for classifying & predicting outcomes. 7.1 Classification Vs Prediction 7.2 Case Study I: Will this patient come back to the hospital? 7.2.2 Patient Readmission in the Text Mining Workflow 7.2.3 Session & Data Set Up 7.2.4 Patient Modeling 7.2.5 More Model KPI: AUC, Recall, Precision & F1 7.2.5.1 Additional Evaluation Metrics 7.2.6 Apply the model to new patients 7.2.7 Patient Readmission Conclusion 7.3 Case Study II: Predicting Box Office Success 7.3.2 Opening Weekend Revenue in the Text Mining Workflow 7.3.3 Session & Data Set Up 7.3.4 Opening Weekend Modeling 7.3.5 Model Evaluation 7.3.6 Apply the Model to new Movie Reviews 7.3.7 Movie Revenue Conclusion 7.4 Summary
Chapter 8: The OpenNLP Project 8.1 What is the OpenNLP project? 8.2 R’s OpenNLP Package 8.3 Named Entities in Hillary Clinton’s Email 8.3.1 R Session Set-up 8.3.2 Minor Text Cleaning 8.3.3 Using OpenNLP on a single email 8.3.4 Using OpenNLP on multiple documents 8.3.5 Revisiting the Text Mining Workflow 8.4 Analyzing the Named Entities 8.4.1 Worldwide Map of Hillary Clinton’s Location Mentions 8.4.2 Mapping Only European Locations 8.4.3 Entities & Polarity: How does Hillary Clinton feel about an entity? 8.4.4 Stock Charts for Entities 8.4.5 Reach an Insight or Conclusion about Hillary Clinton’s Emails 8.5 Summary Chapter 9: Text Sources 9.1 Sourcing Text 9.2 Web Sources 9.2.1 Web Scraping a Single Page with rvest 9.2.2 Web Scraping Multiple Pages with rvest 9.2.3 Application Program Interfaces (APIs) 9.2.4 Newspaper Articles from The Guardian Newspaper 9.2.5 Tweets using the “twitteR” Package 9.2.6 Calling an API without a dedicated R package 9.2.7 Using jsonlite to access the New York Times 9.2.8 Using RCurl & XML to Parse Google News Feeds 9.2.9 The tm library Web-Mining Plugin 9.3 Getting Text from File Sources 9.3.1 Individual CSV, TXT and Microsoft Office Files 9.3.2 Reading multiple files quickly 9.3.2 Extracting Text from PDFs 9.3.3 Optical Character Recognition: Extracting Text from Images 9.4 Summary
£52.20
John Wiley & Sons Inc Data Science Strategy For Dummies
Book Synopsis: All the answers to your data science questions. Over half of all businesses are using data science to generate insights and value from big data. How are they doing it? Data Science Strategy For Dummies answers all your questions about how to build a data science capability from scratch, starting with the what and the why of data science and covering what it takes to lead and nurture a top-notch team of data scientists. With this book, you'll learn how to incorporate data science as a strategic function into any business, large or small. Find solutions to your real-life challenges as you uncover the stories and value hidden within data. Learn exactly what data science is and why it's important. Adopt a data-driven mindset as the foundation to success. Understand the processes and common roadblocks behind data science. Keep your data science program focused on generating business value. Nurture a top-quality data science team. In non-technical language, Data Science Strategy For Dummies outl
Table of Contents: Foreword xv Introduction 1 About This Book 2 Foolish Assumptions 3 How This Book is Organized 3 Icons Used In This Book 4 Beyond The Book 4 Where To Go From Here 5 Part 1: Optimizing Your Data Science Investment 7 Chapter 1: Framing Data Science Strategy 9 Establishing the Data Science Narrative 10 Capture 11 Maintain 12 Process 13 Analyze 14 Communicate 16 Actuate 17 Sorting Out the Concept of a Data-driven Organization 19 Approaching data-driven 20 Being data obsessed 21 Sorting Out the Concept of Machine Learning 22 Defining and Scoping a Data Science Strategy 26 Objectives 26 Approach 27 Choices 27 Data 27 Legal 28 Ethics 28 Competence 28 Infrastructure 29 Governance and security 29 Commercial/business models 30 Measurements 30 Chapter 2: Considering the Inherent Complexity in Data Science 31 Diagnosing Complexity in Data Science 32 Recognizing Complexity as a Potential 33 Enrolling in Data Science Pitfalls 101 34 Believing that all data is needed 34 
Thinking that investing in a data lake will solve all your problems 35 Focusing on AI when analytics is enough 36 Believing in the 1-tool approach 37 Investing only in certain areas 37 Leveraging the infrastructure for reporting rather than exploration 38 Underestimating the need for skilled data scientists 39 Navigating the Complexity 40 Chapter 3: Dealing with Difficult Challenges 41 Getting Data from There to Here 41 Handling dependencies on data owned by others 42 Managing data transfer and computation across country borders 43 Managing Data Consistency Across the Data Science Environment 44 Securing Explainability in AI 45 Dealing with the Difference between Machine Learning and Traditional Software Programming 47 Managing the Rapid AI Technology Evolution and Lack of Standardization 50 Chapter 4: Managing Change in Data Science 51 Understanding Change Management in Data Science 52 Approaching Change in Data Science 53 Recognizing what to avoid when driving change in data science 56 Using Data Science Techniques to Drive Successful Change 59 Using digital engagement tools 59 Applying social media analytics to identify stakeholder sentiment 60 Capturing reference data in change projects 61 Using data to select people for change roles 61 Automating change metrics 62 Getting Started 62 Part 2: Making Strategic Choices for Your Data 65 Chapter 5: Understanding the Past, Present, and Future of Data 67 Sorting Out the Basics of Data 68 Explaining traditional data versus big data 69 Knowing the value of data 71 Exploring Current Trends in Data 73 Data monetization 73 Responsible AI 74 Cloud-based data architectures 75 Computation and intelligence in the edge 75 Digital twins 77 Blockchain 78 Conversational platforms 79 Elaborating on Some Future Scenarios 80 Standardization for data science productivity 80 From data monetization scenarios to a data economy 82 An explosion of human/machine hybrid systems 82 Quantum computing will solve the unsolvable problems 83 
Chapter 6: Knowing Your Data 85 Selecting Your Data 85 Describing Data 87 Exploring Data 89 Assessing Data Quality 93 Improving Data Quality 95 Chapter 7: Considering the Ethical Aspects of Data Science 97 Explaining AI Ethics 98 Addressing trustworthy artificial intelligence 99 Introducing Ethics by Design 101 Chapter 8: Becoming Data-driven 103 Understanding Why Data-Driven is a Must 103 Transitioning to a Data-Driven Model 105 Securing management buy-in and assigning a chief data officer (CDO) 106 Identifying the key business value aligned with the business maturity 107 Developing a Data Strategy 108 Caring for your data 109 Democratizing the data 109 Driving data standardization 110 Structuring the data strategy 110 Establishing a Data-Driven Culture and Mindset 111 Chapter 9: Evolving from Data-driven to Machine-driven 113 Digitizing the Data 114 Applying a Data-driven Approach 115 Automating Workflows 116 Introducing AI/ML capabilities 116 Part 3: Building a Successful Data Science Organization 119 Chapter 10: Building Successful Data Science Teams 121 Starting with the Data Science Team Leader 121 Adopting different leadership approaches 122 Approaching data science leadership 124 Finding the right data science leader or manager 124 Defining the Prerequisites for a Successful Team 125 Developing a team structure 125 Establishing an infrastructure 126 Ensuring data availability 126 Insisting on interesting projects 127 Promoting continuous learning 127 Encouraging research studies 128 Building the Team 128 Developing smart hiring processes 129 Letting your teams evolve organically 130 Connecting the Team to the Business Purpose 131 Chapter 11: Approaching a Data Science Organizational Setup 133 Finding the Right Organizational Design 134 Designing the data science function 134 Evaluating the benefits of a center of excellence for data science 136 Identifying success factors for a data science center of excellence 137 Applying a Common Data Science Function 
138 Selecting a location 138 Approaching ways of working 139 Managing expectations 141 Selecting an execution approach 142 Chapter 12: Positioning the Role of the Chief Data Officer (CDO) 145 Scoping the Role of the Chief Data Officer (CDO) 146 Explaining Why a Chief Data Officer is Needed 149 Establishing the CDO Role 150 The Future of the CDO Role 152 Chapter 13: Acquiring Resources and Competencies 155 Identifying the Roles in a Data Science Team 156 Data scientist 157 Data engineer 157 Machine learning engineer 158 Data architect 159 Business analyst 159 Software engineer 159 Domain expert 160 Seeing What Makes a Great Data Scientist 160 Structuring a Data Science Team 163 Hiring and evaluating the data science talent you need 165 Retaining Competence in Data Science 167 Understanding what makes a data scientist leave 169 Part 4: Investing in the Right Infrastructure 173 Chapter 14: Developing a Data Architecture 175 Defining What Makes Up a Data Architecture 176 Describing traditional architectural approaches 176 Elements of a data architecture 177 Exploring the Characteristics of a Modern Data Architecture 178 Explaining Data Architecture Layers 181 Listing the Essential Technologies for a Modern Data Architecture 184 NoSQL databases 184 Real-time streaming platforms 185 Docker and containers 185 Container repositories 186 Container orchestration 187 Microservices 187 Function as a service 188 Creating a Modern Data Architecture 189 Chapter 15: Focusing Data Governance on the Right Aspects 193 Sorting Out Data Governance 194 Data governance for defense or offense 195 Objectives for data governance 196 Explaining Why Data Governance is Needed 197 Data governance saves money 197 Bad data governance is dangerous 198 Good data governance provides clarity 198 Establishing Data Stewardship to Enforce Data Governance Rules 198 Implementing a Structured Approach to Data Governance 199 Chapter 16: Managing Models During Development and Production 203 Unfolding the 
Fundamentals of Model Management 203 Working with many models 204 Making the case for efficient model management 206 Implementing Model Management 207 Pinpointing implementation challenges 208 Managing model risk 210 Measuring the risk level 211 Identifying suitable control mechanisms 211 Chapter 17: Exploring the Importance of Open Source 213 Exploring the Role of Open Source 213 Understanding the importance of open source in smaller companies 214 Understanding the trend 215 Describing the Context of Data Science Programming Languages 215 Unfolding Open Source Frameworks for AI/ML Models 218 TensorFlow 219 Theano 219 Torch 219 Caffe and Caffe2 220 The Microsoft Cognitive Toolkit (previously known as Microsoft CNTK) 220 Keras 220 Scikit-learn 221 Spark MLlib 221 Azure ML Studio 221 Amazon Machine Learning 221 Choosing Open Source or Not? 222 Chapter 18: Realizing the Infrastructure 223 Approaching Infrastructure Realization 223 Listing Key Infrastructure Considerations for AI and ML Support 226 Location 226 Capacity 227 Data center setup 227 End-to-end management 227 Network infrastructure 228 Security and ethics 228 Advisory and supporting services 229 Ecosystem fit 229 Automating Workflows in Your Data Infrastructure 229 Enabling an Efficient Workspace for Data Engineers and Data Scientists 230 Part 5: Data as a Business 233 Chapter 19: Investing in Data as a Business 235 Exploring How to Monetize Data 236 Approaching data monetization is about treating data as an asset 237 Data monetization in a data economy 238 Looking to the Future of the Data Economy 240 Chapter 20: Using Data for Insights or Commercial Opportunities 243 Focusing Your Data Science Investment 243 Determining the Drivers for Internal Business Insights 244 Recognizing data science categories for practical implementation 245 Applying data-science-driven internal business insights 247 Using Data for Commercial Opportunities 248 Defining a data product 249 Distinguishing between categories of data 
products 250 Balancing Strategic Objectives 252 Chapter 21: Engaging Differently with Your Customers 255 Understanding Your Customers 255 Step 1: Engage your customers 256 Step 2: Identify what drives your customers 257 Step 3: Apply analytics and machine learning to customer actions 258 Step 4: Predict and prepare for the next step 259 Step 5: Imagine your customer’s future 260 Keeping Your Customers Happy 261 Serving Customers More Efficiently 263 Predicting demand 263 Automating tasks 264 Making company applications predictive 264 Chapter 22: Introducing Data-driven Business Models 265 Defining Business Models 265 Exploring Data-driven Business Models 267 Creating data-centric businesses 268 Investigating different types of data-driven business models 268 Using a Framework for Data-driven Business Models 275 Creating a data-driven business model using a framework 276 Key resources 277 Key activities 277 Offering/value proposition 278 Customer segment 278 Revenue model 279 Cost structure 280 Putting it all together 280 Chapter 23: Handling New Delivery Models 281 Defining Delivery Models for Data Products and Services 282 Understanding and Adapting to New Delivery Models 282 Introducing New Ways to Deliver Data Products 284 Self-service analytics environments as a delivery model 285 Applications, websites, and product/service interfaces as delivery models 287 Existing products and services 289 Downloadable files 290 APIs 290 Cloud services 291 Online market places 291 Downloadable licenses 292 Online services 293 Onsite services 293 Part 6: The Part of Tens 295 Chapter 24: Ten Reasons to Develop a Data Science Strategy 297 Expanding Your View on Data Science 297 Aligning the Company View 298 Creating a Solid Base for Execution 299 Realizing Priorities Early 299 Putting the Objective into Perspective 300 Creating an Excellent Base for Communication 300 Understanding Why Choices Matter 301 Identifying the Risks Early 301 Thoroughly Considering Your Data Need 302 
Understanding the Change Impact 303 Chapter 25: Ten Mistakes to Avoid When Investing in Data Science 305 Don’t Tolerate Top Management’s Ignorance of Data Science 305 Don’t Believe That AI is Magic 306 Don’t Approach Data Science as a Race to the Death between Man and Machine 307 Don’t Underestimate the Potential of AI 308 Don’t Underestimate the Needed Data Science Skill Set 308 Don’t Think That a Dashboard is the End Objective 309 Don’t Forget about the Ethical Aspects of AI 310 Don’t Forget to Consider the Legal Rights to the Data 311 Don’t Ignore the Scale of Change Needed 312 Don’t Forget the Measurements Needed to Prove Value 313 Index 315
£22.09
John Wiley & Sons Inc Ontology-Based Information Retrieval for
Book Synopsis: With the advancement of the semantic web, ontology has become a crucial mechanism for representing concepts in various domains. For research and dispersal of customized healthcare services, a major challenge is to efficiently retrieve and analyze individual patient data from a large volume of heterogeneous data over a long time span. This requirement demands effective ontology-based information retrieval approaches for clinical information systems so that the pertinent information can be mined from a large amount of distributed data. This unique and groundbreaking book highlights the key advances in ontology-based information retrieval techniques being applied in the healthcare domain and covers the following areas: Semantic data integration in e-health care systems; Keyword-based medical information retrieval; Ontology-based query retrieval support for e-health implementation; Ontologies as a database management system technology for medical
Table of Contents: Preface xix Acknowledgment xxiii 1 Role of Ontology in Health Care 1 Sonia Singla 1.1 Introduction 2 1.2 Ontology in Diabetes 3 1.2.1 Ontology Process 4 1.2.2 Impediments of the Present Investigation 5 1.3 Role of Ontology in Cardiovascular Diseases 6 1.4 Role of Ontology in Parkinson Diseases 8 1.4.1 The Spread of Disease With Age and Onset of Disease 10 1.4.2 Cost of PD for Health Care, Household 11 1.4.3 Treatment and Medicines 11 1.5 Role of Ontology in Depression 13 1.6 Conclusion 15 1.7 Future Scope 15 References 15 2 A Study on Basal Ganglia Circuit and Its Relation With Movement Disorders 19 Dinesh Bhatia 2.1 Introduction 19 2.2 Anatomy and Functioning of Basal Ganglia 21 2.2.1 The Striatum-Major Entrance to Basal Ganglia Circuitry 22 2.2.2 Direct and Indirect Striatofugal Projections 23 2.2.3 The STN: Another Entrance to Basal Ganglia Circuitry 25 2.3 Movement Disorders 26 2.3.1 Parkinson Disease 26 2.3.2 Dyskinetic Disorder 27 2.3.3 Dystonia 28 2.4 Effect of Basal Ganglia Dysfunctioning on 
Movement Disorders 29 2.5 Conclusion and Future Scope 31 References 31 3 Extraction of Significant Association Rules Using Pre- and Post-Mining Techniques: An Analysis 37 M. Nandhini and S. N. Sivanandam 3.1 Introduction 38 3.2 Background 39 3.2.1 Interestingness Measures 39 3.2.2 Pre-Mining Techniques 40 3.2.2.1 Candidate Set Reduction Schemes 40 3.2.2.2 Optimal Threshold Computation Schemes 41 3.2.2.3 Weight-Based Mining Schemes 42 3.2.3 Post-Mining Techniques 42 3.2.3.1 Rule Pruning Schemes 43 3.2.3.2 Schemes Using Knowledge Base 43 3.3 Methodology 44 3.3.1 Data Preprocessing 44 3.3.2 Pre-Mining 46 3.3.2.1 Pre-Mining Technique 1: Optimal Support and Confidence Threshold Value Computation Using PSO 46 3.3.2.2 Pre-Mining Technique 2: Attribute Weight Computation Using IG Measure 48 3.3.3 Association Rule Generation 50 3.3.3.1 ARM Preliminaries 50 3.3.3.2 WARM Preliminaries 52 3.3.4 Post-Mining 56 3.3.4.1 Filters 56 3.3.4.2 Operators 58 3.3.4.3 Rule Schemas 58 3.4 Experiments and Results 59 3.4.1 Parameter Settings for PSO-Based Pre-Mining Technique 60 3.4.2 Parameter Settings for PAW-Based Pre-Mining Technique 60 3.5 Conclusions 63 References 65 4 Ontology in Medicine as a Database Management System 69 Shobowale K. O. 
4.1 Introduction 70 4.1.1 Ontology Engineering and Development Methodology 72 4.2 Literature Review on Medical Data Processing 72 4.3 Information on Medical Ontology 75 4.3.1 Types of Medical Ontology 75 4.3.2 Knowledge Representation 76 4.3.3 Methodology of Developing Medical Ontology 76 4.3.4 Medical Ontology Standards 77 4.4 Ontologies as a Knowledge-Based System 78 4.4.1 Domain Ontology in Medicine 79 4.4.2 Brief Introduction of Some Medical Standards 81 4.4.2.1 Medical Subject Headings (MeSH) 81 4.4.2.2 Medical Dictionary for Regulatory Activities (MedDRA) 81 4.4.2.3 Medical Entities Dictionary (MED) 81 4.4.3 Reusing Medical Ontology 82 4.4.4 Ontology Evaluation 85 4.5 Conclusion 86 4.6 Future Scope 86 References 87 5 Using IoT and Semantic Web Technologies for Healthcare and Medical Sector 91 Nikita Malik and Sanjay Kumar Malik 5.1 Introduction 92 5.1.1 Significance of Healthcare and Medical Sector and Its Digitization 92 5.1.2 e-Health and m-Health 92 5.1.3 Internet of Things and Its Use 94 5.1.4 Semantic Web and Its Technologies 96 5.2 Use of IoT in Healthcare and Medical Domain 98 5.2.1 Scope of IoT in Healthcare and Medical Sector 98 5.2.2 Benefits of IoT in Healthcare and Medical Systems 100 5.2.3 IoT Healthcare Challenges and Open Issues 100 5.3 Role of SWTs in Healthcare Services 101 5.3.1 Scope and Benefits of Incorporating Semantics in Healthcare 101 5.3.2 Ontologies and Datasets for Healthcare and Medical Domain 103 5.3.3 Challenges in the Use of SWTs in Healthcare Sector 104 5.4 Incorporating IoT and/or SWTs in Healthcare and Medical Sector 106 5.4.1 Proposed Architecture or Framework or Model 106 5.4.2 Access Mechanisms or Approaches 108 5.4.3 Applications or Systems 109 5.5 Healthcare Data Analytics Using Data Mining and Machine Learning 110 5.6 Conclusion 112 5.7 Future Work 113 References 113 6 An Ontological Model, Design, and Implementation of CSPF for Healthcare 117 Pooja Mohan 6.1 Introduction 117 6.2 Related Work 119 6.3 Mathematical 
Representation of CSPF Model 122 6.3.1 Basic Sets of CSPF Model 123 6.3.2 Conditional Contextual Security and Privacy Constraints 123 6.3.3 CSPF Model States CsetofStates 124 6.3.4 Permission Cpermission 124 6.3.5 Security Evaluation Function (SEFcontexts) 124 6.3.6 Secure State 125 6.3.7 CSPF Model Operations 125 6.3.7.1 Administrative Operations 125 6.3.7.2 Users’ Operations 127 6.4 Ontological Model 127 6.4.1 Development of Class Hierarchy 127 6.4.1.1 Object Properties of Sensor Class 129 6.4.1.2 Data Properties 129 6.4.1.3 The Individuals 129 6.5 The Design of Context-Aware Security and Privacy Model for Wireless Sensor Network 129 6.6 Implementation 133 6.7 Analysis and Results 135 6.7.1 Inference Time/Latency/Query Response Time vs. No. of Policies 135 6.7.2 Average Inference Time vs. Contexts 136 6.8 Conclusion and Future Scope 137 References 138 7 Ontology-Based Query Retrieval Support for E-Health Implementation 143 Aatif Ahmad Khan and Sanjay Kumar Malik 7.1 Introduction 143 7.1.1 Health Care Record Management 144 7.1.1.1 Electronic Health Record 144 7.1.1.2 Electronic Medical Record 145 7.1.1.3 Picture Archiving and Communication System 145 7.1.1.4 Pharmacy Systems 145 7.1.2 Information Retrieval 145 7.1.3 Ontology 146 7.2 Ontology-Based Query Retrieval Support 146 7.3 E-Health 150 7.3.1 Objectives and Scope 150 7.3.2 Benefits of E-Health 151 7.3.3 E-Health Implementation 151 7.4 Ontology-Driven Information Retrieval for E-Health 154 7.4.1 Ontology for E-Health Implementation 155 7.4.2 Frameworks for Information Retrieval Using Ontology for E-Health 157 7.4.3 Applications of Ontology-Driven Information Retrieval in Health Care 158 7.4.4 Benefits and Limitations 160 7.5 Discussion 160 7.6 Conclusion 164 References 164 8 Ontology-Based Case Retrieval in an E-Mental Health Intelligent Information System 167 Georgia Kaoura, Konstantinos Kovas and Basilis Boutsinas 8.1 Introduction 167 8.2 Literature Survey 170 8.3 Problem Identified 173 8.4 Proposed Solution 
174 8.4.1 The PAVEFS Ontology 174 8.4.2 Knowledge Base 179 8.4.3 Reasoning 180 8.4.4 User Interaction 182 8.5 Pros and Cons of Solution 183 8.5.1 Evaluation Methodology and Results 183 8.5.2 Evaluation Methodology 185 8.5.2.1 Evaluation Tools 186 8.5.2.2 Results 187 8.6 Conclusions 189 8.7 Future Scope 190 References 190 9 Ontology Engineering Applications in Medical Domain 193 Mariam Gawich and Marco Alfonse 9.1 Introduction 193 9.2 Ontology Activities 195 9.2.1 Ontology Learning 195 9.2.2 Ontology Matching 195 9.2.3 Ontology Merging (Unification) 195 9.2.4 Ontology Validation 196 9.2.5 Ontology Verification 196 9.2.6 Ontology Alignment 196 9.2.7 Ontology Annotation 196 9.2.8 Ontology Evaluation 196 9.2.9 Ontology Evolution 196 9.3 Ontology Development Methodologies 197 9.3.1 TOVE 197 9.3.2 Methontology 198 9.3.3 Brusa et al. Methodology 198 9.3.4 UPON Methodology 199 9.3.5 Uschold and King Methodology 200 9.4 Ontology Languages 203 9.4.1 RDF-RDF Schema 203 9.4.2 OWL 205 9.4.3 OWL 2 205 9.5 Ontology Tools 208 9.5.1 Apollo 208 9.5.2 NeON 209 9.5.3 Protégé 210 9.6 Ontology Engineering Applications in Medical Domain 212 9.6.1 Ontology-Based Decision Support System (DSS) 213 9.6.1.1 OntoDiabetic 213 9.6.1.2 Ontology-Based CDSS for Diabetes Diagnosis 214 9.6.1.3 Ontology-Based Medical DSS within E-Care Telemonitoring Platform 215 9.6.2 Medical Ontology in the Dynamic Healthcare Environment 216 9.6.3 Knowledge Management Systems 217 9.6.3.1 Ontology-Based System for Cancer Diseases 217 9.6.3.2 Personalized Care System for Chronic Patients at Home 218 9.7 Ontology Engineering Applications in Other Domains 219 9.7.1 Ontology Engineering Applications in E-Commerce 219 9.7.1.1 Automated Approach to Product Taxonomy Mapping in E-Commerce 219 9.7.1.2 LexOnt Matching Approach 221 9.7.2 Ontology Engineering Applications in Social Media Domain 222 9.7.2.1 Emotive Ontology Approach 222 9.7.2.2 Ontology-Based Approach for Social Media Analysis 224 9.7.2.3 Methodological Framework 
for Semantic Comparison of Emotional Values 225 References 226 10 Ontologies on Biomedical Informatics 233 Marco Alfonse and Mariam Gawich 10.1 Introduction 233 10.2 Defining Ontology 234 10.3 Biomedical Ontologies and Ontology-Based Systems 235 10.3.1 MetaMap 235 10.3.2 GALEN 236 10.3.3 NIH-CDE 236 10.3.4 LOINC 237 10.3.5 Current Procedural Terminology (CPT) 238 10.3.6 Medline Plus Connect 238 10.3.7 Gene Ontology 239 10.3.8 UMLS 240 10.3.9 SNOMED-CT 240 10.3.10 OBO Foundry 240 10.3.11 Textpresso 240 10.3.12 National Cancer Institute Thesaurus 241 References 241 11 Machine Learning Techniques Best for Large Data Prediction: A Case Study of Breast Cancer Categorical Data: k-Nearest Neighbors 245 Yagyanath Rimal 11.1 Introduction 246 11.2 R Programming 250 11.3 Conclusion 255 References 255 12 Need of Ontology-Based Systems in Healthcare System 257 Tshepiso Larona Mokgetse 12.1 Introduction 258 12.2 What is Ontology? 259 12.3 Need for Ontology in Healthcare Systems 260 12.3.1 Primary Healthcare 262 12.3.1.1 Semantic Web System 262 12.3.2 Emergency Services 263 12.3.2.1 Service-Oriented Architecture 263 12.3.2.2 IOT Ontology 264 12.3.3 Public Healthcare 265 12.3.3.1 IOT Data Model 265 12.3.4 Chronic Disease Healthcare 266 12.3.4.1 Clinical Reminder System 266 12.3.4.2 Chronic Care Model 267 12.3.5 Specialized Healthcare 268 12.3.5.1 E-Health Record System 268 12.3.5.2 Maternal and Child Health 269 12.3.6 Cardiovascular System 270 12.3.6.1 Distributed Healthcare System 270 12.3.6.2 Records Management System 270 12.3.7 Stroke Rehabilitation 271 12.3.7.1 Patient Information System 271 12.3.7.2 Toronto Virtual System 271 12.4 Conclusion 272 References 272 13 Exploration of Information Retrieval Approaches With Focus on Medical Information Retrieval 275 Mamata Rath and Jyotir Moy Chatterjee 13.1 Introduction 276 13.1.1 Machine Learning-Based Medical Information System 278 13.1.2 Cognitive Information Retrieval 278 13.2 Review of Literature 279 13.3 Cognitive Methods of IR 281 
13.4 Cognitive and Interactive IR Systems 286 13.5 Conclusion 288 References 289 14 Ontology as a Tool to Enable Health Internet of Things Viable 5G Communication Networks 293 Nidhi Sharma and R. K. Aggarwal 14.1 Introduction 293 14.2 From Concept Representations to Medical Ontologies 295 14.2.1 Current Medical Research Trends 296 14.2.2 Ontology as a Paradigm Shift in Health Informatics 296 14.3 Primer Literature Review 297 14.3.1 Remote Health Monitoring 298 14.3.2 Collecting and Understanding Medical Data 298 14.3.3 Patient Monitoring 298 14.3.4 Tele-Health 299 14.3.5 Advanced Human Services Records Frameworks 299 14.3.6 Applied Autonomy and Healthcare Mechanization 300 14.3.7 IoT Powers the Preventive Healthcare 301 14.3.8 Hospital Statistics Control System (HSCS) 301 14.3.9 End-to-End Accessibility and Moderateness 301 14.3.10 Information Mixing and Assessment 302 14.3.11 Following and Alerts 302 14.3.12 Remote Remedial Assistance 302 14.4 Establishments of Health IoT 303 14.4.1 Technological Challenges 304 14.4.2 Probable Solutions 306 14.4.3 Bit-by-Bit Action Statements 307 14.5 Incubation of IoT in Health Industry 307 14.5.1 Hearables 308 14.5.2 Ingestible Sensors 308 14.5.3 Moodables 308 14.5.4 PC Vision Innovation 308 14.5.5 Social Insurance Outlining 308 14.6 Concluding Remarks 309 References 309 15 Tools and Techniques for Streaming Data: An Overview 313 K. Saranya, S. 
Chellammal and Pethuru Raj Chelliah 15.1 Introduction 314 15.2 Traditional Techniques 315 15.2.1 Random Sampling 315 15.2.2 Histograms 316 15.2.3 Sliding Window 316 15.2.4 Sketches 317 15.2.4.1 Bloom Filters 317 15.2.4.2 Count-Min Sketch 317 15.3 Data Mining Techniques 317 15.3.1 Clustering 318 15.3.1.1 STREAM 318 15.3.1.2 BIRCH 318 15.3.1.3 CLUSTREAM 319 15.3.2 Classification 319 15.3.2.1 Naïve Bayesian 319 15.3.2.2 Hoeffding 320 15.3.2.3 Very Fast Decision Tree 320 15.3.2.4 Concept Adaptive Very Fast Decision Tree 320 15.4 Big Data Platforms 320 15.4.1 Apache Storm 321 15.4.2 Apache Spark 321 15.4.2.1 Apache Spark Core 321 15.4.2.2 Spark SQL 322 15.4.2.3 Machine Learning Library 322 15.4.2.4 Streaming Data API 322 15.4.2.5 GraphX 323 15.4.3 Apache Flume 323 15.4.4 Apache Kafka 323 15.4.5 Apache Flink 326 15.5 Conclusion 327 References 328 16 An Ontology-Based IR for Health Care 331 J. P. Patra, Gurudatta Verma and Sumitra Samal 16.1 Introduction 331 16.2 General Definition of Information Retrieval Model 333 16.3 Information Retrieval Model Based on Ontology 334 16.4 Literature Survey 336 16.5 Methodology for IR 339 References 344
£164.66
John Wiley & Sons Inc Computation in Bioinformatics
Book Synopsis: COMPUTATION IN BIOINFORMATICS. Bioinformatics is a platform between biology and information technology, and this book provides readers with an understanding of the use of bioinformatics tools in new drug design. The discovery of new solutions to pandemics is facilitated through the use of promising bioinformatics techniques and integrated approaches. This book covers a broad spectrum of the bioinformatics field, starting with the basic principles, concepts, and application areas. Also covered is the role of bioinformatics in drug design and discovery, including aspects of molecular modeling. Some of the chapters provide detailed information on bioinformatics-related topics, such as silicon design, protein modeling, DNA microarray analysis, DNA-RNA barcoding, and gene sequencing, all of which are currently needed in the industry. Also included are specialized topics, such as bioinformatics in cancer detection, genomics, and proteomics. Moreover, a few chapters expl
Table of Contents: Preface xiii 1 Bioinformatics as a Tool in Drug Designing 1 Rene Barbie Browne, Shiny C. 
Thomas and Jayanti Datta Roy 1.1 Introduction 1 1.2 Steps Involved in Drug Designing 3 1.2.1 Identification of the Target Protein/Enzyme 5 1.2.2 Detection of Molecular Site (Active Site) in the Target Protein 6 1.2.3 Molecular Modeling 6 1.2.4 Virtual Screening 9 1.2.5 Molecular Docking 10 1.2.6 QSAR (Quantitative Structure-Activity Relationship) 12 1.2.7 Pharmacophore Modeling 14 1.2.8 Solubility of Molecule 14 1.2.9 Molecular Dynamic Simulation 14 1.2.10 ADME Prediction 15 1.3 Various Softwares Used in the Steps of Drug Designing 16 1.4 Applications 18 1.5 Conclusion 20 References 20 2 New Strategies in Drug Discovery 25 Vivek Chavda, Yogita Thalkari and Swati Marwadi 2.1 Introduction 26 2.2 Road Toward Advancement 27 2.3 Methodology 30 2.3.1 Target Identification 30 2.3.2 Docking-Based Virtual Screening 32 2.3.3 Conformation Sampling 33 2.3.4 Scoring Function 34 2.3.5 Molecular Similarity Methods 35 2.3.6 Virtual Library Construction 37 2.3.7 Sequence-Based Drug Design 37 2.4 Role of OMICS Technology 38 2.5 High-Throughput Screening and Its Tools 40 2.6 Chemoinformatic 44 2.6.1 Exploratory Data Analysis 45 2.6.2 Example Discovery 46 2.6.3 Pattern Explanation 46 2.6.4 New Technologies 46 2.7 Concluding Remarks and Future Prospects 46 References 48 3 Role of Bioinformatics in Early Drug Discovery: An Overview and Perspective 49 Shasank S. Swain and Tahziba Hussain 3.1 Introduction 50 3.2 Bioinformatics and Drug Discovery 51 3.2.1 Structure-Based Drug Design (SBDD) 52 3.2.2 Ligand-Based Drug Design (LBDD) 53 3.3 Bioinformatics Tools in Early Drug Discovery 54 3.3.1 Possible Biological Activity Prediction Tools 55 3.3.2 Possible Physicochemical and Drug-Likeness Properties Verification Tools 58 3.3.3 Possible Toxicity and ADME/T Profile Prediction Tools 60 3.4 Future Directions With Bioinformatics Tool 61 3.5 Conclusion 63 Acknowledgements 64 References 64 4 Role of Data Mining in Bioinformatics 69 Vivek P. 
Chavda, Amit Sorathiya, Disha Valu and Swati Marwadi 4.1 Introduction 70 4.2 Data Mining Methods/Techniques 71 4.2.1 Classification 71 4.2.1.1 Statistical Techniques 71 4.2.1.2 Clustering Technique 73 4.2.1.3 Visualization 74 4.2.1.4 Induction Decision Tree Technique 74 4.2.1.5 Neural Network 75 4.2.1.6 Association Rule Technique 75 4.2.1.7 Classification 75 4.3 DNA Data Analysis 77 4.4 RNA Data Analysis 79 4.5 Protein Data Analysis 79 4.6 Biomedical Data Analysis 80 4.7 Conclusion and Future Prospects 81 References 81 5 In Silico Protein Design and Virtual Screening 85Vivek P. Chavda, Zeel Patel, Yashti Parmar and Disha Chavda 5.1 Introduction 86 5.2 Virtual Screening Process 88 5.2.1 Before Virtual Screening 90 5.2.2 General Process of Virtual Screening 90 5.2.2.1 Step 1 (The Establishment of the Receptor Model) 91 5.2.2.2 Step 2 (The Generation of Small-Molecule Libraries) 92 5.2.2.3 Step 3 (Molecular Docking) 92 5.2.2.4 Step 4 (Selection of Lead Protein Compounds) 94 5.3 Machine Learning and Scoring Functions 94 5.4 Conclusion and Future Prospects 95 References 96 6 New Bioinformatics Platform-Based Approach for Drug Design 101Vivek Chavda, Soham Sheta, Divyesh Changani and Disha Chavda 6.1 Introduction 102 6.2 Platform-Based Approach and Regulatory Perspective 104 6.3 Bioinformatics Tools and Computer-Aided Drug Design 107 6.4 Target Identification 109 6.5 Target Validation 110 6.6 Lead Identification and Optimization 111 6.7 High-Throughput Methods (HTM) 112 6.8 Conclusion and Future Prospects 114 References 115 7 Bioinformatics and Its Application Areas 121Ragini Bhardwaj, Mohit Sharma and Nikhil Agrawal 7.1 Introduction 121 7.2 Review of Bioinformatics 124 7.3 Bioinformatics Applications in Different Areas 126 7.3.1 Microbial Genome Application 126 7.3.2 Molecular Medicine 129 7.3.3 Agriculture 130 7.4 Conclusion 131 References 131 8 DNA Microarray Analysis: From Affymetrix CEL Files to Comparative Gene Expression 139Sandeep Kumar, Shruti Shandilya, Suman 
Kapila, Mohit Sharma and Nikhil Agrawal 8.1 Introduction 140 8.2 Data Processing 140 8.2.1 Installation of Workflow 140 8.2.2 Importing the Raw Data for Processing 141 8.2.3 Retrieving Sample Annotation of the Data 142 8.2.4 Quality Control 143 8.2.4.1 Boxplot 144 8.2.4.2 Density Histogram 145 8.2.4.3 MA Plot 145 8.2.4.4 NUSE Plot 145 8.2.4.5 RLE Plot 145 8.2.4.6 RNA Degradation Plot 145 8.2.4.7 QCstat 148 8.3 Normalization of Microarray Data Using the RMA Method 148 8.3.1 Background Correction 148 8.3.2 Normalization 149 8.3.3 Summarization 149 8.4 Statistical Analysis for Differential Gene Expression 151 8.5 Conclusion 153 References 153 9 Machine Learning in Bioinformatics 155Rahul Yadav, Mohit Sharma and Nikhil Agrawal 9.1 Introduction and Background 156 9.1.1 Bioinformatics 158 9.1.2 Text Mining 159 9.1.3 IoT Devices 159 9.2 Machine Learning Applications in Bioinformatics 159 9.3 Machine Learning Approaches 161 9.4 Conclusion and Closing Remarks 162 References 162 10 DNA-RNA Barcoding and Gene Sequencing 165Gifty Sawhney, Mohit Sharma and Nikhil Agrawal 10.1 Introduction 166 10.2 RNA 169 10.3 DNA Barcoding 172 10.3.1 Introduction 172 10.3.2 DNA Barcoding and Molecular Phylogeny 177 10.3.3 Ribosomal DNA (rDNA) of the Nuclear Genome (nuDNA)—ITS 178 10.3.4 Chloroplast DNA 180 10.3.5 Mitochondrial DNA 181 10.3.6 Molecular Phylogenetic Analysis 181 10.3.7 Metabarcoding 189 10.3.8 Materials for DNA Barcoding 190 10.4 Main Reasons of DNA Barcoding 191 10.5 Limitations/Restrictions of DNA Barcoding 192 10.6 RNA Barcoding 192 10.6.1 Overview of the Method 193 10.7 Methodology 194 10.7.1 Materials Required 195 10.7.2 Barcoded RNA Sequencing High-Level Mapping of Single-Neuron Projections 196 10.7.3 Using RNA to Trace Neurons 196 10.7.4 A Life Conservation Barcoder 198 10.7.5 Gene Sequencing 199 10.7.5.1 DNA Sequencing Methods 200 10.7.5.2 First-Generation Sequencing Techniques 204 10.7.5.3 Maxam’s and Gilbert’s Chemical Method 204 10.7.5.4 Sanger Sequencing 205 10.7.5.5 
Automation in DNA Sequencing 206 10.7.5.6 Use of Fluorescent-Marked Primers and ddNTPs 206 10.7.5.7 Dye Terminator Sequencing 207 10.7.5.8 Using Capillary Electrophoresis 207 10.7.6 Developments and High-Throughput Methods in DNA Sequencing 208 10.7.7 Pyrosequencing Method 209 10.7.8 The Genome Sequencer 454 FLX System 210 10.7.9 Illumina/Solexa Genome Analyzer 210 10.7.10 Transition Sequencing Techniques 211 10.7.11 Ion-Torrent’s Semiconductor Sequencing 211 10.7.12 Helico’s Genetic Analysis Platform 211 10.7.13 Third-Generation Sequencing Techniques 212 10.8 Conclusion 212 Abbreviations 213 Acknowledgement 214 References 214 11 Bioinformatics in Cancer Detection 229Mohit Sharma, Umme Abiha, Parul Chugh, Balakumar Chandrasekaran and Nikhil Agrawal 11.1 Introduction 230 11.2 The Era of Bioinformatics in Cancer 230 11.3 Aid in Cancer Research via NCI 232 11.4 Application of Big Data in Developing Precision Medicine 233 11.5 Historical Perspective and Development 235 11.6 Bioinformatics-Based Approaches in the Study of Cancer 237 11.6.1 SLAMS 237 11.6.2 Module Maps 238 11.6.3 COPA 239 11.7 Conclusion and Future Challenges 240 References 240 12 Genomic Association of Polycystic Ovarian Syndrome: Single-Nucleotide Polymorphisms and Their Role in Disease Progression 245Gowtham Kumar Subbaraj and Sindhu Varghese 12.1 Introduction 246 12.2 FSHR Gene 252 12.3 IL-10 Gene 252 12.4 IRS-1 Gene 253 12.5 PCR Primers Used 254 12.6 Statistical Analysis 255 12.7 Conclusion 258 References 259 13 An Insight of Protein Structure Predictions Using Homology Modeling 265S. Muthumanickam, P. Boomi, R. Subashkumar, S. Palanisamy, A. Sudha, K. Anand, C. Balakumar, M. Saravanan, G. Poorani, Yao Wang, K. Vijayakumar and M. 
Syed Ali 13.1 Introduction 266 13.2 Homology Modeling Approach 268 13.2.1 Strategies for Homology Modeling 269 13.2.2 Procedure 269 13.3 Steps Involved in Homology Modeling 270 13.3.1 Template Identification 270 13.3.2 Sequence Alignment 271 13.3.3 Backbone Generation 271 13.3.4 Loop Modeling 271 13.3.5 Side Chain Modeling 272 13.3.6 Model Optimization 272 13.3.6.1 Model Validation 272 13.4 Tools Used for Homology Modeling 273 13.4.1 Robetta 273 13.4.2 M4T (Multiple Templates) 273 13.4.3 I-Tasser (Iterative Implementation of the Threading Assembly Refinement) 273 13.4.4 ModBase 274 13.4.5 Swiss Model 274 13.4.6 PHYRE2 (Protein Homology/Analogy Recognition Engine 2) 274 13.4.7 Modeller 274 13.4.8 Conclusion 275 Acknowledgement 275 References 275 14 Basic Concepts in Proteomics and Applications 279Jesudass Joseph Sahayarayan, A.S. Enogochitra and Murugesan Chandrasekaran 14.1 Introduction 280 14.2 Challenges on Proteomics 281 14.3 Proteomics Based on Gel 283 14.4 Non-Gel–Based Electrophoresis Method 284 14.5 Chromatography 284 14.6 Proteomics Based on Peptides 285 14.7 Stable Isotopic Labeling 286 14.8 Data Mining and Informatics 287 14.9 Applications of Proteomics 289 14.10 Future Scope 290 14.11 Conclusion 291 References 292 15 Prospects of Covalent Approaches in Drug Discovery: An Overview 295Balajee Ramachandran, Saravanan Muthupandian and Jeyakanthan Jeyaraman 15.1 Introduction 296 15.2 Covalent Inhibitors Against the Biological Target 297 15.3 Application of Physical Chemistry Concepts in Drug Designing 299 15.4 Docking Methodologies—An Overview 301 15.5 Importance of Covalent Targets 302 15.6 Recent Framework on the Existing Docking Protocols 303 15.7 SN2 Reactions in the Computational Approaches 304 15.8 Other Crucial Factors to Consider in the Covalent Docking 305 15.8.1 Role of Ionizable Residues 305 15.8.2 Charge Regulation 306 15.8.3 Charge-Charge Interactions 306 15.9 QM/MM Approaches 309 15.10 Conclusion and Remarks 310 Acknowledgements 311 References 
311 Index 321
£138.56
John Wiley & Sons Inc Machine Learning for Time Series Forecasting with
Book Synopsis: Learn how to apply the principles of machine learning to time series modeling with this indispensable resource. Machine Learning for Time Series Forecasting with Python is an incisive and straightforward examination of one of the most crucial elements of decision-making in finance, marketing, education, and healthcare: time series modeling. Despite the centrality of time series forecasting, few business analysts are familiar with the power or utility of applying machine learning to time series modeling. Author Francesca Lazzeri, a distinguished machine learning scientist and economist, corrects that deficiency by providing readers with a comprehensive and approachable explanation and treatment of the application of machine learning to time series forecasting. Written for readers who have little to no experience in time series forecasting or machine learning, the book comprehensively covers all the topics necessary to: understand time series forecasting concepts, such as stationarity, horizon, trend, and seasonality; prepare time series data for modeling; evaluate time series forecasting models' performance and accuracy; and understand when to use neural networks instead of traditional time series models in time series forecasting. Machine Learning for Time Series Forecasting with Python is full of real-world examples, resources, and concrete strategies to help readers explore and transform data and develop usable, practical time series forecasts. Perfect for entry-level data scientists, business analysts, developers, and researchers, this book is an invaluable and indispensable guide to the fundamental and advanced concepts of machine learning applied to time series modeling.
Table of ContentsAcknowledgments vii Introduction xv Chapter 1 Overview of Time Series Forecasting 1 Flavors of Machine Learning for Time Series Forecasting 3 Supervised Learning for Time Series Forecasting 14 Python for Time Series Forecasting 21 Experimental Setup for Time Series Forecasting 24 Conclusion 26 Chapter 2 How to Design an End-to-End Time Series Forecasting Solution on the Cloud 29 Time Series Forecasting Template 31 Business Understanding and Performance Metrics 33 Data Ingestion 36 Data Exploration and Understanding 39 Data Pre-processing and Feature Engineering 40 Modeling Building and Selection 42 An Overview of Demand Forecasting Modeling Techniques 44 Model Evaluation 46 Model Deployment 48 Forecasting Solution Acceptance 53 Use Case: Demand Forecasting 54 Conclusion 58 Chapter 3 Time Series Data Preparation 61 Python for Time Series Data 62 Common Data Preparation Operations for Time Series 65 Time stamps vs. Periods 66 Converting to Timestamps 69 Providing a Format Argument 70 Indexing 71 Time/Date Components 76 Frequency Conversion 78 Time Series Exploration and Understanding 79 How to Get Started with Time Series Data Analysis 79 Data Cleaning of Missing Values in the Time Series 84 Time Series Data Normalization and Standardization 86 Time Series Feature Engineering 89 Date Time Features 90 Lag Features and Window Features 92 Rolling Window Statistics 95 Expanding Window Statistics 97 Conclusion 98 Chapter 4 Introduction to Autoregressive and Automated Methods for Time Series Forecasting 101 Autoregression 102 Moving Average 119 Autoregressive Moving Average 120 Autoregressive Integrated Moving Average 122 Automated Machine Learning 129 Conclusion 136 Chapter 5 Introduction to Neural Networks for Time Series Forecasting 137 Reasons to Add Deep Learning to Your Time Series Toolkit 138 Deep Learning Neural Networks Are Capable of Automatically Learning and Extracting Features from Raw and Imperfect Data 140 Deep Learning Supports Multiple 
Inputs and Outputs 142 Recurrent Neural Networks Are Good at Extracting Patterns from Input Data 143 Recurrent Neural Networks for Time Series Forecasting 144 Recurrent Neural Networks 145 Long Short-Term Memory 147 Gated Recurrent Unit 148 How to Prepare Time Series Data for LSTMs and GRUs 150 How to Develop GRUs and LSTMs for Time Series Forecasting 154 Keras 155 TensorFlow 156 Univariate Models 156 Multivariate Models 160 Conclusion 164 Chapter 6 Model Deployment for Time Series Forecasting 167 Experimental Set Up and Introduction to Azure Machine Learning SDK for Python 168 Workspace 169 Experiment 169 Run 169 Model 170 Compute Target, RunConfiguration, and ScriptRun Config 171 Image and Webservice 172 Machine Learning Model Deployment 173 How to Select the Right Tools to Succeed with Model Deployment 175 Solution Architecture for Time Series Forecasting with Deployment Examples 177 Train and Deploy an ARIMA Model 179 Configure the Workspace 182 Create an Experiment 183 Create or Attach a Compute Cluster 184 Upload the Data to Azure 184 Create an Estimator 188 Submit the Job to the Remote Cluster 188 Register the Model 189 Deployment 189 Define Your Entry Script and Dependencies 190 Automatic Schema Generation 191 Conclusion 196 References 197 Index 199
£35.62
John Wiley & Sons Inc Smarter Data Science
Book Synopsis: Organizations can make data science a repeatable, predictable tool, which business professionals use to get more value from their data. Enterprise data and AI projects are often scattershot, underbaked, siloed, and not adaptable to predictable business changes. As a result, the vast majority fail. These expensive quagmires can be avoided, and this book explains precisely how. Data science is emerging as a hands-on tool for not just data scientists, but business professionals as well. Managers, directors, IT leaders, and analysts must expand their use of data science capabilities for the organization to stay competitive. Smarter Data Science helps them achieve their enterprise-grade data projects and AI goals. It serves as a guide to building a robust and comprehensive information architecture program that enables sustainable and scalable AI deployments. When an organization manages its data effectively, its data science program becomes a fully scala
Table of Contents: Foreword for Smarter Data Science xix Epigraph xxi Preamble xxiii Chapter 1 Climbing the AI Ladder 1 Readying Data for AI 2 Technology Focus Areas 3 Taking the Ladder Rung by Rung 4 Constantly Adapt to Retain Organizational Relevance 8 Data-Based Reasoning is Part and Parcel in the Modern Business 10 Toward the AI-Centric Organization 14 Summary 16 Chapter 2 Framing Part I: Considerations for Organizations Using AI 17 Data-Driven Decision-Making 18 Using Interrogatives to Gain Insight 19 The Trust Matrix 20 The Importance of Metrics and Human Insight 22 Democratizing Data and Data Science 23 Aye, a Prerequisite: Organizing Data Must Be a Forethought 26 Preventing Design Pitfalls 27 Facilitating the Winds of Change: How Organized Data Facilitates Reaction Time 29 Quae Quaestio (Question Everything) 30 Summary 32 Chapter 3 Framing Part II: Considerations for Working with Data and AI 35 Personalizing the Data Experience for Every User 36 Context Counts: Choosing the Right Way to Display Data 38 
Ethnography: Improving Understanding Through Specialized Data 42 Data Governance and Data Quality 43 The Value of Decomposing Data 43 Providing Structure Through Data Governance 43 Curating Data for Training 45 Additional Considerations for Creating Value 45 Ontologies: A Means for Encapsulating Knowledge 46 Fairness, Trust, and Transparency in AI Outcomes 49 Accessible, Accurate, Curated, and Organized 52 Summary 54 Chapter 4 A Look Back on Analytics: More Than One Hammer 57 Been Here Before: Reviewing the Enterprise Data Warehouse 57 Drawbacks of the Traditional Data Warehouse 64 Paradigm Shift 68 Modern Analytical Environments: The Data Lake 69 By Contrast 71 Indigenous Data 72 Attributes of Difference 73 Elements of the Data Lake 75 The New Normal: Big Data is Now Normal Data 77 Liberation from the Rigidity of a Single Data Model 78 Streaming Data 78 Suitable Tools for the Task 78 Easier Accessibility 79 Reducing Costs 79 Scalability 79 Data Management and Data Governance for AI 80 Schema-on-Read vs. 
Schema-on-Write 81 Summary 84 Chapter 5 A Look Forward on Analytics: Not Everything Can Be a Nail 87 A Need for Organization 87 The Staging Zone 90 The Raw Zone 91 The Discovery and Exploration Zone 92 The Aligned Zone 93 The Harmonized Zone 98 The Curated Zone 100 Data Topologies 100 Zone Map 103 Data Pipelines 104 Data Topography 105 Expanding, Adding, Moving, and Removing Zones 107 Enabling the Zones 108 Ingestion 108 Data Governance 111 Data Storage and Retention 112 Data Processing 114 Data Access 116 Management and Monitoring 117 Metadata 118 Summary 119 Chapter 6 Addressing Operational Disciplines on the AI Ladder 121 A Passage of Time 122 Create 128 Stability 128 Barriers 129 Complexity 129 Execute 130 Ingestion 131 Visibility 132 Compliance 132 Operate 133 Quality 134 Reliance 135 Reusability 135 The xOps Trifecta: DevOps/MLOps, DataOps, and AIOps 136 DevOps/MLOps 137 DataOps 139 AIOps 142 Summary 144 Chapter 7 Maximizing the Use of Your Data: Being Value Driven 147 Toward a Value Chain 148 Chaining Through Correlation 152 Enabling Action 154 Expanding the Means to Act 155 Curation 156 Data Governance 159 Integrated Data Management 162 Onboarding 163 Organizing 164 Cataloging 166 Metadata 167 Preparing 168 Provisioning 169 Multi-Tenancy 170 Summary 173 Chapter 8 Valuing Data with Statistical Analysis and Enabling Meaningful Access 175 Deriving Value: Managing Data as an Asset 175 An Inexact Science 180 Accessibility to Data: Not All Users are Equal 183 Providing Self-Service to Data 184 Access: The Importance of Adding Controls 186 Ranking Datasets Using a Bottom-Up Approach for Data Governance 187 How Various Industries Use Data and AI 188 Benefiting from Statistics 189 Summary 198 Chapter 9 Constructing for the Long-Term 199 The Need to Change Habits: Avoiding Hard-Coding 200 Overloading 201 Locked In 202 Ownership and Decomposition 204 Design to Avoid Change 204 Extending the Value of Data Through AI 206 Polyglot Persistence 208 Benefiting from Data 
Literacy 213 Understanding a Topic 215 Skillsets 216 It’s All Metadata 218 The Right Data, in the Right Context, with the Right Interface 219 Summary 221 Chapter 10 A Journey’s End: An IA for AI 223 Development Efforts for AI 224 Essential Elements: Cloud-Based Computing, Data, and Analytics 228 Intersections: Compute Capacity and Storage Capacity 234 Analytic Intensity 237 Interoperability Across the Elements 238 Data Pipeline Flight Paths: Preflight, Inflight, Postflight 242 Data Management for the Data Puddle, Data Pond, and Data Lake 243 Driving Action: Context, Content, and Decision-Makers 245 Keep It Simple 248 The Silo is Dead; Long Live the Silo 250 Taxonomy: Organizing Data Zones 252 Capabilities for an Open Platform 256 Summary 260 Appendix Glossary of Terms 263 Index 269
£30.39
John Wiley & Sons Inc Responsible Data Science
Book Synopsis: Explore the most serious and prevalent ethical issues in data science with this insightful new resource. The increasing popularity of data science has resulted in numerous well-publicized cases of bias, injustice, and discrimination. The widespread deployment of black-box algorithms that are difficult or impossible to understand and explain, even for their developers, is a primary source of these unanticipated harms, making modern techniques and methods for manipulating large data sets seem sinister, even dangerous. When put in the hands of authoritarian governments, these algorithms have enabled suppression of political dissent and persecution of minorities. To prevent these harms, data scientists everywhere must come to understand how the algorithms that they build and deploy may harm certain groups or be unfair. Responsible Data Science delivers a comprehensive, practical treatment of how to implement data science solutions in an even-handed and ethical manner that minimizes the risk
Table of Contents: Introduction xix Part I Motivation for Ethical Data Science and Background Knowledge 1 Chapter 1 Responsible Data Science 3 The Optum Disaster 4 Jekyll and Hyde 5 Eugenics 7 Galton, Pearson, and Fisher 7 Ties between Eugenics and Statistics 7 Ethical Problems in Data Science Today 9 Predictive Models 10 From Explaining to Predicting 10 Predictive Modeling 11 Setting the Stage for Ethical Issues to Arise 12 Classic Statistical Models 12 Black-Box Methods 14 Important Concepts in Predictive Modeling 19 Feature Selection 19 Model-Centric vs. 
Data-Centric Models 20 Holdout Sample and Cross-Validation 20 Overfitting 21 Unsupervised Learning 22 The Ethical Challenge of Black Boxes 23 Two Opposing Forces 24 Pressure for More Powerful AI 24 Public Resistance and Anxiety 24 Summary 25 Chapter 2 Background: Modeling and the Black-Box Algorithm 27 Assessing Model Performance 27 Predicting Class Membership 28 The Rare Class Problem 28 Lift and Gains 28 Area Under the Curve 29 AUC vs. Lift (Gains) 31 Predicting Numeric Values 32 Goodness-of-Fit 32 Holdout Sets and Cross-Validation 33 Optimization and Loss Functions 34 Intrinsically Interpretable Models vs. Black-Box Models 35 Ethical Challenges with Interpretable Models 38 Black-Box Models 39 Ensembles 39 Nearest Neighbors 41 Clustering 41 Association Rules 42 Collaborative Filters 42 Artificial Neural Nets and Deep Neural Nets 43 Problems with Black-Box Predictive Models 45 Problems with Unsupervised Algorithms 47 Summary 48 Chapter 3 The Ways AI Goes Wrong, and the Legal Implications 49 AI and Intentional Consequences by Design 50 Deepfakes 50 Supporting State Surveillance and Suppression 51 Behavioral Manipulation 52 Automated Testing to Fine-Tune Targeting 53 AI and Unintended Consequences 55 Healthcare 56 Finance 57 Law Enforcement 58 Technology 60 The Legal and Regulatory Landscape around AI 61 Ignorance Is No Defense: AI in the Context of Existing Law and Policy 63 A Finger in the Dam: Data Rights, Data Privacy, and Consumer Protection Regulations 64 Trends in Emerging Law and Policy Related to AI 66 Summary 69 Part II The Ethical Data Science Process 71 Chapter 4 The Responsible Data Science Framework 73 Why We Keep Building Harmful AI 74 Misguided Need for Cutting-Edge Models 74 Excessive Focus on Predictive Performance 74 Ease of Access and the Curse of Simplicity 76 The Common Cause 76 The Face Thieves 78 An Anatomy of Modeling Harms 79 The World: Context Matters for Modeling 80 The Data: Representation Is Everything 83 The Model: Garbage In, Danger 
Out 85 Model Interpretability: Human Understanding for Superhuman Models 86 Efforts Toward a More Responsible Data Science 89 Principles Are the Focus 90 Nonmaleficence 90 Fairness 90 Transparency 91 Accountability 91 Privacy 92 Bridging the Gap Between Principles and Practice with the Responsible Data Science (RDS) Framework 92 Justification 94 Compilation 94 Preparation 95 Modeling 96 Auditing 96 Summary 97 Chapter 5 Model Interpretability: The What and the Why 99 The Sexist Résumé Screener 99 The Necessity of Model Interpretability 101 Connections Between Predictive Performance and Interpretability 103 Uniting (High) Model Performance and Model Interpretability 105 Categories of Interpretability Methods 107 Global Methods 107 Local Methods 113 Real-World Successes of Interpretability Methods 113 Facilitating Debugging and Audit 114 Leveraging the Improved Performance of Black-Box Models 116 Acquiring New Knowledge 116 Addressing Critiques of Interpretability Methods 117 Explanations Generated by Interpretability Methods Are Not Robust 118 Explanations Generated by Interpretability Methods Are Low Fidelity 120 The Forking Paths of Model Interpretability 121 The Four-Measure Baseline 122 Building Our Own Credit Scoring Model 124 Using Train-Test Splits 125 Feature Selection and Feature Engineering 125 Baseline Models 127 The Importance of Making Your Code Work for Everyone 129 Execution Variability 129 Addressing Execution Variability with Functionalized Code 130 Stochastic Variability 130 Addressing Stochastic Variability via Resampling 130 Summary 133 Part III EDS in Practice 135 Chapter 6 Beginning a Responsible Data Science Project 137 How the Responsible Data Science Framework Addresses the Common Cause 138 Datasets Used 140 Regression Datasets—Communities and Crime 140 Classification Datasets—COMPAS 140 Common Elements Across Our Analyses 141 Project Structure and Documentation 141 Project Structure for the Responsible Data Science Framework: Everything in 
Its Place 142 Documentation: The Responsible Thing to Do 145 Beginning a Responsible Data Science Project 151 Communities and Crime (Regression) 151 Justification 151 Compilation 154 Identifying Protected Classes 157 Preparation—Data Splitting and Feature Engineering 159 Datasheets 161 COMPAS (Classification) 164 Justification 164 Compilation 166 Identifying Protected Classes 168 Preparation 169 Summary 172 Chapter 7 Auditing a Responsible Data Science Project 173 Fairness and Data Science in Practice 175 The Many Different Conceptions of Fairness 175 Different Forms of Fairness Are Trade-Offs with Each Other 177 Quantifying Predictive Fairness Within a Data Science Project 179 Mitigating Bias to Improve Fairness 185 Preprocessing 185 In-processing 186 Postprocessing 186 Classification Example: COMPAS 187 Prework: Code Practices, Modeling, and Auditing 187 Justification, Compilation, and Preparation Review 189 Modeling 191 Auditing 200 Per-Group Metrics: Overall 200 Per-Group Metrics: Error 202 Fairness Metrics 204 Interpreting Our Models: Why Are They Unfair? 207 Analysis for Different Groups 209 Bias Mitigation 214 Preprocessing: Oversampling 214 Postprocessing: Optimizing Thresholds Automatically 218 Postprocessing: Optimizing Thresholds Manually 219 Summary 223 Chapter 8 Auditing for Neural Networks 225 Why Neural Networks Merit Their Own Chapter 227 Neural Networks Vary Greatly in Structure 227 Neural Networks Treat Features Differently 229 Neural Networks Repeat Themselves 231 A More Impenetrable Black Box 232 Baseline Methods 233 Representation Methods 233 Distillation Methods 234 Intrinsic Methods 235 Beginning a Responsible Neural Network Project 236 Justification 236 Moving Forward 239 Compilation 239 Tracking Experiments 241 Preparation 244 Modeling 245 Auditing 247 Per-Group Metrics: Overall 247 Per-Group Metrics: Unusual Definitions of “False Positive” 248 Fairness Metrics 249 Interpreting Our Models: Why Are They Unfair? 
252 Bias Mitigation 253 Wrap-Up 255 Auditing Neural Networks for Natural Language Processing 258 Identifying and Addressing Sources of Bias in NLP 258 The Real World 259 Data 260 Models 261 Model Interpretability 262 Summary 262 Chapter 9 Conclusion 265 How Can We Do Better? 267 The Responsible Data Science Framework 267 Doing Better As Managers 269 Doing Better As Practitioners 270 A Better Future If We Can Keep It 271 Index 273
£24.79
John Wiley & Sons Inc Machine Learning for Business Analytics
Book Synopsis: Machine Learning for Business Analytics. Machine learning, also known as data mining or data analytics, is a fundamental part of data science. It is used by organizations in a wide variety of arenas to turn raw data into actionable information. Machine Learning for Business Analytics: Concepts, Techniques and Applications in RapidMiner provides a comprehensive introduction and an overview of this methodology. This best-selling textbook covers both statistical and machine learning algorithms for prediction, classification, visualization, dimension reduction, rule mining, recommendations, clustering, text mining, experimentation and network analytics. Along with hands-on exercises and real-life case studies, it also discusses managerial and ethical issues for responsible use of machine learning techniques. This is the seventh edition of Machine Learning for Business Analytics, and the first using RapidMiner software. This edition also includes: A
Table of Contents: Foreword by Ravi Bapna xxi Preface to the RapidMiner Edition xxiii Acknowledgments xxvii PART I PRELIMINARIES CHAPTER 1 Introduction 3 1.1 What Is Business Analytics? 3 1.2 What Is Machine Learning? 5 1.3 Machine Learning, AI, and Related Terms 5 1.4 Big Data 7 1.5 Data Science 8 1.6 Why Are There So Many Different Methods? 
9 1.7 Terminology and Notation 9 1.8 Road Maps to This Book 12 1.9 Using RapidMiner Studio 14 CHAPTER 2 Overview of the Machine Learning Process 19 2.1 Introduction 19 2.2 Core Ideas in Machine Learning 20 2.3 The Steps in a Machine Learning Project 23 2.4 Preliminary Steps 25 2.5 Predictive Power and Overfitting 32 2.6 Building a Predictive Model with RapidMiner 37 2.7 Using RapidMiner for Machine Learning 45 2.8 Automating Machine Learning Solutions 47 2.9 Ethical Practice in Machine Learning 52 PART II DATA EXPLORATION AND DIMENSION REDUCTION CHAPTER 3 Data Visualization 63 3.1 Introduction 63 3.2 Data Examples 65 3.3 Basic Charts: Bar Charts, Line Charts, and Scatter Plots 66 3.4 Multidimensional Visualization 75 3.5 Specialized Visualizations 87 3.6 Summary: Major Visualizations and Operations, by Machine Learning Goal 92 CHAPTER 4 Dimension Reduction 97 4.1 Introduction 97 4.2 Curse of Dimensionality 98 4.3 Practical Considerations 98 4.4 Data Summaries 100 4.5 Correlation Analysis 103 4.6 Reducing the Number of Categories in Categorical Attributes 105 4.7 Converting a Categorical Attribute to a Numerical Attribute 107 4.8 Principal Component Analysis 107 4.9 Dimension Reduction Using Regression Models 117 4.10 Dimension Reduction Using Classification and Regression Trees 119 PART III PERFORMANCE EVALUATION CHAPTER 5 Evaluating Predictive Performance 125 5.1 Introduction 125 5.2 Evaluating Predictive Performance 126 5.3 Judging Classifier Performance 131 5.4 Judging Ranking Performance 146 5.5 Oversampling 151 PART IV PREDICTION AND CLASSIFICATION METHODS CHAPTER 6 Multiple Linear Regression 163 6.1 Introduction 163 6.2 Explanatory vs. 
Predictive Modeling 164 6.3 Estimating the Regression Equation and Prediction 166 6.4 Variable Selection in Linear Regression 171 CHAPTER 7 k-Nearest Neighbors (k-NN) 189 7.1 The k-NN Classifier (Categorical Label) 189 7.2 k-NN for a Numerical Label 200 7.3 Advantages and Shortcomings of k-NN Algorithms 202 CHAPTER 8 The Naive Bayes Classifier 209 8.1 Introduction 209 8.2 Applying the Full (Exact) Bayesian Classifier 211 8.3 Solution: Naive Bayes 213 8.4 Advantages and Shortcomings of the Naive Bayes Classifier 223 CHAPTER 9 Classification and Regression Trees 229 9.1 Introduction 229 9.2 Classification Trees 232 9.3 Evaluating the Performance of a Classification Tree 240 9.4 Avoiding Overfitting 245 9.5 Classification Rules from Trees 255 9.6 Classification Trees for More Than Two Classes 256 9.7 Regression Trees 256 9.8 Improving Prediction: Random Forests and Boosted Trees 259 9.9 Advantages and Weaknesses of a Tree 261 CHAPTER 10 Logistic Regression 269 10.1 Introduction 269 10.2 The Logistic Regression Model 271 10.3 Example: Acceptance of Personal Loan 272 10.4 Logistic Regression for Multi-class Classification 283 10.5 Example of Complete Analysis: Predicting Delayed Flights 286 CHAPTER 11 Neural Networks 305 11.1 Introduction 306 11.2 Concept and Structure of a Neural Network 306 11.3 Fitting a Network to Data 307 11.4 Required User Input 321 11.5 Exploring the Relationship Between Predictors and Target Attribute 322 11.6 Deep Learning 323 11.7 Advantages and Weaknesses of Neural Networks 334 CHAPTER 12 Discriminant Analysis 337 12.1 Introduction 337 12.2 Distance of a Record from a Class 340 12.3 Fisher’s Linear Classification Functions 341 12.4 Classification Performance of Discriminant Analysis 346 12.5 Prior Probabilities 348 12.6 Unequal Misclassification Costs 348 12.7 Classifying More Than Two Classes 349 12.8 Advantages and Weaknesses 351 CHAPTER 13 Generating, Comparing, and Combining Multiple Models 359 13.1 Automated Machine Learning (AutoML) 359 
13.2 Explaining Model Predictions 367 13.3 Ensembles 373 13.4 Summary 381 PART V INTERVENTION AND USER FEEDBACK CHAPTER 14 Interventions: Experiments, Uplift Models, and Reinforcement Learning 387 14.1 A/B Testing 387 14.2 Uplift (Persuasion) Modeling 393 14.3 Reinforcement Learning 400 14.4 Summary 405 PART VI MINING RELATIONSHIPS AMONG RECORDS CHAPTER 15 Association Rules and Collaborative Filtering 409 15.1 Association Rules 409 15.2 Collaborative Filtering 424 15.3 Summary 438 CHAPTER 16 Cluster Analysis 445 16.1 Introduction 445 16.2 Measuring Distance Between Two Records 449 16.3 Measuring Distance Between Two Clusters 455 16.4 Hierarchical (Agglomerative) Clustering 457 16.5 Non-Hierarchical Clustering: The k-Means Algorithm 466 PART VII FORECASTING TIME SERIES CHAPTER 17 Handling Time Series 479 17.1 Introduction 480 17.2 Descriptive vs. Predictive Modeling 481 17.3 Popular Forecasting Methods in Business 481 17.4 Time Series Components 482 17.5 Data Partitioning and Performance Evaluation 486 CHAPTER 18 Regression-Based Forecasting 497 18.1 A Model with Trend 498 18.2 A Model with Seasonality 504 18.3 A Model with Trend and Seasonality 508 18.4 Autocorrelation and ARIMA Models 509 CHAPTER 19 Smoothing and Deep Learning Methods for Forecasting 533 19.1 Smoothing Methods: Introduction 534 19.2 Moving Average 534 19.3 Simple Exponential Smoothing 541 19.4 Advanced Exponential Smoothing 545 19.5 Deep Learning for Forecasting 549 PART VIII DATA ANALYTICS CHAPTER 20 Social Network Analytics 563 20.1 Introduction 563 20.2 Directed vs. Undirected Networks 564 20.3 Visualizing and Analyzing Networks 567 20.4 Social Data Metrics and Taxonomy 571 20.5 Using Network Metrics in Prediction and Classification 577 20.6 Collecting Social Network Data with RapidMiner 584 20.7 Advantages and Disadvantages 584 CHAPTER 21 Text Mining 589 21.1 Introduction 589 21.2 The Tabular Representation of Text: Term–Document Matrix and “Bag-of-Words’’ 590 21.3 Bag-of-Words vs. 
Meaning Extraction at Document Level 592 21.4 Preprocessing the Text 593 21.5 Implementing Machine Learning Methods 602 21.6 Example: Online Discussions on Autos and Electronics 602 21.7 Example: Sentiment Analysis of Movie Reviews 607 21.8 Summary 614 CHAPTER 22 Responsible Data Science 617 22.1 Introduction 617 22.2 Unintentional Harm 618 22.3 Legal Considerations 620 22.4 Principles of Responsible Data Science 621 22.5 A Responsible Data Science Framework 624 22.6 Documentation Tools 628 22.7 Example: Applying the RDS Framework to the COMPAS Example 631 22.8 Summary 641 PART IX CASES CHAPTER 23 Cases 647 23.1 Charles Book Club 647 23.2 German Credit 653 23.3 Tayko Software Cataloger 658 23.4 Political Persuasion 662 23.5 Taxi Cancellations 665 23.6 Segmenting Consumers of Bath Soap 667 23.7 Direct-Mail Fundraising 670 23.8 Catalog Cross-Selling 672 23.9 Time Series Case: Forecasting Public Transportation Demand 673 23.10 Loan Approval 675 Index 685
£96.30
John Wiley & Sons Inc Operating AI
Book Synopsis: A holistic and real-world approach to operationalizing artificial intelligence in your company. In Operating AI, Director of Technology and Architecture at Ericsson AB, Ulrika Jägare, delivers an eye-opening new discussion of how to introduce your organization to artificial intelligence by balancing data engineering, model development, and AI operations. You'll learn the importance of embracing an AI operational mindset to successfully operate AI and lead AI initiatives through the entire lifecycle, including key areas such as data mesh, data fabric, aspects of security, data privacy, data rights, and IPR related to data and AI models. In the book, you'll also discover: how to reduce the risk of introducing bias into your artificial intelligence solutions and how to approach explainable AI (XAI); the importance of efficient and reproducible data pipelines, including how to manage your company's data; an operational perspective on the development of AI models using the MLOps (Machine Learning Operations) approach, including how to deploy, run, and monitor models and ML pipelines in production using CI/CD/CT techniques that generate value in the real world; key competences and toolsets in AI development, deployment, and operations; and what to consider when operating different types of AI business models. With a strong emphasis on deployment and operations of trustworthy and reliable AI solutions that operate well in the real world, and not just the lab, Operating AI is a must-read for business leaders looking for ways to operationalize an AI business model that actually makes money, from the concept phase to running in a live production environment.
Table of Contents: Foreword xii Introduction xv Chapter 1 Balancing the AI Investment 1 Defining AI and Related Concepts 3 Operational Readiness and Why It Matters 8 Applying an Operational Mindset from the Start 12 The Operational Challenge 15 Strategy, People, and Technology Considerations 19 Strategic Success Factors in Operating AI 20 
People and Mindsets 23 The Technology Perspective 28 Chapter 2 Data Engineering Focused on AI 31 Know Your Data 32 Know the Data Structure 32 Know the Data Records 34 Know the Business Data Oddities 35 Know the Data Origin 36 Know the Data Collection Scope 37 The Data Pipeline 38 Types of Data Pipeline Solutions 41 Data Quality in Data Pipelines 44 The Data Quality Approach in AI/ML 45 Scaling Data for AI 49 Key Capabilities for Scaling Data 51 Introducing a Data Mesh 53 When You Have No Data 55 The Role of a Data Fabric 56 Why a Data Fabric Matters in AI/ML 58 Key Competences and Skillsets in Data Engineering 60 Chapter 3 Embracing MLOps 71 MLOps as a Concept 72 From ML Models to ML Pipelines 76 The ML Pipeline 78 Adopt a Continuous Learning Approach 84 The Maturity of Your AI/ML Capability 86 Level 0: Model Focus and No MLOps 88 Level 1: Pipelines Rather than Models 89 Level 2: Leveraging Continuous Learning 90 The Model Training Environment 91 Enabling ML Experimentation 92 Using a Simulator for Model Training 94 Environmental Impact of Training AI Models 96 Considering the AI/ML Functional Technology Stack 97 Key Competences and Toolsets in MLOps 103 Clarifying Similarities and Differences 106 MLOps Toolsets 107 Chapter 4 Deployment with AI Operations in Mind 115 Model Serving in Practice 117 Feature Stores 118 Deploying, Serving, and Inferencing Models at Scale 121 The ML Inference Pipeline 123 Model Serving Architecture Components 125 Considerations Regarding Toolsets for Model Serving 129 The Industrialization of AI 129 The Importance of a Cultural Shift 139 Chapter 5 Operating AI Is Different from Operating Software 143 Model Monitoring 144 Ensuring Efficient ML Model Monitoring 145 Model Scoring in Production 146 Retraining in Production Using Continuous Training 151 Data Aspects Related to Model Retraining 155 Understanding Different Retraining Techniques 156 Deployment after Retraining 159 Disadvantages of Retraining Models Frequently 159 Diagnosing 
and Managing Model Performance Issues in Operations 161 Issues with Data Processing 162 Issues with Data Schema Change 163 Data Loss at the Source 165 Models Are Broken Upstream 166 Monitoring Data Quality and Integrity 167 Monitoring the Model Calls 167 Monitoring the Data Schema 168 Detecting Any Missing Data 168 Validating the Feature Values 169 Monitor the Feature Processing 170 Model Monitoring for Stakeholders 171 Ensuring Stakeholder Collaboration for Model Success 173 Toolsets for Model Monitoring in Production 175 Chapter 6 AI Is All About Trust 181 Anonymizing Data 182 Data Anonymization Techniques 185 Pros and Cons of Data Anonymization 187 Explainable AI 189 Complex AI Models Are Harder to Understand 190 What Is Interpretability? 191 The Need for Interpretability in Different Phases 192 Reducing Bias in Practice 194 Rights to the Data and AI Models 199 Data Ownership 200 Who Owns What in a Trained AI Model? 202 Balancing the IP Approach for AI Models 205 The Role of AI Model Training 206 Addressing IP Ownership in AI Results 207 Legal Aspects of AI Techniques 208 Operational Governance of Data and AI 210 Chapter 7 Achieving Business Value from AI 215 The Challenge of Leveraging Value from AI 216 Productivity 216 Reliability 217 Risk 218 People 219 Top Management and AI Business Realization 219 Measuring AI Business Value 223 Measuring AI Value in Nonrevenue Terms 227 Operating Different AI Business Models 229 Operating Artificial Intelligence as a Service 230 Operating Embedded AI Solutions 236 Operating a Hybrid AI Business Model 239 Index 241
£24.79
John Wiley & Sons Inc Fuzzy Computing in Data Science
Book Synopsis: FUZZY COMPUTING IN DATA SCIENCE. This book comprehensively explains how to use various fuzzy-based models to solve real-time industrial challenges. The book provides information about fundamental aspects of the field and explores the myriad applications of fuzzy logic techniques and methods. It presents basic conceptual considerations and case studies of applications of fuzzy computation. It covers the fundamental concepts and techniques for system modeling, information processing, intelligent system design, decision analysis, statistical analysis, pattern recognition, automated learning, system control, and identification. The book also discusses the combination of fuzzy computation techniques with other computational intelligence approaches such as neural and evolutionary computation. Audience: Researchers and students in computer science, artificial intelligence, machine learning, big data analytics, and information and communication technology.
Table of Contents: Preface xvii Acknowledgement xxi 1 Band Reduction of HSI Segmentation Using FCM 1 V. Saravana Kumar, S. Anantha Sivaprakasam, E.R. Naganathan, Sunil Bhutada and M. 
Kavitha 1.1 Introduction 2 1.2 Existing Method 3 1.2.1 K-Means Clustering Method 3 1.2.2 Fuzzy C-Means 3 1.2.3 Davies Bouldin Index 4 1.2.4 Data Set Description of HSI 4 1.3 Proposed Method 5 1.3.1 Hyperspectral Image Segmentation Using Enhanced Estimation of Centroid 5 1.3.2 Band Reduction Using K-Means Algorithm 6 1.3.3 Band Reduction Using Fuzzy C-Means 7 1.4 Experimental Results 8 1.4.1 DB Index Graph 8 1.4.2 K-Means–Based PSC (EEOC) 9 1.4.3 Fuzzy C-Means–Based PSC (EEOC) 10 1.5 Analysis of Results 12 1.6 Conclusions 16 References 17 2 A Fuzzy Approach to Face Mask Detection 21 Vatsal Mishra, Tavish Awasthi, Subham Kashyap, Minerva Brahma, Monideepa Roy and Sujoy Datta 2.1 Introduction 22 2.2 Existing Work 23 2.3 The Proposed Framework 26 2.4 Set-Up and Libraries Used 26 2.5 Implementation 27 2.6 Results and Analysis 29 2.7 Conclusion and Future Work 33 References 34 3 Application of Fuzzy Logic to the Healthcare Industry 37 Biswajeet Sahu, Lokanath Sarangi, Abhinadita Ghosh and Hemanta Kumar Palo 3.1 Introduction 38 3.2 Background 41 3.3 Fuzzy Logic 42 3.4 Fuzzy Logic in Healthcare 45 3.5 Conclusions 49 References 50 4 A Bibliometric Approach and Systematic Exploration of Global Research Activity on Fuzzy Logic in Scopus Database 55 Sugyanta Priyadarshini and Nisrutha Dulla 4.1 Introduction 56 4.2 Data Extraction and Interpretation 58 4.3 Results and Discussion 59 4.3.1 Per Year Publication and Citation Count 59 4.3.2 Prominent Affiliations Contributing Toward Fuzzy Logic 60 4.3.3 Top Journals Emerging in Fuzzy Logic in Major Subject Areas 61 4.3.4 Major Contributing Countries Toward Fuzzy Research Articles 63 4.3.5 Prominent Authors Contribution Toward the Fuzzy Logic Analysis 66 4.3.6 Coauthorship of Authors 67 4.3.7 Cocitation Analysis of Cited Authors 68 4.3.8 Cooccurrence of Author Keywords 68 4.4 Bibliographic Coupling of Documents, Sources, Authors, and Countries 70 4.4.1 Bibliographic Coupling of Documents 70 4.4.2 Bibliographic Coupling of Sources 71 
4.4.3 Bibliographic Coupling of Authors 72 4.4.4 Bibliographic Coupling of Countries 73 4.5 Conclusion 74 References 76 5 Fuzzy Decision Making in Predictive Analytics and Resource Scheduling 79 Rekha A. Kulkarni, Suhas H. Patil and Bithika Bishesh 5.1 Introduction 80 5.2 History of Fuzzy Logic and Its Applications 81 5.3 Approximate Reasoning 82 5.4 Fuzzy Sets vs Classical Sets 83 5.5 Fuzzy Inference System 84 5.5.1 Characteristics of FIS 85 5.5.2 Working of FIS 85 5.5.3 Methods of FIS 86 5.6 Fuzzy Decision Trees 86 5.6.1 Characteristics of Decision Trees 87 5.6.2 Construction of Fuzzy Decision Trees 87 5.7 Fuzzy Logic as Applied to Resource Scheduling in a Cloud Environment 88 5.8 Conclusion 90 References 91 6 Application of Fuzzy Logic and Machine Learning Concept in Sales Data Forecasting Decision Analytics Using ARIMA Model 93 S. Mala and V. Umadevi 6.1 Introduction 94 6.1.1 Aim and Scope 94 6.1.2 R-Tool 94 6.1.3 Application of Fuzzy Logic 94 6.1.4 Dataset 95 6.2 Model Study 96 6.2.1 Introduction to Machine Learning Method 96 6.2.2 Time Series Analysis 96 6.2.3 Components of a Time Series 97 6.2.4 Concepts of Stationary 99 6.2.5 Model Parsimony 100 6.3 Methodology 100 6.3.1 Exploratory Data Analysis 100 6.3.1.1 Seed Types—Analysis 101 6.3.1.2 Comparison of Location and Seeds 101 6.3.1.3 Comparison of Season (Month) and Seeds 103 6.3.2 Forecasting 103 6.3.2.1 Auto Regressive Integrated Moving Average (ARIMA) 103 6.3.2.2 Data Visualization 106 6.3.2.3 Implementation Model 108 6.4 Result Analysis 108 6.5 Conclusion 110 References 110 7 Modified m-Polar Fuzzy Set ELECTRE-I Approach 113 Madan Jagtap, Prasad Karande and Pravin Patil 7.1 Introduction 114 7.1.1 Objectives 114 7.2 Implementation of m-Polar Fuzzy ELECTRE-I Integrated Shannon’s Entropy Weight Calculations 115 7.2.1 The m-Polar Fuzzy ELECTRE-I Integrated Shannon’s Entropy Weight Calculation Method 115 7.3 Application to Industrial Problems 118 7.3.1 Cutting Fluid Selection Problem 118 7.3.2 Results 
Obtained From m-Polar Fuzzy ELECTRE-I for Cutting Fluid Selection Problem 122 7.3.3 FMS Selection Problem 125 7.3.4 Results Obtained From m-Polar Fuzzy ELECTRE-I for FMS Selection 130 7.4 Conclusions 143 References 143 8 Fuzzy Decision Making: Concept and Models 147 Bithika Bishesh 8.1 Introduction 148 8.2 Classical Set 149 8.3 Fuzzy Set 150 8.4 Properties of Fuzzy Set 151 8.5 Types of Decision Making 153 8.5.1 Individual Decision Making 153 8.5.2 Multiperson Decision Making 157 8.5.3 Multistage Decision Making 158 8.5.4 Multicriteria Decision Making 160 8.6 Methods of Multiattribute Decision Making (MADM) 162 8.6.1 Weighted Sum Method (WSM) 162 8.6.2 Weighted Product Method (WPM) 162 8.6.3 Weighted Aggregates Sum Product Assessment (WASPAS) 163 8.6.4 Technique for Order Preference by Similarity to Ideal Solutions (TOPSIS) 166 8.7 Applications of Fuzzy Logic 167 8.8 Conclusion 169 References 169 9 Use of Fuzzy Logic for Psychological Support to Migrant Workers of Southern Odisha (India) 173 Sanjaya Kumar Sahoo and Sukanta Chandra Swain 9.1 Introduction 174 9.2 Objectives and Methodology 175 9.2.1 Objectives 175 9.2.2 Methodology 176 9.3 Effect of COVID-19 on the Psychology and Emotion of Repatriated Migrants 176 9.3.1 Psychological Variables Identified 176 9.3.2 Fuzzy Logic for Solace to Migrants 176 9.4 Findings 178 9.5 Way Out for Strengthening the Psychological Strength of the Migrant Workers through Technological Aid 178 9.6 Conclusion 179 References 180 10 Fuzzy-Based Edge AI Approach: Smart Transformation of Healthcare for a Better Tomorrow 181 B. RaviKrishna, Sirisha Potluri, J. 
Rethna Virgil Jeny, Guna Sekhar Sajja and Katta Subba Rao 10.1 Significance of Machine Learning in Healthcare 182 10.2 Cloud-Based Artificial Intelligent Secure Models 183 10.3 Applications and Usage of Machine Learning in Healthcare 183 10.3.1 Detecting Diseases and Diagnosis 183 10.3.2 Drug Detection and Manufacturing 183 10.3.3 Medical Imaging Analysis and Diagnosis 184 10.3.4 Personalized/Adapted Medicine 185 10.3.5 Behavioral Modification 185 10.3.6 Maintenance of Smart Health Data 185 10.3.7 Clinical Trial and Study 185 10.3.8 Crowdsourced Information Discovery 185 10.3.9 Enhanced Radiotherapy 186 10.3.10 Outbreak/Epidemic Prediction 186 10.4 Edge AI: For Smart Transformation of Healthcare 186 10.4.1 Role of Edge in Reshaping Healthcare 186 10.4.2 How AI Powers the Edge 187 10.5 Edge AI-Modernizing Human Machine Interface 188 10.5.1 Rural Medicine 188 10.5.2 Autonomous Monitoring of Hospital Rooms—A Case Study 188 10.6 Significance of Fuzzy in Healthcare 189 10.6.1 Fuzzy Logic—Outline 189 10.6.2 Fuzzy Logic-Based Smart Healthcare 190 10.6.3 Medical Diagnosis Using Fuzzy Logic for Decision Support Systems 191 10.6.4 Applications of Fuzzy Logic in Healthcare 193 10.7 Conclusion and Discussions 193 References 194 11 Video Conferencing (VC) Software Selection Using Fuzzy TOPSIS 197 Rekha Gupta 11.1 Introduction 197 11.2 Video Conferencing Software and Its Major Features 199 11.2.1 Video Conferencing/Meeting Software (VC/MS) for Higher Education Institutes 199 11.3 Fuzzy TOPSIS 203 11.3.1 Extension of TOPSIS Algorithm: Fuzzy TOPSIS 203 11.4 Sample Numerical Illustration 207 11.5 Conclusions 213 References 213 12 Estimation of Nonperforming Assets of Indian Commercial Banks Using Fuzzy AHP and Goal Programming 215 Kandarp Vidyasagar and Rajiv Kr. 
Dwivedi 12.1 Introduction 216 12.1.1 Basic Concepts of Fuzzy AHP and Goal Programming 217 12.2 Research Model 221 12.2.1 Average Growth Rate Calculation 227 12.3 Result and Discussion 233 12.4 Conclusion 234 References 234 13 Evaluation of Ergonomic Design for the Visual Display Terminal Operator at Static Work Under FMCDM Environment 237 Bipradas Bairagi 13.1 Introduction 238 13.2 Proposed Algorithm 240 13.3 An Illustrative Example on Ergonomic Design Evaluation 245 13.4 Conclusions 249 References 249 14 Optimization of Energy Generated from Ocean Wave Energy Using Fuzzy Logic 253 S. B. Goyal, Pradeep Bedi, Jugnesh Kumar and Prasenjit Chatterjee 14.1 Introduction 254 14.2 Control Approach in Wave Energy Systems 255 14.3 Related Work 257 14.4 Mathematical Modeling for Energy Conversion from Ocean Waves 259 14.5 Proposed Methodology 260 14.5.1 Wave Parameters 261 14.5.2 Fuzzy-Optimizer 262 14.6 Conclusion 264 References 264 15 The m-Polar Fuzzy TOPSIS Method for NTM Selection 267 Madan Jagtap and Prasad Karande 15.1 Introduction 268 15.2 Literature Review 268 15.3 Methodology 270 15.3.1 Steps of the mFS TOPSIS 270 15.4 Case Study 272 15.4.1 Effect of Analytical Hierarchy Process (AHP) Weight Calculation on the mFS TOPSIS Method 273 15.4.2 Effect of Shannon’s Entropy Weight Calculation on the m-Polar Fuzzy Set TOPSIS Method 277 15.5 Results and Discussions 281 15.5.1 Result Validation 281 15.6 Conclusions and Future Scope 283 References 284 16 Comparative Analysis on Material Handling Device Selection Using Hybrid FMCDM Methodology 287 Bipradas Bairagi 16.1 Introduction 288 16.2 MCDM Techniques 289 16.2.1 FAHP 289 16.2.2 Entropy Method as Weights (Influence) Evaluation Technique 290 16.3 The Proposed Hybrid and Super Hybrid FMCDM Approaches 291 16.3.1 TOPSIS 291 16.3.2 FMOORA Method 292 16.3.3 FVIKOR 292 16.3.4 Fuzzy Grey Theory (FGT) 293 16.3.5 COPRAS-G 293 16.3.6 Super Hybrid Algorithm 294 16.4 Illustrative Example 295 16.5 Results and Discussions 298 16.5.1 
FTOPSIS 298 16.5.2 FMOORA 298 16.5.3 FVIKOR 298 16.5.4 Fuzzy Grey Theory (FGT) 299 16.5.5 COPRAS-G 299 16.5.6 Super Hybrid Approach (SHA) 299 16.6 Conclusions 302 References 302 17 Fuzzy MCDM on CCPM for Decision Making: A Case Study 305 Bimal K. Jena, Biswajit Das, Amarendra Baral and Sushanta Tripathy 17.1 Introduction 306 17.2 Literature Review 307 17.3 Objective of Research 308 17.4 Cluster Analysis 308 17.4.1 Hierarchical Clustering 309 17.4.2 Partitional Clustering 309 17.5 Clustering 310 17.6 Methodology 314 17.7 TOPSIS Method 316 17.8 Fuzzy TOPSIS Method 318 17.9 Conclusion 325 17.10 Scope of Future Study 326 References 326 Index 329
£133.20
Johns Hopkins University Press Big Data on Campus
Book Synopsis: How data-informed decision making can make colleges and universities more effective institutions. The continuing importance of data analytics is not lost on higher education leaders, who face a multitude of challenges, including increasing operating costs, dwindling state support, limits to tuition increases, and increased competition from the for-profit sector. To navigate these challenges, savvy leaders must leverage data to make sound decisions. In Big Data on Campus, leading data analytics experts and higher ed leaders show the role that analytics can play in the better administration of colleges and universities. Aimed at senior administrative leaders, practitioners of institutional research, technology professionals, and graduate students in higher education, the book opens with a conceptual discussion of the roles that data analytics can play in higher education administration. Subsequent chapters address recent developments in technology, the rapid accumulation of data assets
Table of Contents: Foreword, by Christine M. Keller Acknowledgments Part I. Technology, Digitization, Big Data, and Analytics Maturity as the Enabling Conditions for Data-Informed Decision Making Chapter 1. Data Analytics and the Imperatives for Data-Informed Decision Making in Higher Education Karen L. Webber and Henry Y. Zheng Chapter 2. Big Data and the Transformation of Decision Making in Higher Education Braden J. Hosch Chapter 3. Predictive Analytics and Its Uses in Higher Education Henry Y. Zheng and Ying Zhou Part II. The Ethical, Cultural, and Managerial Imperatives of Data-Informed Decision Making in Higher Education Chapter 4. Limitations in Data Analytics: Potential Misuse and Misunderstanding in Data Reports and Visualizations Karen L. Webber and Jillian N. Morn Chapter 5. Guiding Your Organization's Data Strategy: The Roles of University Senior Leaders and Trustees in Strategic Analytics Gail B. Marsh and Rachit Thariani Chapter 6. 
Data Governance, Data Stewardship, and the Building of an Analytics Organizational Culture Rana Glasgal and Valentina Nestor Part III. The Application of Analytics in Higher Education Decision Making: Case Studies Chapter 7. Data Analytics and Decision Making in Admissions and Enrollment Management Tom Gutman and Brian P. Hinote Chapter 8. Predictive Analytics, Academic Advising, Early Alerts, and Student Success Timothy M. Renick Chapter 9. Constituent Relationship Management and Student Engagement Lifecycle Cathy A. O'Bryan, Chris Tompkins, and Carrie Hancock Marcinkevage Chapter 10. Learning Analytics for Learning Assessment: Complexities in Efficacy, Implementation, and Broad Use Carrie Klein, Jaime Lester, Huzefa Rangwala, and Aditya Johri Chapter 11. Using Data Analytics to Support Institutional Financial and Operational Efficiency Lindsay K. Wayt, Susan M. Menditto, J. Michael Gower, and Charles Tegen Part IV. Concluding Comments Chapter 12. Data-Informed Decision Making and the Pursuit of Analytics Maturity in Higher Education Karen L. Webber and Henry Y. Zheng Contributors Index
£33.25
Johns Hopkins University Press How Colleges Use Data
Book Synopsis: What does a culture of evidence really look like in higher education? The use of big data and the rapid acceleration of storage and analytics tools have led to a revolution of data use in higher education. Institutions have moved from relying largely on historical trends and descriptive data to the more widespread adoption of predictive and prescriptive analytics. Despite this rapid evolution of data technology and analytics tools, universities and colleges still face a number of obstacles in their data use. In How Colleges Use Data, Jonathan S. Gagliardi presents college and university leaders with an important resource to help cultivate, implement, and sustain a culture of evidence through the ethical and responsible use and adoption of data and analytics. Gagliardi provides a broad context for data use among colleges, including key concepts and use cases related to data and analytics. He also addresses the different dimensions of data use and highlights the promise and perils of the
Table of Contents: Preface Acknowledgments Chapter 1. The Evidence Imperative Chapter 2. Demystifying Data and Analytics Chapter 3. Defining an Institutional Aspiration Using Data Chapter 4. Equity and Student Success Chapter 5. Strategic Finance and Resource Optimization Chapter 6. Academic Quality and Renewal Chapter 7. Creating a Data Governance System Chapter 8. The Promise and Peril of Data and Analytics Chapter 9. Implementation and Planning Chapter 10. Looking Ahead Notes Index
£21.60
Johns Hopkins University Press Because Data Can't Speak for Itself
Book Synopsis
£18.05
O'Reilly Media 21 Recipes for Mining Twitter
Book Synopsis: Millions of public Twitter streams harbor a wealth of data, and once you mine them, you can gain some valuable insights. This short and concise book offers a collection of recipes to help you extract nuggets of Twitter information using easy-to-learn Python tools.
£19.19
O'Reilly Media Spring Data The Definitive Guide
Book Synopsis: Relational database technologies continue to be predominant in Java enterprise applications, but with newer technologies such as NoSQL databases and Hadoop available, RDBMS is no longer considered a "one size fits all" solution. This book shows you how to increase your options with Spring's data access framework.
£25.59
O'Reilly Media Just Hibernate
Book Synopsis: If you're looking for a short, sweet, and simple introduction (or reintroduction) to Hibernate, this is the book you want. Through clear real-world examples, you'll learn Hibernate and object-relational mapping from the ground up, starting with the basics. Then you'll dive into the framework's moving parts to understand how they work in action.
£19.19
O'Reilly Media Data Mining
Book Synopsis: This non-technical guide shows you how to extract significant business value from big data with Ask-Measure-Learn, a system that helps you ask the right questions, measure the right data, and then learn from the results.
£19.19
O'Reilly Media eXist
Book Synopsis: With this hands-on guide, you'll learn eXist from the ground up, from using this feature-rich database to work with millions of documents to building complex web applications that take advantage of eXist's many extensions.
£28.79
O'Reilly Media HBase
Book Synopsis: If your organization is looking for a storage solution to accommodate a virtually endless amount of data, this book will show you how Apache HBase can fulfill your needs.
£25.59
Springer-Verlag New York Inc. Encyclopedia of Database Systems
Table of Contents: .NET Remoting.- Absolute Time.- Abstract Versus Concrete Temporal Query Languages.- Abstraction.- Access Control.- Access Control Administration Policies.- Access Control Policy Languages.- Access Path.- ACID Properties.- Active and Real-Time Data Warehousing.- Active Database Coupling Modes.- Active Database Execution Model.- Active Database Knowledge Model.- Active Database Management System Architecture.- Active Database Rulebase.- Active Database, Active Database (Management) System.- Active Storage.- Active XML.- Activity.- Activity Diagrams.- Actors/Agents/Roles.- Adaptive Interfaces.- Adaptive Middleware for Message Queuing Systems.- Adaptive Query Processing.- Adaptive Stream Processing.- ADBMS.- Administration Model for RBAC.- Administration Wizards.- Advanced Information Retrieval Measures.- Aggregation: Expressiveness and Containment.- Aggregation-Based Structured Text Retrieval.- Air Indexes for Spatial Databases.- AJAX.- Allen's Relations.- 
AMOSQL.- AMS Sketch.- Anchor Text.- Annotation.- Annotation-based Image Retrieval.- Anomaly Detection on Streams.- Anonymity.- ANSI/INCITS RBAC Standard.- Answering Queries Using Views.- Anti-monotone Constraints.- Applicability Period.- Application Benchmark.- Application Recovery.- Application Server.- Application-Level Tuning.- Applications of Emerging Patterns for Microarray Gene Expression Data Analysis.- Applications of Sensor Network Data Management.- Approximate Queries in Peer-to-Peer Systems.- Approximate Query Processing.- Approximate Reasoning.- Approximation of Frequent Itemsets.- Apriori Property and Breadth-First Search Algorithms.- Architecture-Conscious Database System.- Archiving Experimental Data.- Armstrong Axioms.- Array Databases.- Array Databases_old.- Association Rule Mining on Streams.- Association Rules.- Asymmetric Encryption.- Atelic Data.- Atomic Event.- Atomicity.- Audio.- Audio Classification.- Audio Content Analysis.- Audio Metadata.- Audio Representation.- Audio Segmentation.- Auditing and Forensic Analysis.- Authentication.- Automatic Image Annotation.- Autonomous Replication.- Average Precision.- Average Precision at n.- Average Precision Histogram.- Average R-Precision.- B+-Tree.- Backup and Restore.- Bag Semantics.- Bagging.- Bayesian Classification.- Benchmark Frameworks.- Benchmarks for Big Data Analytics.- Big Data Platforms for Data Analytics.- Big Stream Systems.- Biological Metadata Management.- Biological Networks.- Biological Resource Discovery.- Biological Sequences.- Biomedical Data/Content Acquisition, Curation.- Biomedical Image Data Types and Processing.- Biomedical Scientific Textual Data Types and Processing.- Biostatistics and Data Analysis.- Bi-Temporal Indexing.- Bitemporal Interval.- Bitemporal Relation.- Bitmap Index.- Bitmap-based Index Structures.- Blind Signatures.- Bloom Filters.- BM25.- Boolean Model.- Boosting.- Bootstrap.- Boyce-Codd Normal Form.- BP-Completeness.- Bpref.- Browsing.- Browsing in 
Digital Libraries.- B-Tree Locking.- Buffer Management.- Buffer Manager.- Buffer Pool.- Business Intelligence.- Business Process Execution Language.- Business Process Management.- Business Process Modeling Notation.- Business Process Reengineering.- Cache-Conscious Query Processing.- Calendar.- Calendric System.- CAP Theorem.- Cardinal Direction Relationships.- Cartesian Product.- Cataloging in Digital Libraries.- Causal Consistency.- Certain (and Possible) Answers.- Change Detection on Streams.- Channel-Based Publish/Subscribe.- Chart.- Chase.- Checksum and Cyclic Redundancy Check Mechanism.- Choreography.- Chronon.- Citation.- Classification.- Classification by Association Rule Analysis.- Classification in Streams.- Client-Server Architecture.- Clinical Data Acquisition, Storage and Management.- Clinical Data and Information Models.- Clinical Data Quality and Validation.- Clinical Decision Support.- Clinical Document Architecture.- Clinical Event.- Clinical Knowledge Repository.- Clinical Observation.- Clinical Ontologies.- Clinical Order.- Closed Itemset Mining and Non-redundant Association Rule Mining.- Closest-Pair Query.- Cloud Computing.- Cloud Intelligence.- Cluster and Distance Measure.- Clustering for Post Hoc Information Retrieval.- Clustering on Streams.- Clustering Overview and Applications.- Clustering Validity.- Clustering with Constraints.- Collaborative Filtering.- Column Segmentation.- Column Stores.- Common Warehouse Metamodel.- Comparative Visualization.- Compensating Transactions.- Complex Event.- Complex Event Processing.- Composed Services and WS-BPEL.- Composite Event.- Composition.- Comprehensions.- Compression of Mobile Location Data.- Computational Media Aesthetics.- Computationally Complete Relational Query Languages.- Computerized Physician Order Entry.- Conceptual Modeling Foundations.- Conceptual Schema Design.- Concurrency Control - Traditional Approaches.- Concurrency Control for Replicated Databases.- Concurrency Control Manager.- 
Conditional Tables.- Conjunctive Query.- Connection.- Consistency Models For Replicated Data.- Consistent Query Answering.- Constraint Databases.- Constraint Query Languages.- Constraint-Driven Database Repair.- Content-and-Structure Query.- Content-Based Publish/Subscribe.- Content-Based Video Retrieval.- Content-Only Query.- Context.- Contextualization in Structured Text Retrieval.- Continuous Data Protection.- Continuous Monitoring of Spatial Queries.- Continuous Multimedia Data Retrieval.- Continuous Queries in Sensor Networks.- Continuous Query.- ConTract.- Control Data.- Convertible Constraints.- Coordination.- Copyright Issues in Databases.- CORBA.- Correctness Criteria Beyond Serializability.- Cost and quality trade-offs in crowdsourcing.- Cost Estimation.- Count-Min Sketch.- Coupling and De-coupling.- Covering Index.- Crash Recovery.- Cross-Language Mining and Retrieval.- Cross-Modal Multimedia Information Retrieval.- Cross-Validation.- Crowd Database Operators.- Crowd Database Systems.- Crowd Mining and Analysis.- Crowdsourcing Geographic Information Systems.- Cube.- Cube Implementations.- Current Semantics.- Curse of Dimensionality.- Daplex.- Data Acquisition and Dissemination in Sensor Networks.- Data Aggregation in Sensor Networks.- Data Broadcasting, Caching and Replication in Mobile Computing.- Data Cleaning.- Data Compression in Sensor Networks.- Data Conflicts.- Data Definition.- Data Definition Language (DDL).- Data Dictionary.- Data Encryption.- Data Estimation in Sensor Networks.- Data Exchange.- Data Fusion.- Data Fusion in Sensor Networks.- Data Generation.- Data Governance.- Data Integration Architectures and Methodology for the Life Sciences.- Data Integration in Web Data Extraction System.- Data Management for VANETs.- Data Management Fundamentals: Database Management System.- Data Management in Data Centers.- Data Manipulation.- Data Manipulation Language (DML).- Data Mart.- Data Migration Management.- Data Mining.- Data Partitioning.- 
Data Privacy and Patient Consent.- Data Profiling.- Data Provenance.- Data Quality Assessment.- Data Quality Dimensions.- Data Quality Models.- Data Rank/Swapping.- Data Reduction.- Data Replication.- Data Sampling.- Data Scrubbing.- Data Sketch/Synopsis.- Data Skew.- Data Storage and Indexing in Sensor Networks.- Data Stream.- Data Stream Management Architectures and Prototypes.- Data Types in Scientific Data Management.- Data Uncertainty Management in Sensor Networks.- Data Visualization.- Data Warehouse.- Data Warehouse Life-Cycle and Design.- Data Warehouse Maintenance, Evolution and Versioning.- Data Warehouse Metadata.- Data Warehouse Security.- Data Warehousing for Clinical Research.- Data Warehousing in Cloud Environments.- Data Warehousing on Non-Conventional Data.- Data Warehousing Systems: Foundations and Architectures.- Data, Text, and Web Mining in Healthcare.- Database.- Database Adapter and Connector.- Database Administrator (DBA).- Database Appliances.- Database Benchmarks.- Database Clustering Methods.- Database Clusters.- Database Dependencies.- Database Design.- Database Languages for Sensor Networks.- Database Machine.- Database Management System.- Database Middleware.- Database Repair.- Database Reverse Engineering.- Database Schema.- Database Security.- Database System.- Database Techniques to Improve Scientific Simulations.- Database Trigger.- Database Tuning using Combinatorial Search.- Database Tuning using Online Algorithms.- Database Tuning using Trade-off Elimination.- Database Use in Science Applications.- Datalog.- DBMS Component.- DBMS Interface.- DCE.- DCOM.- Decay Models.- Decision Rule Mining in Rough Set Theory.- Decision Tree Classification.- Decision Trees.- Declarative Networking.- Deductive Data Mining using Granular Computing.- Deduplication.- Deduplication in Data Cleaning.- Deep Instantiation.- Deep-Web Search.- Dense Index.- Dense Pixel Displays.- Density-based Clustering.- Description Logics.- Design for Data Quality.- 
Dewey Decimal System.- Diagram.- Difference.- Differential Privacy.- Digital Archives and Preservation.- Digital Curation.- Digital Elevation Models.- Digital Libraries.- Digital Rights Management.- Digital Signatures.- Dimension.- Dimension Reduction Techniques for Clustering.- Dimensionality Reduction.- Dimensionality Reduction Techniques For Nearest Neighbor Computations.- Dimension-Extended Topological Relationships.- Direct Attached Storage.- Direct Manipulation.- Disaster Recovery.- Disclosure Risk.- Discounted Cumulated Gain.- Discovery.- Discrete Wavelet Transform and Wavelet Synopses.- Discretionary Access Control.- Disk.- Disk Power Saving.- Distortion Techniques.- Distributed Architecture.- Distributed Concurrency Control.- Distributed Data Streams.- Distributed Database Design.- Distributed Database Systems.- Distributed DBMS.- Distributed Deadlock Management.- Distributed File Systems.- Distributed Hash Table.- Distributed Join.- Distributed Machine Learning.- Distributed Query Optimization.- Distributed Query Processing.- Distributed Recovery.- Distributed Spatial Databases.- Distributed Transaction Management.- Divergence from Randomness Models.- D-measure.- Document.- Document Clustering.- Document Databases.- Document Field.- Document Length Normalization.- Document Links and Hyperlinks.- Document Representations (Inclusive Native and Relational).- Dublin Core.- Dynamic Graphics.- Dynamic Web Pages.- eAccessibility.- ECA Rule Action.- ECA Rule Condition.- ECA Rules.- e-Commerce Transactions.- Effectiveness Involving Multiple Queries.- Ehrenfeucht-Fraïssé Games.- Elasticity.- Electronic Dictionary.- Electronic Encyclopedia.- Electronic Health Record.- Electronic Ink Indexing.- Electronic Newspapers.- Eleven Point Precision-recall Curve.- Emergent Semantics.- Emerging Pattern Based Classification.- Emerging Patterns.- Energy Efficiency in Data Centers.- Ensemble.- Enterprise Application Integration.- Enterprise Content Management.- Enterprise Service 
Bus.- Enterprise Terminology Services.- Entity Relationship Model.- Entity Resolution.- Entity Retrieval.- Equality-Generating Dependencies.- ERR- Expected Reciprocal Rank.- ERR-IA Intent-aware ERR.- Escrow Transactions.- European Law in Databases.- Evaluation Metrics for Structured Text Retrieval.- Evaluation of Relational Operators.- Event.- Event and Pattern Detection over Streams.- Event Causality.- Event Channel.- Event Cloud.- Event Detection.- Event Driven Architecture.- Event Flow.- Event in Active Databases.- Event in Temporal Databases.- Event Lineage.- Event Pattern Detection.- Event Prediction.- Event Processing Agent.- Event Processing Network.- Event Sink.- Event Source.- Event Specification.- Event Stream.- Event Transformation.- Event-Driven Business Process Management.- Eventual Consistency.- Evidence Based Medicine.- Executable Knowledge.- Execution Skew.- Explicit Event.- Exploratory Data Analysis.- Expressive Power of Query Languages.- Extended Entity-Relationship Model.- Extended Transaction Models and the ACTA Framework.- Extendible Hashing.- Extraction, Transformation, and Loading.- Faceted Search.- Fault-Tolerance and High Availability in Data Stream Management Systems.- Feature Extraction for Content-Based Image Retrieval.- Feature Selection for Clustering.- Feature-Based 3D Object Retrieval.- Field-Based Information Retrieval Models.- Field-Based Spatial Modeling.- First-Order Logic: Semantics.- First-Order Logic: Syntax.- Fixed Time Span.- Flex Transactions.- FM Synopsis.- F-Measure.- Focused Web Crawling.- FOL Modeling of Integrity Constraints (Dependencies).- Forever.- Form.- Fourth Normal Form.- FQL.- Fractal.- Frequency Moments.- Frequent Graph Patterns.- Frequent Items on Streams.- Frequent Itemset Mining with Constraints.- Frequent Itemsets and Association Rules.- Frequent Partial Orders.- Fully-Automatic Web Data Extraction.- Functional Data Model.- Functional Dependencies for Semi-Structured Data.- Functional Dependency.- 
Functional Query Language.- Fuzzy Models.- Fuzzy Relation.- Fuzzy Set.- Fuzzy Set Approach.- Fuzzy/Linguistic IF-THEN Rules and Linguistic Descriptions.- Gazetteers.- Gene Expression Arrays.- Generalization of ACID Properties.- Generalized Search Tree.- Genetic Algorithms.- Geographic Information System.- Geographical Information Retrieval.- Geography Markup Language.- Geometric Stream Mining.- GEO-RBAC Model.- Georeferencing.- Geosocial Networks.- Geospatial Metadata.- Geo-Targeted Web Search.- GMAP.- Grammar Inference.- Graph.- Graph Data Management in Scientific Applications.- Graph Database.- Graph Management in the Life Sciences.- Graph Mining.- Graph Mining on Streams.- Graph OLAP.- Graphical Models for Uncertain Data Management.- Grid and Workflows.- Grid File (and Family).- GUIs for Web Data Extraction.- Hash Functions.- Hash Join.- Hash-based Indexing.- Healthcare Metrics.- Hierarchical Clustering.- Hierarchical Data Model.- Hierarchical Data Summarization.- Hierarchical Heavy Hitter Mining on Streams.- Hierarchy.- High Dimensional Indexing.- Histogram.- Histograms on Streams.- History in Temporal Databases.- Homomorphic Encryption.- Horizontally Partitioned Data.- Human Factors Modeling in Crowdsourcing.- Human-centered Computing: Application to Multimedia.- Human-Computer Interaction.- Hypertexts.- I/O Model of Computation.- Icon.- Iconic Displays.- Image.- Image Content Modeling.- Image Database.- Image Management for Biological Data.- Image Metadata.- Image Querying.- Image Representation.- Image Retrieval and Relevance Feedback.- Image Segmentation.- Image Similarity.- Implementation of Database Operators (Joins, Group by, etc.).- Implication of Constraints.- Implications of Genomics for Clinical Informatics.- Implicit Event.- Incomplete Information.- Inconsistent Databases.- Incremental Computation of Queries.- Incremental Crawling.- Incremental Maintenance of Views with Aggregates.- Index Creation and File Structures.- Index Join.- Index Structures 
for Biological Sequences.- Index Tuning.- Indexed Sequential Access Method.- Indexing and Similarity Search.- Indexing Compressed Text.- Indexing Historical Spatio-Temporal Data.- Indexing in pub/sub systems.- Indexing Metric Spaces.- Indexing of Data Warehouses.- Indexing of the Current and Near-Future Positions of Moving Objects.- Indexing Techniques for Multimedia Data Retrieval.- Indexing the Web.- Indexing Uncertain Data.- Indexing Units of Structured Text Retrieval.- Indexing with Crowds.- Individually Identifiable Data.- Inference Control in Statistical Databases.- Information Extraction.- Information Filtering.- Information Foraging.- Information Integration.- Information Integration Techniques for Scientific Data.- Information Lifecycle Management.- Information Loss Measures.- Information Navigation.- Information Quality.- Information Quality and Decision Making.- Information Quality Assessment.- Information Quality Policy and Strategy.- Information Quality: Managing Information as a Product.- Information Retrieval.- Information Retrieval Models.- Information Retrieval Operations.- Infrastructure As-A-Service (IaaS).- Initiative for the Evaluation of XML Retrieval.- Initiator.- In-Network Query Processing.- Integrated DB and IR Approaches.- Integration of Rules and Ontologies.- Intelligent Storage Systems.- Interactive Analytics in Social Media.- Interface.- Interface Engines in Healthcare.- Interoperability in Data Warehouses.- Interoperation of NLP-based Systems with Clinical Databases.- Inter-Operator Parallelism.- Inter-Query Parallelism.- Intra-operator Parallelism.- Intra-Query Parallelism.- Intrusion Detection Technology.- Inverse Document Frequency.- Inverted Files.- IP Storage.- Iterator.- Java Database Connectivity.- Java Enterprise Edition.- Java Metadata Facility.- Join.- Join Dependency.- Join Index.- Join Order.- k-Anonymity.- Karp-Luby Sampling.- KDD Pipeline.- Key.- K-Means and K-Medoids.- Knowledge Base.- Knowledge Base Extraction.- 
Language Models.- Languages for Web Data Extraction.- Learning Distance Measures.- Lexical Analysis of Textual Data.- Licensing and Contracting Issues in Databases.- Lifespan.- Lightweight Ontologies.- Linear Hashing.- Linear Regression.- Linked Open Data.- Linking and Brushing.- Load Balancing in Peer-to-Peer Overlay Networks.- Load Shedding.- LOC METS.- Locality.- Locality of Queries.- Location Based Recommendation.- Location Management in Mobile Environments.- Location Update Management.- Location-Based Services.- Locking Granularity and Lock Types.- Logging and Recovery.- Logging/Recovery Subsystem.- Logical and Physical Data Independence.- Logical Database Design: from Conceptual to Logical Schema.- Logical Document Structure.- Logical Foundations of Web Data Extraction.- Logical Models of Information Retrieval.- Logical Unit Number.- Logical Unit Number Mapping.- Logical Volume Manager.- Log-Linear Regression.- Loop.- Loose Coupling.- Machine Learning in Computational Biology.- Main Memory.- Main Memory DBMS.- Maintenance of Materialized Views with Outer-Joins.- Maintenance of Recursive Views.- Managing Compressed Structured Text.- Managing Data Integration Uncertainty.- Managing Probabilistic Entity Extraction.- Mandatory Access Control.- MANET Databases.- MAP.- Map Matching.- MapReduce.- Markup Language.- MashUp.- Massive Array of Idle Disks.- Matrix Masking.- Max-Pattern Mining.- Mean Reciprocal Rank.- Measure.- Mediation.- Membership Query.- Memory Hierarchy.- Memory Locality.- Merkle Trees.- Message Authentication Codes.- Message Queuing Systems.- Meta Data Repository.- Meta Object Facility.- Metadata.- Metadata Interchange Specification.- Metadata Registry, ISO/IEC 11179.- Metamodel.- Metasearch Engines.- Metric Space.- Microaggregation.- Microbenchmark.- Microdata.- Microdata Rounding.- Middleware Support for Database Replication and Caching.- Middleware Support for Precise Failure Semantics.- Mining of Chemical Data.- Mobile Database.- Mobile 
Interfaces.- Mobile resource search.- Mobile Sensor Network Data Management.- Model Management.- Model-based Querying in Sensor Networks.- Monotone Constraints.- Monte Carlo Methods for Uncertain Data.- Moving Object.- Moving Objects Databases and Tracking.- MRR.- Multi-Data Center Consistency Properties.- Multi-Data Center Replication Protocols.- Multidimensional Data Formats.- Multidimensional Modeling.- Multidimensional Scaling.- Multi-Level Modeling.- Multi-Level Recovery and the ARIES Algorithm.- Multilevel Secure Database Management System.- Multilevel Transactions and Object-Model Transactions.- Multimedia Data.- Multimedia Data Buffering.- Multimedia Data Indexing.- Multimedia Data Querying.- Multimedia Data Storage.- Multimedia Databases.- Multimedia Information Retrieval Model.- Multimedia Metadata.- Multimedia Presentation Databases.- Multimedia Resource Scheduling.- Multimedia Retrieval Evaluation.- Multimedia Tagging.- Multimodal Interfaces.- Multi-Pathing.- Multiple Representation Modeling.- Multi-Query Optimization.- Multi-Resolution Terrain Modeling.- Multi-Step Query Processing.- Multitenancy.- Multi-Tier Architecture.- Multi-tier Storage Systems.- Multivalued Dependency.- Multivariate Visualization Methods.- Multi-version Serializability and Concurrency Control.- Naive Tables.- Narrowed Extended XPath I.- Natural Interaction.- Near-duplicate Retrieval.- Nearest Neighbor Classification.- Nearest Neighbor Query.- Nearest Neighbor Query in Spatio-temporal Databases.- Nested Loop Join.- Nested Transaction Models.- Network Attached Secure Device.- Network Attached Storage.- Network Data Model.- Neural Networks.- N-Gram Models.- Noise Addition.- Nonparametric Data Reduction Techniques.- Non-Perturbative Masking Methods.- Non-relational Streams.- Nonsequenced Semantics.- Normal Form ORA-SS Schema Diagrams.- Normal Forms and Normalization.- NoSQL Stores.- Now in Temporal Databases.- Null Values.- OASIS.- Object Constraint Language.- Object Data Models.- 
Object Identity.- Object Recognition.- Object Relationship Attribute Data Model for Semi-structured Data.- Object Storage Protocol.- Object-Role Modeling.- OLAM.- OLAP Personalization and Recommendation.- One-Copy-Serializability.- One-Pass Algorithm.- On-Line Analytical Processing.- Online Recovery in Parallel Database Systems.- Ontologies and Life Science Data Management.- Ontology.- Ontology Elicitation.- Ontology Engineering.- Ontology Visual Querying.- Ontology-Based Data Access and Integration.- Open Database Connectivity.- Open Information Extraction.- Open Nested Transaction Models.- Operator-Level Parallelism.- Opinion Mining.- Optimistic Replication and Resolution.- Optimization and Tuning in Data Warehouses.- OQL.- Orchestration.- Order Dependency.- OR-Join.- OR-Split.- OSQL.- Outlier Detection.- Overlay Network.- OWL: Web Ontology Language.- P/FDM.- Parallel and Distributed Data Warehouses.- Parallel Coordinates.- Parallel Data Placement.- Parallel Database Management.- Parallel Hash Join, Parallel Merge Join, Parallel Nested Loops Join.- Parallel Query Execution Algorithms.- Parallel Query Optimization.- Parallel Query Processing.- Parameterized Complexity of Queries.- Parametric Data Reduction Techniques.- Partial Replication.- Path Query.- Pattern-Growth Methods.- Peer Data Management System.- Peer to Peer Overlay Networks: Structure, Routing and Maintenance.- Peer-To-Peer Content Distribution.- Peer-to-Peer Data Integration.- Peer-to-Peer Publish-Subscribe Systems.- Peer-to-Peer Storage.- Peer-to-Peer System.- Peer-to-Peer Web Search.- Performance Analysis of Transaction Processing Systems.- Performance Monitoring Tools.- Period-Stamped Temporal Models.- Personalized Web Search.- Petri Nets.- Physical Clock.- Physical Database Design for Relational Databases.- Physical Layer Tuning.- Pipeline.- Pipelining.- Platform As-A-Service (PaaS).- Point-in-Time Copy.- Point-Stamped Temporal Models.- 
Polytransactions.- Positive Relational Algebra.- Possible Answers.- PRAM.- Precision.- Precision and Recall.- Precision at n.- Precision-Oriented Effectiveness Measures.- Predictive Analytics.- Preference Queries.- Preference Specification.- Prescriptive Analytics.- Presenting Structured Text Retrieval Results.- Primary Index.- Principal Component Analysis.- Privacy.- Privacy Metrics.- Privacy Policies and Preferences.- Privacy through Accountability.- Privacy-Enhancing Technologies.- Privacy-Preserving Data Mining.- Privacy-Preserving DBMSs.- Private Information Retrieval.- Probabilistic Databases.- Probabilistic Entity Resolution.- Probabilistic Retrieval Models and Binary Independence Retrieval (BIR) Model.- Probabilistic Skylines.- Probabilistic Spatial Queries.- Probabilistic Temporal Databases.- Probability Ranking Principle.- Probability Smoothing.- Process Life Cycle.- Process Mining.- Process Modeling.- Process Optimization.- Process Structure of a DBMS.- Processing Overlaps in Structured Text Retrieval.- Processing Structural Constraints.- Processor Cache.- Profiles and Context for Structured Text Retrieval.- Projection.- Propagation-based Structured Text Retrieval.- Protection from Insider Threats.- Provenance.- Provenance and Reproducibility.- Provenance in Databases.- Provenance in Scientific Databases.- Provenance in Workflows.- Provenance Management.- Provenance Standards.- Provenance Storage.- Provenance: Privacy and Security.- Pseudonymity.- Publish/Subscribe.- Publish/Subscribe over Streams.- Punctuations.- Q-measure.- Quadtrees (and Family).- Qualitative Temporal Reasoning.- Quality and Trust of Information Content and Credentialing.- Quality of Data Warehouses.- Quantiles on Streams.- Quantitative Association Rules.- QUEL.- Query by Humming.- Query Containment.- Query Evaluation Techniques for Multidimensional Data.- Query Expansion for Information Retrieval.- Query Expansion Models.- Query Language.- Query Languages and Evaluation Techniques 
for Biological Sequence Data.- Query Languages for the Life Sciences.- Query Load Balancing in Parallel Database Systems.- Query Optimization.- Query Optimization (in Relational Databases).- Query Optimization in Sensor Networks.- Query Plan.- Query Point Movement Techniques for Content-Based Image Retrieval.- Query Processing.- Query Processing (in Relational Databases).- Query Processing and Optimization in Object Relational Databases.- Query Processing in data integration systems.- Query Processing in Data Warehouses.- Query Processing in Deductive Databases.- Query Processing over Uncertain Data.- Query Processor.- Query Rewriting.- Query Rewriting Using Views.- Query Translation.- Quorum Systems.- Randomization Methods to Ensure Data Privacy.- Range Query.- Rank-aware Query Processing.- Ranked XML Processing.- Ranking Functions.- Ranking Views.- Rank-Join.- Rank-Join Indices.- Raster Data Management and Multi-Dimensional Arrays.- RDF Stores.- RDF Technology.- Real and Synthetic Test Datasets.- Real-Time Transaction Processing.- Recall.- Receiver Operating Characteristic.- Recommender Systems.- Record Linkage.- Record Matching.- Redundant Arrays of Independent Disks.- Reference Knowledge.- Region Algebra.- Regulatory Compliance in Data Management.- Relational Algebra.- Relational Calculus.- Relational Model.- Relationships in Structured Text Retrieval.- Relative Time.- Relevance.- Relevance Feedback.- Relevance Feedback for Content-Based Information Retrieval.- Relevance Feedback for Text Retrieval.- Replica Control.- Replica Freshness.- Replicated Data Types.- Replicated Database Concurrency Control.- Replication.- Replication Based on Group Communication.- Replication for Availability and Fault-Tolerance.- Replication for High Availability.- Replication for Paxos.- Replication for Scalability.- Replication in Multi-Tier Architectures.- Replication with Snapshot Isolation.- Reputation and Trust.- Request Broker.- Residuated Lattice.- Resource Allocation 
Problems in Spatial Databases.- Resource Description Framework.- Resource Description Framework (RDF) Schema (RDFS).- Resource Identifier.- Result Display.- Retrospective Event Processing.- Reverse Nearest Neighbor Query.- Reverse Top-k Queries.- Rewriting Queries using Views.- RMI.- Road Networks.- Rocchio's Formula.- Role Based Access Control.- R-Precision.- R-Tree (and Family).- Rule-based Classification.- Safety and Domain Independence.- Sagas.- Sampling Techniques for Statistical Databases.- SAN File System.- Scalable Decision Tree Construction.- Scheduler.- Scheduling Strategies for Data Stream Processing.- Schema Evolution.- Schema Mapping.- Schema Mapping Composition.- Schema Matching.- Schema Tuning.- Schema Versioning.- Scheme/Ontology Extraction.- Scientific Databases.- Scientific Visualization.- Scientific Workflows.- Score Aggregation.- Screen Scraper.- SCSI Target.- SDC Score.- Search Engine Metrics.- Searching Digital Libraries.- Second Normal Form (2NF).- Secondary Index.- Secure Data Outsourcing.- Secure Database Development.- Secure Multiparty Computation Methods.- Secure Transaction Processing.- Security Services.- Segmentation and Stratification.- Selection.- Selectivity Estimation.- Self-Maintenance of Views.- Self-Management Technology in Databases.- Semantic Atomicity.- Semantic Crowd Sourcing.- Semantic Data Integration for Life Science Entities.- Semantic Data Model.- Semantic Matching.- Semantic Modeling and Knowledge Representation for Multimedia Data.- Semantic Modeling for Geographic Information Systems.- Semantic Overlay Networks.- Semantic Social Web.- Semantic Streams.- Semantic Web.- Semantic Web Query Languages.- Semantic Web Services.- Semantics-based Concurrency Control.- Semijoin.- Semijoin Program.- Semi-Structured Data.- Semi-Structured Data Model.- Semi-Structured Database Design.- Semi-Structured Query Languages.- Semi-Supervised Learning.- Sensor Networks.- Sequenced Semantics.- 
Sequential Patterns.- Serializability.- Serializable Snapshot Isolation.- Service Component Architecture (SCA).- Service Oriented Architecture.- Session.- Shared-Disk Architecture.- Shared-Memory Architecture.- Shared-Nothing Architecture.- Side-Effect-Free View Updates.- Signature Files.- Similarity and Ranking Operations.- Simplicial Complex.- Singular Value Decomposition.- Skyline Queries and Pareto Optimality.- Snapshot Equivalence.- Snapshot Isolation.- Snippet.- Snowflake Schema.- SOAP.- Social Applications.- Social influence.- Social Media Analysis.- Social Media Analytics.- Social Media Harvesting.- Social network analysis.- Social Networks.- Software As-A-Service (SaaS).- Software Transactional Memory.- Software-Defined Storage.- Solid State Drive (SSD).- Sort-Merge Join.- Space-Filling Curves.- Space-Filling Curves for Query Processing.- SPARQL.- Sparse Index.- Spatial and Spatio-Temporal Data Models and Languages.- Spatial and Temporal Data Warehouses .- Spatial Anonymity.- Spatial Data Analysis.- Spatial Data Mining.- Spatial Data Types.- Spatial Datawarehousing.- Spatial Indexing Techniques.- Spatial Join.- Spatial Keyword Search.- Spatial Matching Problems.- Spatial Network Databases.- Spatial Operations and Map Operations.- Spatial Queries in the Cloud.- Spatio-Temporal Data Mining.- Spatio-Temporal Data Types.- Spatio-Temporal Data Warehouses.- Spatiotemporal Interpolation Algorithms.- Spatio-Temporal Selectivity Estimation.- Spatio-Temporal Trajectories.- Specialization and Generalization.- Specificity.- Spectral Clustering.- Split.- Split Transactions.- SQL.- SQL Analytics on Big Data.- SQL Isolation Levels.- SQL-Based Temporal Query Languages.- Stable Distribution.- Stack-based Query Language.- Staged DBMS.- Standard Effectiveness Measures.- Star Index.- Star Schema.- State-based Publish/Subscribe.- Statistical Data Management.- Statistical Disclosure Limitation For Data Access.- Steganography.- Stemming.- Stop-&-go Operator.- Stoplists.- Storage 
Access Models.- Storage Area Network.- Storage Consolidation.- Storage Devices.- Storage Grid.- Storage Management.- Storage Management Initiative-Specification.- Storage Manager.- Storage Network Architectures.- Storage Networking Industry Association.- Storage of Large Scale Multidimensional Data.- Storage Power Management.- Storage Protection.- Storage Protocols.- Storage Resource Management.- Storage Security.- Storage Virtualization.- Stored Procedure.- Stream Mining.- Stream Models.- Stream Processing.- Stream processing on modern hardware.- Stream Reasoning.- Stream Sampling.- Stream Similarity Mining.- Streaming Analytics.- Streaming Applications.- Stream-Oriented Query Languages and Operators.- Strong Consistency Models for Replicated Data.- Structural Indexing.- Structure Analytics in Social Media.- Structure Weight.- Structured Data in Peer-to-Peer Systems.- Structured Document Retrieval.- Structured Text Retrieval Models.- Subject Spaces.- Subspace Clustering Techniques.- Success at n.- Succinct Constraints.- Suffix Tree.- Summarizability.- Summarization.- Support Vector Machine.- Supporting Transaction Time Databases.- Symbolic Representation.- Symmetric Encryption.- Synopsis Structure.- Synthetic Microdata.- System R (R*) Optimizer.- Table.- Tabular Data.- Taxonomy: Biomedical Health Informatics.- tBench.- Telic Distinction in Temporal Databases.- Telos.- Temporal Access Control.- Temporal Aggregation.- Temporal Algebras.- Temporal Analytics in Social Media.- Temporal Benchmarks.- Temporal Coalescing.- Temporal Compatibility.- Temporal Conceptual Models.- Temporal Constraints.- Temporal Data Mining.- Temporal Data Models.- Temporal Database.- Temporal Datawarehousing.- Temporal Dependencies.- Temporal Element.- Temporal Expression.- Temporal Generalization.- Temporal Granularity.- Temporal Homogeneity.- Temporal Indeterminacy.- Temporal Integrity Constraints.- Temporal Joins.- Temporal Logic in Database Query Languages.- Temporal Logical Models.- 
Temporal Object-Oriented Databases.- Temporal Periodicity.- Temporal Projection.- Temporal PSM.- Temporal Query Languages.- Temporal Query Processing.- Temporal Relational Calculus.- Temporal Specialization.- Temporal Strata.- Temporal Support in the SQL Standard.- Temporal Vacuuming.- Temporal Visual Languages.- Temporal XML.- Term Proximity.- Term Statistics for Structured Text Retrieval.- Term Weighting.- Test Collection.- Text Analytics.- Text Analytics in Social Media.- Text Categorization.- Text Clustering.- Text Compression.- Text Generation.- Text Index Compression.- Text Indexing and Retrieval.- Text Indexing Techniques.- Text Mining.- Text Mining of Biological Resources.- Text Representation.- Text Segmentation.- Text Semantic Representation.- Text Stream Processing.- Text Streaming Model.- Text Summarization.- Text Visualization.- TF*IDF.- Thematic Map.- Third Normal Form.- Three-Dimensional GIS and Geological Applications.- Three-Phase Commit.- Tight Coupling.- Time Aggregated Graphs.- Time and Information Retrieval.- Time Domain.- Time in Philosophical Logic.- Time Instant.- Time Interval.- Time Period.- Time Series Query.- Time Span.- Time-Line Clock.- Timeslice Operator.- Topic Detection and Tracking.- Topic Maps.- Topic-based Publish/Subscribe.- Top-k Queries.- Top-K Selection Queries on Multimedia Datasets.- Topological Data Models.- Topological Relationships.- Trajectory.- Transaction.- Transaction Chopping.- Transaction Management.- Transaction Manager.- Transaction Models - the Read/Write Approach.- Transaction Time.- Transactional Middleware.- Transactional Processes.- Transactional Stream Processing.- Transaction-Time Indexing.- Tree-based Indexing.- Treemaps.- Triangular Norms.- Triangulated Irregular Network.- Trie.- Trip Planning Queries.- Trust and Reputation in Peer-to-Peer Systems.- Trust in Blogosphere.- Trusted Hardware.- TSQL2.- Tuning Concurrency Control.- Tuple-Generating Dependencies.- Two-Dimensional Shape Retrieval.- Two-Phase 
Commit.- Two-Phase Commit Protocol.- Two-Phase Locking.- Two-Poisson model.- Type-based Publish/Subscribe.- U-measure.- Uncertain Data Lineage.- Uncertain Data Mining.- Uncertain Data Models.- Uncertain Data Streams.- Uncertain Data Summarization.- Uncertain Graph Data Management.- Uncertain Spatial Data Management.- Uncertain Top-k Queries.- Uncertainty in Events.- Uncertainty Management in Scientific Database Systems.- Unicode.- Unified Modeling Language.- Union.- Unobservability.- Updates and Transactions in Peer-to-Peer Systems.- Updates through Views.- Usability.- User-Defined Time.- Valid Time.- Valid-Time Indexing.- Value Equivalence.- Variable Time Span.- Vector-Space Model.- Vertically Partitioned Data.- Video.- Video Content Analysis.- Video Content Modeling.- Video Content Structure.- Video Metadata.- Video Querying.- Video Representation.- Video Scene and Event Detection.- Video Segmentation.- Video Sequence Indexing.- Video Shot Detection.- Video Summarization.- View Adaptation.- View Definition.- View Maintenance.- View Maintenance Aspects.- View-based Data Integration.- Views.- Virtual Partitioning.- Visual Analytics.- Visual Association Rules.- Visual Classification.- Visual Clustering.- Visual Content Analysis.- Visual Data Mining.- Visual Formalisms.- Visual Interaction.- Visual Interfaces.- Visual Interfaces for Geographic Data.- Visual interfaces for streaming data.- Visual Metaphor.- Visual On-Line Analytical Processing (OLAP).- Visual Perception.- Visual Query Language.- Visual Representation.- Visualization for Information Retrieval.- Visualization Pipeline.- Visualizing Categorical Data.- Visualizing Clustering Results.- Visualizing Hierarchical Data.- Visualizing Network Data.- Visualizing Quantitative Data.- Volume.- Voronoi Diagrams.- W3C.- WAN Data Replication.- Wavelets on Streams.- Weak Consistency Models for Replicated Data.- Weak Equivalence.- Web 2.0/3.0.- Web Advertising.- Web Characteristics and Evolution.- Web Crawler 
Architecture.- Web Data Extraction System.- Web ETL.- Web Harvesting.- Web Information Extraction.- WEB Information Retrieval Models.- Web Mashups.- Web Page Quality Metrics.- Web Question Answering.- Web Search Query Rewriting.- Web Search Relevance Feedback.- Web Search Relevance Ranking.- Web Search Result Caching and Prefetching.- Web Search Result De-duplication and Clustering.- Web Services.- Web Services and the Semantic Web for Life Science Data.- Web Spam Detection.- Web Transactions.- Web Views.- What-If Analysis.- WIMP Interfaces.- Window operator in RDBMS.- Window-based Query Processing.- Windows.- Workflow Constructs.- Workflow Evolution.- Workflow Join.- Workflow Management.- Workflow Management and Workflow Management System.- Workflow Management Coalition.- Workflow Model.- Workflow Model Analysis.- Workflow Patterns.- Workflow Schema.- Workflow Transactions.- Wrapper Induction.- Wrapper Maintenance.- Wrapper Stability.- Write Once Read Many.- XML.- XML Access Control.- XML Attribute.- XML Benchmarks.- XML Compression.- XML Document.- XML Element.- XML Indexing.- XML Information Integration.- XML Integrity Constraints.- XML Metadata Interchange.- XML Metadata Interchange Specification (XMI).- XML Parsing, SAX/DOM.- XML Process Definition Language.- XML Programming.- XML Publish/Subscribe.- XML Publishing.- XML Retrieval.- XML Schema.- XML Selectivity Estimation.- XML Storage.- XML Stream Processing.- XML Tree Pattern, XML Twig Query.- XML Tuple Algebra.- XML Typechecking.- XML Types.- XML Updates.- XML Views.- XPath/XQuery.- XQuery Full-Text.- XQuery Processors.- XSL/XSLT.- Zero-One Laws.- Zooming Techniques.- α-nDCG.-
£4,422.28
APress Mastering Snowflake Solutions
Book Synopsis: Design for large-scale, high-performance queries using Snowflake's query processing engine to empower data consumers with timely, comprehensive, and secure access to data. This book also helps you protect your most valuable data assets using built-in security features such as end-to-end encryption for data at rest and in transit. It demonstrates key features in Snowflake and shows how to exploit those features to deliver a personalized experience to your customers. It also shows how to ingest the high volumes of both structured and unstructured data that are needed for game-changing business intelligence analysis. Mastering Snowflake Solutions starts with a refresher on Snowflake's unique architecture before getting into the advanced concepts that make Snowflake the market-leading product it is today. Progressing through each chapter, you will learn how to leverage storage, query processing, cloning, data sharing, and continuous data protection features. This approach allows for greater
Table of Contents: 1. Snowflake Architecture 2. Data Movement 3. Cloning 4. Managing Security and User Access Control 5. Protecting Data in Snowflake 6. Business Continuity and Disaster Recovery 7. Data Sharing and the Data Cloud 8. Programming 9. Advanced Performance Tuning 10. Developing Applications in Snowflake
£46.74
APress Building the Snowflake Data Cloud
Book SynopsisImplement the Snowflake Data Cloud using best practices and reap the benefits of scalability and low cost from the industry-leading, cloud-based, data warehousing platform. This book provides a detailed how-to explanation, and assumes familiarity with Snowflake core concepts and principles. It is a project-oriented book with a hands-on approach to designing, developing, and implementing your Data Cloud with security at the center. As you work through the examples, you will develop the skill, knowledge, and expertise to expand your capability by incorporating additional Snowflake features, tools, and techniques. Your Snowflake Data Cloud will be fit for purpose, extensible, and at the forefront of Direct Share, Data Exchange, and Snowflake Marketplace. Building the Snowflake Data Cloud helps you transform your organization to monetize the value locked up within your data. As the digital economy takes hold, with data volume, velociTable of ContentsPart I. Context 1. The Snowflake Data Cloud 2. Breaking Data Siloes Part II. Concepts 3. Architecture 4. Account Security 5. Role Based Access Control (RBAC) 6. Account Usage Store Part III. Tools 7. Ingesting Data 8. Data Pipelines 9. Data Presentation 10. Semi Structured and Unstructured Data Part IV. Management 11. Query Optimizer Basics 12. Data Management 13. Data Modelling 14. Snowflake Data Cloud By Example
£46.74
APress Data Science and Analytics for SMEs
Book SynopsisMaster the tricks and techniques of business analytics consulting, specifically applicable to small-to-medium businesses (SMEs). Written to help you hone your business analytics skills, this book applies data science techniques to help solve problems and improve upon many aspects of a business's operations. SMEs are looking for ways to use data science and analytics, and this need is becoming increasingly pressing with the ongoing digital revolution. The topics covered in the book will help to provide the knowledge leverage needed for implementing data science in small business. The demand of small businesses for data analytics goes hand in hand with the growing number of freelance data science consulting opportunities; hence this book will provide insight on how to navigate this new terrain. This book uses a do-it-yourself approach to analytics and introduces tools that are easily available online and are non-programming based. Data science Trade Review“By reading the book and working out the use case, subject matter experts will be able to get a coherent roadmap to the main techniques available for both descriptive and predictive data analytics, as well as be able to provide simple services related to their company data and future prospects.” (Rosario Uceda-Sosa, Computing Reviews, October 2, 2023)Table of Contents INTRODUCTION We introduce data science generally and narrow it down to data science for business, which is also referred to as business analytics. We then give a detailed explanation of the process involved in business analytics in the form of the business analytics journey. In this journey, we explain what it takes from start to finish to carry out an analytics project in the business world, focusing on small business consulting, even though the process is generic to all types of business, small or large. 
We also give a description of what small business refers to in this book and the peculiarities of navigating an analytics project in such a terrain. To conclude the chapter, we talk about the types of analytics problems that are common to small business and the tools available to solve these problems given the budget situation of small businesses when it comes to analytics projects.· DATA SCIENCE· DATA SCIENCE FOR BUSINESS· BUSINESS ANALYTICS JOURNEY· SMALL AND MEDIUM BUSINESS (SME)· BUSINESS ANALYTICS IN SMALL BUSINESS· TYPES OF ANALYTICS PROBLEMS IN SME· ANALYTICS TOOLS FOR SMES· ROAD MAPS TO THIS BOOK· PROBLEMS· REFERENCES CHAPTER 1: DATA FOR ANALYSIS IN SMALL BUSINESS In this chapter, we will look at the various sources of data generally and in small business. This chapter is important because the major challenge of consulting for small business is the lack of data or quality data for analysis. This chapter will therefore detail the sources of data for analysis, explaining first the types or forms in which data exists and some general ideas of how to collect such data. It gives an overview of data quality and integrity issues and touches on data literacy. The chapter also includes the typical data preparation procedures for the common types of techniques used in small business analytics and by extension used in this book. To conclude the chapter, we look at data visualization, particularly towards preparing data for various analytics tasks as explained in section 1.3.· SOURCE OF DATA· DATA QUALITY & INTEGRITY· DATA GOVERNANCE· DATA PREPARATION· DATA VISUALIZATION· PROBLEMS· REFERENCES CHAPTER 2: BUSINESS ANALYTICS CONSULTING In this chapter, we will look at business analytics consulting, particularly what the concept implies and how to build such a career path. We will explain the types of business analytics consulting that exist and then narrow it down to how to navigate the world of business analytics consulting for small business. 
In this chapter, we will look at how to manage a typical analytics project and measure the success of analytics projects. In conclusion, we will discuss issues revolving around how to bill analytics projects, particularly as a consultant.· BUSINESS ANALYTICS CONSULTING· MANAGING ANALYTICS PROJECT· SUCCESS METRICS IN ANALYTICS PROJECT· BILLING ANALYTICS PROJECT· PROBLEMS· REFERENCES CHAPTER 3: BUSINESS ANALYTICS CONSULTING PHASES In this chapter we will look at the stages involved in business analytics consulting, particularly when the analytics service is offered as a product from either within or outside the business. We will look at the proposal and initial analysis stage, which gives direction to the analytics project. Then we look at the details involved in the pre-engagement, engagement and post-engagement phases. It is important to know that the stages are presented in a typical or generic way, but when implemented, there might be reason to modify or customize them for the application scenario.· PROPOSAL & INITIAL ANALYSIS· PRE-ENGAGEMENT PHASE· ENGAGEMENT PHASE· POST ENGAGEMENT PHASE· PROBLEMS· REFERENCES CHAPTER 4: DESCRIPTIVE ANALYTICS TOOLS This chapter is focused on the most common descriptive analytics tools used in business generally and specifically in small businesses. The chapter will help you to use descriptive analytics tools to understand your business and make recommendations that can improve your business profits. For small business, descriptive analytics helps SMEs to make sense of available data in order to monitor business indicators at a glance, helps SME owners to observe sales trends and patterns on an overall basis, as well as deep-dive into product categories and customer groups. It also helps SMEs to plan product strategy and pricing policies that will maximize their projected revenues and derive a lot of valuable insights for getting more customers. 
· INTRODUCTION· BAR CHART· HISTOGRAM· LINE GRAPHS· SCATTER PLOTS· PACKED BUBBLES CHARTS· HEAT MAPS· GEOGRAPHICAL MAPS· A PRACTICAL BUSINESS PROBLEM I· PROBLEMS· REFERENCES CHAPTER 5: PREDICTION TECHNIQUES In this chapter, we will explore the popular techniques used for prediction, particularly in retail business. The approach used in explaining these techniques is to use them in solving a business problem. The second business problem to be addressed is the sales prediction problem, which is common in retail business. The chapter first explains the fundamental concept of prediction techniques; next we look at how such techniques are evaluated. After this, we describe the business problem we intend to solve. We then pick each of the selected techniques one by one and explain the algorithms involved and how they can be used to solve the problem described. The prediction techniques used and compared are multiple linear regression, regression trees and the neural network. To conclude the chapter, we compare the results of the three algorithms and conclude on the problem in question. In this chapter, therefore, the analytics product being offered is to solve the sales prediction problem for small retail business.· INTRODUCTION· PRACTICAL BUSINESS PROBLEM II (SALES PREDICTION)· MULTIPLE LINEAR REGRESSION· REGRESSION TREES· NEURAL NETWORK (PREDICTION)· CONCLUSION ON SALES PREDICTION· PROBLEMS· REFERENCES CHAPTER 6: CLASSIFICATION TECHNIQUES In this chapter, even though there are several classification techniques, we will explore the popular ones used for classification in the business domain. In doing this, we will use the third business problem, centered on customer loyalty, comparing neural network, classification tree and random forest algorithms. In solving this problem, we are particular about how to get and retain more customers for our small business. 
We will also introduce some other classification-based techniques such as K-nearest neighbour, logistic regression and persuasion modelling. We will use persuasion modelling for the fourth practical business problem. In using these techniques to solve the problem, we explain the fundamental concepts in the chosen algorithms and use them to demonstrate how this problem-solving process can be adopted in real business scenarios.· CLASSIFICATION MODELS & EVALUATION· PRACTICAL BUSINESS PROBLEM III (CUSTOMER LOYALTY)· NEURAL NETWORK· CLASSIFICATION TREE· RANDOM FOREST & BOOSTED TREES· K NEAREST NEIGHBOUR· LOGISTIC REGRESSION· PROBLEMS· REFERENCES CHAPTER 7: ADVANCED DESCRIPTIVE ANALYTICS This chapter is focused mainly on advanced descriptive analytics techniques. In this chapter, we will first explain the concept of clustering, which is a type of unsupervised learning approach. We will then pick one clustering technique, which is K means clustering. Using the fourth practical business problem, we will explain how we can use the K means clustering technique to solve a real business problem. Next we will explain the association rule example and finally network analysis. We conclude with the fifth business problem, which is focused on using network analytics for employee efficiency.· CLUSTERING· K MEANS· PRACTICAL BUSINESS PROBLEM IV (Customer Segmentation)· ASSOCIATION ANALYSIS· NETWORK ANALYSIS· PRACTICAL BUSINESS PROBLEM V (Staff Efficiency)· PROBLEMS· REFERENCES CHAPTER 8: CASE STUDY PART I This chapter is the beginning part of the major consulting case study for this book. We will explain what transpired during a typical business analytics consulting engagement and help to create a road map or an example of how to navigate a business analytics consulting project. We start with a description of the SME ecommerce environment generally, since this is the business environment of our selected case study; we then talk about the sources of data for analytics peculiar to this environment. 
Next we briefly describe the business to be used as the case study, followed by the analytics road map peculiar to consulting for this business. This chapter ends with the results of the initial analysis and pre-engagement phase, which form the basis for the detailed analytics and implementation phase in chapter 10.· SME ECOMMERCE· INTRODUCTION TO SME CASE STUDY· INITIAL ANALYSIS· ANALYTICS APPROACH· PRE-ENGAGEMENT· PROBLEMS· REFERENCES CHAPTER 9: CASE STUDY PART II In this chapter, we will conclude the case study used for illustration of a typical business analytics consulting project for an SME by presenting the details of the engagement phase for the case study in question. The post-engagement phase is left out, as the implementation of the recommendations is determined by the systems and procedures of the business. It is important to note that the consulting steps can be customized for any small business based on the intended problem. All the steps described in chapters 9 and 10 have been made simple for understanding, though in real-life business applications there might be a need to iterate the process until satisfactory results are obtained. This is because you constantly need to incorporate feedback from the stakeholders and domain experts.· GOAL 1: INCREASE WEBSITE TRAFFIC· GOAL 2: INCREASE WEBSITE SALES REVENUE· PROBLEMS· REFERENCES
£31.34
APress Google Cloud Platform for Data Science
Book SynopsisThis book is your practical and comprehensive guide to learning Google Cloud Platform (GCP) for data science, using only the free tier services offered by the platform. Data science and machine learning are increasingly becoming critical to businesses of all sizes, and the cloud provides a powerful platform for these applications. GCP offers a range of data science services that can be used to store, process, and analyze large datasets, and train and deploy machine learning models. The book is organized into seven chapters covering various topics such as GCP account setup, Google Colaboratory, Big Data and Machine Learning, Data Visualization and Business Intelligence, Data Processing and Transformation, Data Analytics and Storage, and Advanced Topics. Each chapter provides step-by-step instructions and examples illustrating how to use GCP services for data science and big data projects. Readers will learn how to set up a Google Colaboratory account and run Jupyter notebooks, access GCP services and data from Colaboratory, use BigQuery for data analytics, and deploy machine learning models using Vertex AI. The book also covers how to visualize data using Looker Data Studio, run data processing pipelines using Google Cloud Dataflow and Dataprep, and store data using Google Cloud Storage and SQL. 
What You Will Learn· Set up a GCP account and project· Explore BigQuery and its use cases, including machine learning· Understand Google Cloud AI Platform and its capabilities· Use Vertex AI for training and deploying machine learning models· Explore Google Cloud Dataproc and its use cases for big data processing· Create and share data visualizations and reports with Looker Data Studio· Explore Google Cloud Dataflow and its use cases for batch and stream data processing· Run data processing pipelines on Cloud Dataflow· Explore Google Cloud Storage and its use cases for data storage· Get an introduction to Google Cloud SQL and its use cases for relational databases· Get an introduction to Google Cloud Pub/Sub and its use cases for real-time data streaming Who This Book Is ForData scientists, machine learning engineers, and analysts who want to learn how to use Google Cloud Platform (GCP) for their data science and big data projects Table of ContentsChapter 1: Introduction to GCP.- Chapter 2: Google Colaboratory.- Chapter 3: Big Data and Machine Learning.- Chapter 4: Data Visualization and Business Intelligence.- Chapter 5: Data Processing and Transformation.- Chapter 6: Data Analytics and Storage.- Chapter 7: Advanced Topics.
£38.24
O'Reilly Media Big Data for Chimps
Book SynopsisFinding patterns in massive event streams can be difficult, but learning how to find them doesn't have to be. This unique hands-on guide shows you how to solve this and many other problems in large-scale data processing with simple, fun, and elegant tools that leverage Apache Hadoop.
£25.59
O'Reilly Media Learning to Love Data Science
Book SynopsisToday, big data is taken seriously, and data science is considered downright sexy. With this anthology of reports from award-winning journalist Mike Barlow, you'll appreciate how data science is fundamentally altering our world, for better and for worse.
£16.99
O'Reilly Media Agile Data Science 2.0
Book SynopsisWith the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to build data applications with Python, Apache Spark, Kafka, and other tools.
£35.99
O'Reilly Media Getting Started with Kudu
Book SynopsisWith this practical guide, you'll learn how Kudu's architecture and features solve a unique problem in the Hadoop ecosystem. If you're familiar with other storage layer projects such as HDFS, HBase, Spanner, and Cassandra, you'll quickly learn and appreciate the unique contribution Kudu makes to this ecosystem.
£29.99
O'Reilly Media The Practitioner's Guide to Graph Data
Book SynopsisGraph data closes the gap between the way humans and computers view the world. While computers rely on static rows and columns of data, people navigate and reason about life through relationships. This practical guide demonstrates how graph data brings these two approaches together.
£47.99
O'Reilly Media Mastering Spark with R
Book SynopsisWith this practical book, data scientists and professionals working with large-scale data applications will learn how to use Spark from R to tackle big data and big compute problems.
£33.74
O'Reilly Media Applied Natural Language Processing in the
Book SynopsisThis hands-on guide helps you get up to speed on the latest and most promising trends in NLP. With some Python experience and a basic understanding of machine learning, you'll learn how to build and deploy real-world NLP applications in your organization.
£51.74
Information Age Publishing A Primer on Business Analytics: Perspectives from
Book SynopsisThis book will provide a comprehensive overview of business analytics, for those who have either a technical background (quantitative methods) or a practitioner business background. Business analytics, in the context of the 4th Industrial Revolution, is the "new normal" for businesses that operate in this digital age. This book provides a comprehensive primer and overview of the field (and related fields such as Business Intelligence and Data Science). It will discuss the field as it applies to financial institutions, with some minor departures to other industries. Readers will gain understanding and insight into the field of data science, including traditional as well as emerging techniques. Further, many chapters are dedicated to the establishment of a data-driven team – from executive buy-in and corporate governance to managing and quantifying the return of data-driven projects.
£44.96
Information Age Publishing A Primer on Business Analytics: Perspectives from
Book SynopsisThis book will provide a comprehensive overview of business analytics, for those who have either a technical background (quantitative methods) or a practitioner business background. Business analytics, in the context of the 4th Industrial Revolution, is the "new normal" for businesses that operate in this digital age. This book provides a comprehensive primer and overview of the field (and related fields such as Business Intelligence and Data Science). It will discuss the field as it applies to financial institutions, with some minor departures to other industries. Readers will gain understanding and insight into the field of data science, including traditional as well as emerging techniques. Further, many chapters are dedicated to the establishment of a data-driven team – from executive buy-in and corporate governance to managing and quantifying the return of data-driven projects.
£82.80
Information Age Publishing Contemporary Perspectives in Data Mining
Book SynopsisThe series, Contemporary Perspectives on Data Mining, is composed of blind refereed scholarly research methods and applications of data mining. This series will be targeted at both the academic community and the business practitioner. Data mining seeks to discover knowledge from vast amounts of data with the use of statistical and mathematical techniques. The knowledge is extracted from this data by examining the patterns of the data, whether they be associations of groups or things, predictions, sequential relationships between time-ordered events or natural groups. Data mining applications are in marketing (customer loyalty, identifying profitable customers, in-store promotions, e-commerce populations); in business (teaching data mining, efficiency of the Chinese automobile industry, moderate asset allocation funds); and techniques (veterinary predictive models, data integrity in the cloud, irregular pattern detection in a mobility network and road safety modeling).
£44.96
Information Age Publishing Contemporary Perspectives in Data Mining
Book SynopsisThe series, Contemporary Perspectives on Data Mining, is composed of blind refereed scholarly research methods and applications of data mining. This series will be targeted at both the academic community and the business practitioner. Data mining seeks to discover knowledge from vast amounts of data with the use of statistical and mathematical techniques. The knowledge is extracted from this data by examining the patterns of the data, whether they be associations of groups or things, predictions, sequential relationships between time-ordered events or natural groups. Data mining applications are in marketing (customer loyalty, identifying profitable customers, in-store promotions, e-commerce populations); in business (teaching data mining, efficiency of the Chinese automobile industry, moderate asset allocation funds); and techniques (veterinary predictive models, data integrity in the cloud, irregular pattern detection in a mobility network and road safety modeling).
£82.80
John Wiley & Sons Inc New Challenges for Knowledge: Digital Dynamics to
Book SynopsisDigital technologies are reshaping every field of social and economic life, including the world of scientific knowledge. “The New Challenges of Knowledge” aims at understanding how the new digital technologies alter the production, diffusion and valorization of knowledge. We propose to give an insight into the economic, geopolitical and political stakes of digital technology in knowledge in different countries. Law is at the center of this evolution, especially in the case of national and international confusion about Internet, Science and knowledge.Trade Review“Sharing economy models are rippling through the world of scientific knowledge and research; open access brings challenges for developers, researchers, and policy makers – all treated here in the context of law-making” The Magpi, issue 60, Aug 2017Table of ContentsIntroduction Part 1. Production: Global Knowledge and Science in the Digital Era Chapter 1. Current Knowledge Dynamics 1.1. Transparency of scientific data 1.2. Transparency of experimental protocol 1.3. A necessary form of research engineering 1.4. Confusion between data and scientific results: avoiding manipulation of research results Chapter 2. Digital Conditions for Knowledge Production 2.1. An economic system oriented toward innovation 2.2. What of knowledge and indeed the concept of the commons? 2.3. From analog to digital 2.4. User–producer: civil society enters the knowledge production system 2.5. The interactions between the various spheres of knowledge production 2.6. Collaboration between society and knowledge: producing authorities should be put into perspective Chapter 3. The Dual Relationship between the User and the Developer 3.1. Legal arrangements for knowledge-sharing using development platforms 3.2. The user contributes to the creation and development of content process Chapter 4. Researchers’ Uses and Needs for Scientific and Technical Information 4.1.
The CNRS survey 4.2. Diverse uses and dual needs 4.3. An explanation through differentiated scientific analysis Chapter 5. New Tools for Knowledge Capture 5.1. The growth of metadata exploitation 5.2. Are we moving toward a semantic Web? 5.3. Tools and limits for metadata processing 5.4. The challenges of the semantic Web Chapter 6. Modes of Knowledge Sharing and Technologies 6.1. Data storage technologies and access allowing knowledge sharing 6.2. Exchange platforms and catalogs 6.3. Knowledge-processing and digital editions Part 2. Sharing Mechanisms: Knowledge Sharing and the Knowledge-based Economy Chapter 7. Business Model for Scientific Publication 7.1. The current economic model is changing so as to adapt to new conditions for knowledge sharing 7.2. Creation of a new model 7.3. The issues raised by the creation of a new economic model 7.4. A new economic model struggling to find its niche Chapter 8. Actor Strategy: International Scientific Publishing, Services with High Added Value and Research Communities 8.1. Publishing, editing and existing: live issues within the publication of Scientific and Technical Information (STI) 8.2. Who is subject to it? The other players in scientific publishing 8.3. The characteristics of SMS (Science of Man and Society) 8.4. Existing without publishing? New STI directions 8.5. Alternatives to scientific publishing Chapter 9. New Approaches to Scientific Production 9.1. New means of access to scientific production: innovative models 9.2. Two main objectives: accelerating knowledge sharing and promoting scientific collaboration 9.3. The need for new analytical tools and the risk of reprivatization of scientific knowledge 9.4. The absence of the usage doctrine and the risk of reprivatization of science: the case of social networks Chapter 10. The Geopolitics of Science 10.1. National convergent research models 10.2.
Science is a source of international cooperation 10.3. International scientific cooperation is accelerating Chapter 11. Copyright Serving the Market Part 3. Enhancement: Knowledge Rights and Public Policies in the Wake of Digital Technology Chapter 12. Legal Protection of Scientific Research Results in the Humanities and Social Sciences 12.1. Different legal protections for different kinds of science 12.2. Why protect? 12.3. How to protect 12.4. Protect against whom? 12.5. Changing the challenges of Internet protection 12.6. Legal obstacles related to the author’s right Chapter 13. Development of Knowledge and Public Policies 13.1. Knowledge enhancement concerns everyone 13.2. What are the public policies for enhancing knowledge? 13.3. State establishment of connections between actors: a key tool in knowledge enhancement 13.4. Comparing the United States and the European Union Chapter 14. From Author to Enhancer 14.1. Enhancing scientific research is a complex process 14.2. Scientific research enhancement follows a legislative framework intended to promote innovation Chapter 15. The Right to Knowledge: Moving Toward a Universal Law? 15.1. Unclear regulatory frameworks 15.2. Developing legal frameworks related to the Internet is complicated 15.3. Proposals for developing legal frameworks for the Internet Chapter 16. Governing by Algorithm 16.1. Statistics that foreshadow algorithms 16.2. Algorithmic governance and democratic opportunities Chapter 17. Public Data and Science in e-Government 17.1. Disseminating data and disseminating science: a new requirement 17.2. Public data in the e-government 17.3. Science within e-government Chapter 18. Surveillance, Sousveillance, Improper Capturing 18.1. The traditional legal framework for information capture 18.2. The clear need for a specific law Chapter 19.
Public Knowledge Policies in the Digital Age 19.1. GAFA domination and the oligopolization of the market 19.2. Isolated digital ecosystems 19.3. Regulation through competition law 19.4. Data protection: moving toward a law for the digital community Chapter 20. The Politics of Creating Artificial Intelligence 20.1. History 20.2. Artificial intelligence has become a priority for public and private actors 20.4. The appearance of legal problems Chapter 21. Security Policies in Artificial Intelligence 21.1. Security as a comment on machines and data 21.2. From the security of machines to the security of humans Conclusion Postscript Glossary Bibliography Index
£125.06
ISTE Ltd and John Wiley & Sons Inc Argument Mining: Linguistic Foundations
Book SynopsisThis book is an introduction to the linguistic concepts of argumentation relevant for argument mining, an important research and development activity which can be viewed as a highly complex form of information retrieval, requiring high-level natural language processing technology. While the first four chapters develop the linguistic and conceptual aspects of argument expression, the last four are devoted to their application to argument mining. These chapters investigate the facets of argument annotation, as well as argument mining system architectures and evaluation. How annotations may be used to develop linguistic data and how to train learning algorithms is outlined. A simple implementation is then proposed. The book ends with an analysis of non-verbal argumentative discourse. Argument Mining is an introductory book for engineers or students of linguistics, artificial intelligence and natural language processing. Most, if not all, the concepts of argumentation crucial for argument mining are carefully introduced and illustrated in a simple manner.Table of ContentsPreface Chapter 1. Introduction and Challenges 1.1. What is argumentation? 1.2. Argumentation and argument mining 1.3. The origins of argumentation 1.4. The argumentative discourse 1.5. Contemporary trends Chapter 2. The Structure of Argumentation 2.1. The argument–conclusion pair 2.2. The elementary argumentative schema 2.2.1. Toulmin’s argumentative model 2.2.2. Some elaborations and refinements of Toulmin’s model 2.2.3. The geometry of arguments 2.3. Modeling agreement and disagreement 2.3.1. Agreeing versus disagreeing 2.3.2. The art of resolving divergences 2.4. The structure of an argumentation: argumentation graphs 2.5. The role of argument schemes in argumentation 2.5.1. Argument schemes: main concepts 2.5.2. A few simple illustrations 2.5.3. Argument schemes based on analogy 2.5.4.
Argument schemes based on causality 2.6. Relations between Toulmin’s model and argumentation schemes 2.6.1. Warrants as a popular opinion 2.6.2. Argument schemes based on rules, explanations or hypothesis 2.6.3. Argument schemes based on multiple supports or attacks 2.6.4. Causality and warrants Chapter 3. The Linguistics of Argumentation 3.1. The structure of claims 3.2. The linguistics of justifications 3.3. Evaluating the strength of claims, justifications and arguments 3.3.1. Strength factors within a proposition 3.3.2. Structuring expressions of strength by semantic category 3.3.3. A simple representation of strength when combining several factors 3.3.4. Pragmatic factors of strength expression 3.4. Rhetoric and argumentation 3.4.1. Rhetoric and communication 3.4.2. Logos: the art of reasoning and of constructing demonstrations 3.4.3. Ethos: the orator profile 3.4.4. Pathos: how to persuade an audience Chapter 4. Advanced Features of Argumentation for Argument Mining 4.1. Managing incoherent claims and justifications 4.1.1. The case of justifications supporting opposite claims 4.1.2. The case of opposite justifications justifying the same claim 4.2. Relating claims and justifications: the need for knowledge and reasoning 4.2.1. Investigating relatedness via corpus analysis 4.2.2. A corpus analysis of the knowledge involved 4.2.3. Observation synthesis 4.3. Argument synthesis in natural language 4.3.1. Features of a synthesis 4.3.2. Structure of an argumentation synthesis Chapter 5. From Argumentation to Argument Mining 5.1. Some facets of argument mining 5.2. Designing annotation guidelines: some methodological elements 5.3. What results can be expected from an argument mining system? 5.4. Architecture of an argument mining system 5.5. The next chapters Chapter 6. Annotation Frameworks and Principles of Argument Analysis 6.1.
Principles of argument analysis 6.1.1. Argumentative discourse units 6.1.2. Conclusions and premises 6.1.3. Warrants and backings 6.1.4. Qualifiers 6.1.5. Argument schemes 6.1.6. Attack relations: rebuttals, refutations, undercutters 6.1.7. Illocutionary forces, speech acts 6.1.8. Argument relations 6.1.9. Implicit argument components and tailored annotation frameworks 6.2. Examples of argument analysis frameworks 6.2.1. Rhetorical Structure Theory 6.2.2. Toulmin’s model 6.2.3. Inference Anchoring Theory 6.2.4. Summary 6.3. Guidelines for argument analysis 6.3.1. Principles of annotation guidelines 6.3.2. Inter-annotator agreements 6.3.3. Interpretation of IAA measures 6.3.4. Some examples of IAAs 6.3.5. Summary 6.4. Annotation tools 6.4.1. Brat 6.4.2. RST tool 6.4.3. AGORA-net 6.4.4. Araucaria 6.4.5. Rationale 6.4.6. OVA+ 6.4.7. Summary 6.5. Argument corpora 6.5.1. COMARG 6.5.2. A news editorial corpus 6.5.3. THF Airport ArgMining corpus 6.5.4. A Wikipedia articles corpus 6.5.5. AraucariaDB 6.5.6. An annotated essays corpus 6.5.7. A written dialogs corpus 6.5.8. A web discourse corpus 6.5.9. Argument Interchange Format Database 6.5.10. Summary 6.6. Conclusion Chapter 7. Argument Mining Applications and Systems 7.1. Application domains for argument mining 7.1.1. Opinion analysis augmented by argument mining 7.1.2. Summarization 7.1.3. Essays 7.1.4. Dialogues 7.1.5. Scientific and news articles 7.1.6. The web 7.1.7. Legal field 7.1.8. Medical field 7.1.9. Education 7.2. Principles of argument mining systems 7.2.1. Argumentative discourse units detection 7.2.2. Units labeling 7.2.3. Argument structure detection 7.2.4. Argument completion 7.2.5. Argument structure representation 7.3. Some existing systems for argument mining 7.3.1.
Automatic detection of rhetorical relations 7.3.2. Argument zoning 7.3.3. Stance detection 7.3.4. Argument mining for persuasive essays 7.3.5. Argument mining for web discourse 7.3.6. Argument mining for social media 7.3.7. Argument scheme classification and enthymemes reconstruction 7.3.8. Argument classes and argument strength classification 7.3.9. Textcoop 7.3.10. IBM debating technologies 7.3.11. Argument mining for legal texts 7.4. Efficiency and limitations of existing argument mining systems 7.5. Conclusion Chapter 8. A Computational Model and a Simple Grammar-Based Implementation 8.1. Identification of argumentative units 8.1.1. Challenges raised by the identification of argumentative units 8.1.2. Some linguistic techniques to identify ADUs 8.2. Mining for claims 8.2.1. The grammar formalisms 8.2.2. Lexical issues 8.2.3. Grammatical issues 8.2.4. Templates for claim analysis 8.3. Mining for supports and attacks 8.3.1. Structures introduced by connectors 8.3.2. Structures introduced by propositional attitudes 8.3.3. Other linguistic forms to express supports or attacks 8.4. Evaluating strength 8.5. Epilogue Chapter 9. Non-Verbal Dimensions of Argumentation: a Challenge for Argument Mining 9.1. The text and its additions 9.1.1. Text, pictures and icons 9.1.2. Transcriptions of oral debates 9.2. Argumentation and visual aspects 9.3. Argumentation and sound aspects 9.3.1. Music and rationality 9.3.2. Main features of musical structure: musical knowledge representation 9.4. Impact of non-verbal aspects on argument strength and on argument schemes 9.5. Ethical aspects Bibliography Index
£125.06
ISTE Ltd and John Wiley & Sons Inc Sharing Economy and Big Data Analytics
Book Synopsis: The different facets of the sharing economy offer numerous opportunities for businesses, particularly those that can be distinguished by their creative ideas and their ability to easily connect buyers and senders of goods and services via digital platforms. At the beginning of the growth of this economy, the advanced digital technologies generated billions of bytes of data that constitute what we call Big Data. This book underlines the facilitating role of Big Data analytics, explaining why and how data analysis algorithms can be integrated operationally, in order to extract value and to improve the practices of the sharing economy. It examines the reasons why these new techniques are necessary for businesses of this economy and proposes a series of useful applications that illustrate the use of data in the sharing ecosystem.
Table of Contents:
Preface
Introduction
Part 1. The Sharing Economy or the Emergence of a New Business Model
Chapter 1. The Sharing Economy: A Concept Under Construction 1.1. Introduction 1.2. From simple sharing to the sharing economy 1.2.1. The genesis of the sharing economy and the break with “consumer” society 1.2.2. The sharing economy: which economy? 1.3. The foundations of the sharing economy 1.3.1. Peer-to-peer (P2P): a revolution in computer networks 1.3.2. The gift: the abstract aspect of the sharing economy 1.3.3. The service economy and the offer of use 1.4. Conclusion
Chapter 2. An Opportunity for the Business World 2.1. Introduction 2.2. Prosumption: a new sharing economy trend for the consumer 2.3. Poverty: a target in the spotlight of the shared economy 2.4. Controversies on economic opportunities of the sharing economy 2.5. Conclusion
Chapter 3. Risks and Issues of the Sharing Economy 3.1. Introduction 3.2. Uberization: a white grain or just a summer breeze? 3.3. The sharing economy: a disruptive model 3.4. Major issues of the sharing economy 3.5. Conclusion
Chapter 4. Digital Platforms and the Sharing Mechanism 4.1. Introduction 4.2. Digital platforms: “What growth!” 4.3. Digital platforms or technology at the service of the economy 4.4. From the sharing economy to the sharing platform economy 4.5. Conclusion
Part 2. Big Data Analytics at the Service of the Sharing Economy
Chapter 5. Beyond the Word “Big”: The Changes 5.1. Introduction 5.2. The 3 Vs and much more: volume, variety, velocity 5.2.1. Volume 5.2.2. The variety 5.2.3. Velocity 5.2.4. What else? 5.3. The growth of computing and storage capacities 5.3.1. Big Data versus Big Computing 5.3.2. Big Data storage 5.3.3. Updating Moore’s Law 5.4. Business context change in the era of Big Data 5.4.1. The decision-making process and the dynamics of value creation 5.4.2. The emergence of new data-driven business models 5.5. Conclusion
Chapter 6. The Art of Analytics 6.1. Introduction 6.2. From simple analysis to Big Data analytics 6.2.1. Descriptive analysis: learning from past behavior to influence future outcomes 6.2.2. Predictive analysis: analyzing data to predict future outcomes 6.2.3. Prescriptive analysis: recommending one or more action plan(s) 6.2.4. From descriptive analysis to prescriptive analysis: an example 6.3. The process of Big Data analytics: from the data source to its analysis 6.3.1. Definition of objectives and requirements 6.3.2. Data collection 6.3.3. Data preparation 6.3.4. Exploration and interpretation 6.3.5. Modeling 6.3.6. Deployment 6.4. Conclusion
Chapter 7. Data and Platforms in the Sharing Context 7.1. Introduction 7.2. Pioneers in Big Data 7.2.1. Big Data on Walmart’s shelves 7.2.2. The Big Data behind Netflix’s success story 7.2.3. The Amazon version of Big Data 7.2.4. Big data and social networks: the case of Facebook 7.2.5. IBM and data analysis in the health sector 7.3. Data, essential for sharing 7.3.1. Data and platforms at the heart of the sharing economy 7.3.2. The data of sharing economy companies 7.3.3. Privacy and data security in a sharing economy 7.3.4. Open Data and platform data sharing 7.4. Conclusion
Chapter 8. Big Data Analytics Applied to the Sharing Economy 8.1. Introduction 8.2. Big Data and Machine Learning algorithms serving the sharing economy 8.2.1. Machine Learning algorithms 8.2.2. Algorithmic applications in the sharing economy context 8.3. Big Data technologies: the sharing economy companies’ toolbox 8.3.1. The appearance of a new concept and the creation of new technologies 8.4. Big Data on the agenda of sharing economy companies 8.4.1. Uber 8.4.2. Airbnb 8.4.3. BlaBlaCar 8.4.4. Lyft 8.4.5. Yelp 8.4.6. Other cases 8.5. Conclusion
Part 3. The Sharing Economy? Not Without Big Data Algorithms
Chapter 9. Linear Regression 9.1. Introduction 9.2. Linear regression: an advanced analysis algorithm 9.2.1. How are regression problems identified? 9.2.2. The linear regression model 9.2.3. Minimizing modeling error 9.3. Other regression methods 9.3.1. Logistic regression 9.3.2. Additional regression models: regularized regression 9.4. Building your first predictive model: a use case 9.4.1. What variables help set a rental price on Airbnb? 9.5. Conclusion
Chapter 10. Classification Algorithms 10.1. Introduction 10.2. A tour of classification algorithms 10.2.1. Decision trees 10.2.2. Naïve Bayes 10.2.3. Support Vector Machine (SVM) 10.2.4. Other classification algorithms 10.3. Modeling Airbnb prices with classification algorithms 10.3.1. The work that’s already been done: overview 10.3.2. Models based on trees: decision tree versus Random Forest 10.3.3. Price prediction with kNN 10.4. Conclusion
Chapter 11. Cluster Analysis 11.1. Introduction 11.2. Cluster analysis: general framework 11.2.1. Cluster analysis applications 11.2.2. The clustering algorithm and the similarity measure 11.3. Grouping similar objects using k-means 11.3.1. The k-means algorithm 11.3.2. Determine the number of clusters 11.4. Hierarchical classification 11.4.1. The hierarchical model approach 11.4.2. Dendrograms 11.5. Discovering hidden structures with clustering algorithms 11.5.1. Illustration of the classification of prices based on different characteristics using the k-means algorithm 11.5.2. Identify the number of clusters k 11.6. Conclusion
Conclusion
References
Index
£125.06
ISTE Ltd and John Wiley & Sons Inc Perceptions and Analysis of Digital Risks
Book Synopsis: The concept of digital risk, which has become ubiquitous in the media, sustains a number of myths and beliefs about the digital world. This book explores the opposite view of these ideologies by focusing on digital risks as perceived by actors in their respective contexts. Perceptions and Analysis of Digital Risks identifies the different types of risks that concern actors and actually impact their daily lives, within education or various socio-professional environments. It provides an analysis of the strategies used by the latter to deal with these risks as they conduct their activities, thus making it possible to characterize the digital cultures and, more broadly, the informational cultures at work. This book offers many avenues for action in terms of educating the younger generations, training teachers and leaders, and mediating risks.
Table of Contents:
Foreword (Franc MORANDI)
Introduction (Camille CAPELLE)
Part 1. Risk Perceptions, Education and Learning
Chapter 1. Digital Risks: An Obstacle or a Lever for Education? (Camille CAPELLE) 1.1. Introduction 1.2. Digital risks and education: what are we talking about? 1.2.1. Digital risks 1.2.2. What are the risks in education? 1.3. Questioning perceptions of digital risks among new teachers 1.3.1. Why was this target audience chosen? 1.3.2. Methodology and data collection 1.4. Teachers’ perceptions of digital risks 1.4.1. When perceptions of risk inhibit any practice 1.4.2. When perceptions of risk freeze practices 1.4.3. When risk perceptions lead us to consider them in order to overcome them 1.5. Reflection on the role of digital risk representations in education 1.6. Conclusion 1.7. References
Chapter 2. Teenagers Faced with “Fake News”: Perceptions and the Evaluation of an Epistemic Risk (Gilles SAHUT and Sylvie FRANCISCO) 2.1. Introduction 2.2. Fake news: From production to reception 2.2.1. Characterizing the fake news phenomenon 2.2.2. The potential risks associated with fake news 2.2.3. The credibility of fake news 2.3. Methodological framework of the study 2.4. Results of the study 2.4.1. A heterogeneous understanding of the concept 2.4.2. A blurred perception of the goals of fake news 2.4.3. The diversity of fake news sources 2.4.4. Identifying fake news: heuristic processing and analytical strategies 2.4.5. A remote and controlled phenomenon? 2.5. Discussion of the results and reflections on media and information literacy 2.6. Conclusion 2.7. References
Chapter 3. “A Big Nebula that is a Bit Scary” (Louise, Trainee Schoolteacher): Training through/in Digital Technology, in School and in Professional Training (Anne CORDIER) 3.1. Social beings, above all else 3.1.1. A “fluid identity” to be grasped 3.1.2. Digital technology in the actors’ personal ecosystem 3.2. Understanding of digital technology in the classroom 3.2.1. Crystallization and awareness of issues 3.2.2. When the socio-technical framework hinders the entry of digital technology into the classroom 3.2.3. Rather modest and low-risk experiments 3.3. Teaching with and through digital technology: Constant risks 3.3.1. Tensions in the classroom 3.3.2. Tensions in training 3.3.3. Desires on both sides 3.4. Potential courses of action 3.5. References
Part 2. Risks in the Light of Socio-Economic Issues
Chapter 4. Top Managers Confronted with Information Risks: An Exploratory Study within the Telecommunications Sector (Dijana LEKIC, Anna LEZON-RIVIÈRE and Madjid IHADJADENE) 4.1. Introduction 4.2. Information risk: The conceptual field 4.3. Controlling information risks: Security policy 4.4. Information risk and management 4.5. Study methodology and the stakeholder group 4.6. Information risk: The perspective of top telecoms managers 4.6.1. Top managers as responsible for information risk management 4.6.2. Information risk management 4.6.3. Operational challenges related to the information risk management approach 4.7. Conclusion 4.8. Acknowledgments 4.9. References
Chapter 5. Cell Phones and Scamming Risks in Cameroon: Users’ Experiences and Socio-Institutional Responses (Freddy TSOPFACK FOFACK and Abdel Bernazi RENGOU) 5.1. Introduction 5.2. Mechanisms behind cell phone scamming in Cameroon: Exhibiting credulity 5.2.1. Setting the scene 5.2.2. Enticing but misleading proposals 5.2.3. Disguised telephone number confusion 5.3. The dynamics of cell phone use in Cameroon 5.3.1. The Ministry of Posts and Telecommunications 5.3.2. Agence Nationale des Technologies de l’Information et de la Communication 5.3.3. Agence de Régulation des Télécommunications 5.3.4. Cell phone operators 5.3.5. The judicial system and cell phone scams 5.3.6. Cell phone users and consumer associations 5.4. Socio-institutional governance of cell phone use in Cameroon: Optimal or approximate mediations? 5.4.1. Information deficit of the users 5.4.2. Insufficient means of action 5.4.3. Mis-selling of SIM cards by mobile operators: An “ingredient” of mobile scammers 5.4.4. The ease of monetary transactions 5.4.5. Technological constraints and border porosity 5.5. Conclusion 5.6. References
Part 3. Digital Risks: Practices and Mediation
Chapter 6. Towards a Normative Prescription of Information Practices on Digital Social Networks: A Study of Documentary Pedagogical Projects in Middle School (Adeline ENTRAYGUES) 6.1. Introduction 6.2. Contextualization of risk 6.3. Issues to consider 6.4. Research objects 6.5. Research protocol 6.6. Risk regarding DSNs in the pedagogical approach 6.6.1. Raising awareness of risks: An obvious approach for teacher librarians 6.6.2. Considering the views of learners and teachers 6.6.3. Considering the risks: Learners aware of digital dangers 6.7. Discovering DSNs in a school context: Dealing with risks 6.7.1. Pedagogical projects on DSNs to prevent risks: Teachers’ perspectives 6.7.2. Overcoming risks: Learners’ perspectives 6.8. Perspectives for an information culture 6.8.1. Risks, standards and education 6.8.2. A culture of information in training 6.9. Conclusion 6.10. References
Chapter 7. MIL as a Tool for Teachers to Prevent Risk and Transmit Digital Culture (Julie PASCAU) 7.1. Studying digital technology in schools from the perspective of teachers’ representations 7.1.1. Why be interested in representations? 7.1.2. The social representation of digital risks through the analysis of institutional discourses 7.2. What do digital and media literacy evoke in teachers? 7.2.1. The weak presence of digital technology and MIL in elementary school 7.2.2. Risks in the representations of MIL among primary school teachers 7.2.3. A positive perception of the role of digital technology in the classroom 7.3. The contours of media and information literacy according to teachers 7.3.1. The objects of MIL from the discourse of primary school teachers 7.3.2. What does digital technology mean for teachers? 7.4. What does the requirement to transmit digital culture mean for teachers? 7.4.1. Digital culture: A very vague concept 7.4.2. What primary school teachers think digital literacy means 7.5. Conclusion 7.6. References
Conclusion (Camille CAPELLE)
Postface (Vincent LIQUÈTE)
List of Authors
Index
£124.15
Edward Elgar Publishing Ltd Digital Transformations: New Tools and Methods
Book Synopsis: Technology is not limited to technology companies; it impacts sectors such as healthcare, agriculture, and security. In the last few decades, countries, too, have started developing technologies or integrating technologies into their systems. As a result, all countries, regardless of size, need to understand the management of engineering and technology concepts. Digital Transformations reviews fundamentals and applications through existing and emerging technologies all around the world. Big data availability and the emergence of new tools provide opportunities to detect the emergence of new technologies. Some of the major elements of such analyses include bibliometrics, patent analysis and social network analysis. The authors focus on these three tools and demonstrate their use through applications such as Blockchain, Artificial Intelligence, Robotics, 3D printing, Wireless Power, Autonomous and Electric Driving, and Smart Homes. Through the examination of cases based on emerging technologies, the book provides a spectrum of these recent applications and serves as a reference for professionals, researchers and students on fundamentals of technology utilization tools.
Trade Review: ‘Dr. Tugrul Daim has championed another masterpiece with this manuscript. This book helps transform vital information from the academic world into a blueprint that can be used by government and industry commercial leaders. A true first of its kind, as there is no other manuscript available to help engineering and technology managers navigate through the challenges presented by Digital Transformations.’ -- Matthew L. Tompkins, TC Defense, US
‘The authors introduce the use of statistical methods such as bibliometrics, patent analysis and network analysis to understand trends, connections and leadership in technology innovation and also to identify key issues. Real-world case studies explore an array of innovations in the medical, power, transportation and home appliance fields. This approach illustrates how the techniques are useful while telling the story of some of today’s pivotal innovations.’ -- Fred Gordon, Energy Trust of Oregon, US
Table of Contents: Introduction to Digital Transformations 1. Bibliometric-based analyses 2. Patent-based analyses 3. Network-based analyses 4. Integrated analyses 5. Conclusion References Index
£90.76
Business Expert Press Obtaining Value from Big Data for Service Systems, Volume I: Big Data Management
Book Synopsis: Volume I of this two-volume series focuses on the role of big data in service delivery systems. It discusses the definition of and orientation to big data, its applications in service delivery systems, how to obtain results that can affect or enhance service delivery, and how to build an effective big data organization. This volume will assist readers in fitting big data analysis into their service-based organizations. It will also help readers understand how to improve the use of big data to enhance their service-oriented organizations.
£21.80