Data Mining Books
Springer Data Science
Table of Contents
Infrastructure for Data Science.- Android malware detection method based on machine learning.- A lightweight edge network intrusion detection system based on MobileVit.- TRAFFICNET: A NOVEL NETWORK PERFORMANCE PREDICTION MODEL VIA AGGREGATOR-BASED ENHANCEMENT.- Social Media and Recommendation System.- Sentiment analysis for public opinion based on MapReduce and PSO-SVR.- Personalized Novel Recommendation System Based on Filtering and Sentiment Analysis.- Enhancing Relevance and Efficiency in Visual Question Generation through Redundant Object Filtering.- Chinese Named Entity Recognition Algorithm integrating Vocabulary Information.- WSDSum: Unsupervised Extractive Summarization Based.- IPFS-DKRM: an efficient keyword retrieval model of IPFS based on ART.- Multimedia Data Management and Analysis.- Multi-Modal Variable-Channel Spatial-Temporal Semantic Action Recognition Network.- Enhanced and pruned motion planning based on bird's-eye view.- CCU-NET: CBAM and Cascaded Edge Detection Optimization U-NET for Remote Sensing Image Segmentation.- Speech Emotion Recognition Using U-Net.- Non-Invasive Load Decomposition Model Based On Inception-SimAM-BiLSTM.- A Data-driven Coordinated Active And Reactive Dispatching Strategy For Photovoltaics.- PDTNet: An Image-Based Model for PV Panel Defect Detection.- SAMCNet: A Multi-Channel Face Anti-Spoofing Network Combined with Hyperspectral Images via Self-Attention Mechanism.- Image Tampering Detection Method Based on Hybrid Attention Mechanism.- ZhouStage-zero A Dynamic Ensemble method for Intrusion Detection in Industrial Control System.- High-precision Anime Conversion Model based on Generative Adversarial Networks.- Anomaly Segmentation in Foggy Weather for Autonomous Driving with Adaptive Learnable Filters.- Image tampering localization based on dual-stream feature fusion.- Multi-scale Image Tampering Detection Using Inception-UNet Network.- Fetal Congenital Heart Disease Diagnosis Based On CBAM-Enhanced ResNet-50.- Transformers for Single Object Tracking: Temporal Context Propagation and Frame Relationship Modeling.- SAFETY HELMET WEARING DETECTION BASED ON YOLOv7.
£76.49
Amazon Digital Services LLC - KDP Microsoft Office 365 Pro
£999.99
GEORGETTE KELEMAOKALANI SQL a Python para análisis de datos
£22.50
HarperCollins Publishers Inc Everybody Lies
£13.01
Elsevier Science & Technology Analyzing Social Media Networks with NodeXL
Table of Contents
Part I. Getting Started with Analyzing Social Media Networks. 1. Introduction to Social Media and Social Networks. 2. Social Media: New Technologies of Collaboration. 3. Social Network Analysis: Measuring, Mapping, and Modeling Collections of Connections.
Part II. NodeXL Tutorial: Learning by Doing. 4. Installation, Orientation, and Layout. 5. Labeling and Visual Attributes. 6. Calculating and Visualizing Network Metrics. 7. Grouping and Filtering. 8. Semantic Networks.
Part III. Social Media Network Analysis Case Studies. 9. Email: The Lifeblood of Modern Communication. 10. Thread Networks: Mapping Message Boards and Email Lists. 11. Twitter: Information Flows, Influencers, and Organic Communities. 12. Facebook: Public Pages and Inter-Organizational Networks. 13. YouTube: Exploring Video Networks. 14. Wiki Networks: Connections of Culture and Collaboration.
£37.00
Pearson Education Data Analytics with Spark Using Python
£31.34
Taylor & Francis Ltd Disk-Based Algorithms for Big Data
Book Synopsis
Disk-Based Algorithms for Big Data is a product of recent advances in the areas of big data, data analytics, and the underlying file systems and data management algorithms used to support the storage and analysis of massive data collections. The book discusses hard disks and their impact on data management, since hard disk drives continue to be common in large data clusters. It also explores ways to store and retrieve data through primary and secondary indices. This includes a review of different in-memory sorting and searching algorithms that build a foundation for more sophisticated on-disk approaches like mergesort, B-trees, and extendible hashing. Following this introduction, the book transitions to more recent topics, including advanced storage technologies like solid-state drives and holographic storage; peer-to-peer (P2P) communication; large file systems and query languages like Hadoop/HDFS, Hive, Cassandra, and Presto; and NoSQL databases like Neo4j for graph structures...
Table of Contents
Foreword. Physical Disk Storage. File Management. Sorting. Searching. Disk-Based Sorting. Disk-Based Searching. Storage Technology. Large File Systems. NoSQL Storage. Appendix
£56.99
Taylor & Francis Inc A User's Guide to Business Analytics
Book Synopsis
A User's Guide to Business Analytics provides a comprehensive discussion of statistical methods useful to the business analyst. Methods are developed from a fairly basic level to accommodate readers who have limited training in the theory of statistics. A substantial number of case studies and numerical illustrations using the R software package are provided, both for motivated beginners who want a head start in analytics and for experts on the job who will benefit from using this text as a reference book. The book comprises 12 chapters. The first chapter focuses on business analytics, along with its emergence and application, and sets up a context for the whole book. The next three chapters introduce R and provide a comprehensive discussion of descriptive analytics, including numerical data summarization and visual analytics. Chapters five through seven discuss set theory, definitions and counting rules, probability, random...
Table of Contents
What Is Analytics? Introducing R: An Analytics Software. Reporting Data. Statistical Graphics and Visual Analytics. Probability. Random Variables and Probability Distributions. Continuous Random Variables. Statistical Inference. Regression for Predictive Model Building. Decision Trees. Data Mining and Multivariate Methods. Modeling Time Series Data for Forecasting.
£128.25
John Murray Press If Then: How One Data Company Invented the Future
Book Synopsis
Radio 4's Book of the Week
A Financial Times Book of the Year
Shortlisted for the 2020 Financial Times / McKinsey Business Book of the Year
Longlisted for the National Book Award
'The story of the original data science hucksters of the 1960s is hilarious, scathing and sobering - what you might get if you crossed Mad Men with Theranos' David Runciman
The Simulmatics Corporation, founded in 1959, mined data, targeted voters, accelerated news, manipulated consumers, destabilized politics, and disordered knowledge, decades before Facebook, Google, Amazon, and Cambridge Analytica. Silicon Valley likes to imagine it has no past but the scientists of Simulmatics are the long-dead grandfathers of Mark Zuckerberg and Elon Musk. Borrowing from psychological warfare, they used computers to predict and direct human behavior, deploying their "People Machine" from New York, Cambridge, and Saigon for clients that included John Kennedy's presidential campaign, the New York Times, Young & Rubicam, and, during the Vietnam War, the Department of Defence. In If Then, distinguished Harvard historian and New Yorker staff writer, Jill Lepore, unearths from the archives the almost unbelievable story of this long-vanished corporation, and of the women hidden behind it. In the 1950s and 1960s, Lepore argues, Simulmatics invented the future by building the machine in which the world now finds itself trapped and tormented, algorithm by algorithm.
'A person can't help but feel inspired by the riveting intelligence and joyful curiosity of Jill Lepore. Knowing that there is a mind like hers in the world is a hope-inducing thing' George Saunders, Man Booker Prize-winning author of Lincoln in the Bardo
'An authoritative account of the origins of data science, a compelling political narrative of America in the Sixties, a poignant collective biography of a generation of flawed men' David Kynaston
'If Then is simultaneously gripping and absolutely terrifying' Amanda Foreman
Trade Review
Lepore is a brilliant writer. It's a dream to read. -- Diane Coyle
If you're looking for beautiful writing and love history ... this is a lovely read that takes you through a history of American politics and campaigning, cold war intrigue and artificial intelligence. * Financial Times *
Jill Lepore is the pre-eminent historian of forgotten tales from America's past that throw startling light on the present. This brilliant book illuminates the future too. The story of the original data science hucksters of the 1960s is hilarious, scathing and sobering - what you might get if you crossed Mad Men with Theranos. -- David Runciman
Fascinating. * New York Times Book Review *
A person can't help but feel inspired by the riveting intelligence and joyful curiosity of Jill Lepore. Knowing that there is a mind like hers in the world is a hope-inducing thing. -- George Saunders
Jill Lepore writes history like a poet. In If Then she yet again binds lyrical story telling to meticulous archival research to tell a gigantic story from our past. She builds our present, and makes it feel so familiar and yet so contingent. -- Dan Snow
Two things make this tale worth reading. One is Lepore's brisk and confident depiction of the individuals involved...the other is her exploration of the growing power of computers to accumulate and analyse data, bringing marketing and politics into ever closer union. -- Frances Cairncross * The Literary Review *
Beautifully written and intellectually rigorous account of the origins of the science of predictive analytics and behavioral data science in the cold war era. * Financial Times *
Fascinating. -- Amol Rajan * Start the Week *
Everything Lepore writes is distinguished by intelligence, eloquence, and fresh insight. If Then is that, and even more: It's absolutely fascinating, excavating a piece of little-known American corporate history that reveals a huge amount about the way we live today and the companies that define the modern era. -- Susan Orlean
A wonderfully written history of long-forgotten computer group Simulmatics. * Financial Times *
£18.00
Springer International Publishing AG Feedback Control Systems: The MATLAB®/Simulink® Approach
Book Synopsis
Feedback control systems is an important course in aerospace engineering, chemical engineering, electrical engineering, mechanical engineering, and mechatronics engineering, to name just a few. Feedback control systems improve a system's behavior so that the desired response can be achieved. The first course on control engineering deals with Continuous Time (CT) Linear Time Invariant (LTI) systems. Plenty of good textbooks on the subject are available on the market, so there is no need to add one more. This book does not focus on control engineering theory, as it assumes that the reader is already familiar with it (that is, has taken or is taking a course on control engineering) and now wants to learn the applications of MATLAB® in control engineering. The focus of this book is on control engineering applications of MATLAB® for a first course on control engineering.
Table of Contents
Preface.- Acknowledgments.- Introduction to MATLAB®.- Commonly Used Commands in Analysis of Control Systems.- Introduction to Simulink®.- Controller Design in MATLAB®.- Introduction to System Identification Toolbox™.- References.- Authors' Biographies.
£62.99
Springer International Publishing AG Phrase Mining from Massive Text and Its
Book Synopsis
A lot of digital ink has been spilled on "big data" over the past few years. Most of this surge owes its origin to the various types of unstructured data in the wild, among which the proliferation of text-heavy data is particularly overwhelming, attributed to the daily use of web documents, business reviews, news, social posts, etc., by so many people worldwide. A core challenge presents itself: How can one efficiently and effectively turn massive, unstructured text into structured representation so as to further lay the foundation for many other downstream text mining applications? In this book, we investigated one promising paradigm for representing unstructured text, that is, through automatically identifying high-quality phrases from innumerable documents. In contrast to a list of frequent n-grams without proper filtering, users are often more interested in results based on variable-length phrases with certain semantics such as scientific concepts, organizations, slogans, and so on. We propose new principles and powerful methodologies to achieve this goal, from the scenario where a user can provide meaningful guidance to a fully automated setting through distant learning. This book also introduces applications enabled by the mined phrases and points out some promising research directions.
Table of Contents
Acknowledgments.- Introduction.- Quality Phrase Mining with User Guidance.- Automated Quality Phrase Mining.- Phrase Mining Applications.- Bibliography.- Authors' Biographies.
£26.59
Springer Machine Learning for Networking
Table of Contents
Learning per-flow SD-WAN load-balancing policies.- Survey on Federated Learning in Smart Healthcare.- Complex Communication Networks Management with Distributed AI: Challenges and Open Issues.- A Framework for Global Trust and Reputation Management in 6G Networks.- DRL Framework for Minimizing Beam Switching Time and Maintaining QoS in 6G-V2X Base Stations.- Reducing BLE energy loss in busy 2.4GHz band.- Leveraging SHAP to advance the Robustness of Large Language Models.- Keyword-Driven Email Classification: Leveraging Machine Learning Techniques.- Predicting Intents: ARMA-Based Modeling.- Design and Evaluation of a Lightweight SDN Controller for Integrated Road and Rail Networks.- PiPS: An effective strategy and approach for Privacy in Public Surveillance.- A comprehensive review of deep learning approaches for tomato leaf diseases detection and classification in smart agriculture.- A review on advancement in PEM Fuel cell Diagnosis based on Machine learning techniques.- GPS Spoofing Attack against UAVs: a timeseries dataset case study.
£134.99
Springer Big Data Analytics and Knowledge Discovery
Table of Contents
Keynote Talk.- Sparse Matrix Algorithms for Evolving Neural Networks.- Invited Talk.- Data integration in the AI era: research trends and still open issues.- Tutorial.- Leveraging machine learning techniques for customer data deduplication - hard-won lessons from a real-world project in the financial industry.- Data mining and knowledge discovery.- FairFES - Fast Exact Sampling for Fair Classification.- Autism Detection by Analyzing Handwriting Characteristics of Chinese Characters via Deep Learning Models.- FNoDe: Faulty Node Detection in Microservices Architecture.- An Enhanced FP-Growth Algorithm with Hybrid Adaptive Support Threshold for Association Rule Mining.- Sequential data analytics and recommendation systems.- Entity Resolution for Streaming Data with Embeddings.- Cross-Modal Sequential Point-of-Interest Recommendation with Lightweight Hybrid Fusion Strategy.- Alternatives to Shallow Autoencoders for Collaborative Filtering.- Accurate Concept Drift Detection without Updating Autoencoders.- Graph data processing and analytics.- Parallel and Distributed SQL/PGQ Query Processing for Property Graphs.- Graph Constraint Language for Industrial Knowledge Graphs and Machine Learning.- SemViSG: Semantic Enrichment and Visualization of Software Graphs.- Data management and Indices.- Certainty Attacks Using Explainability Preprocessing.- Integrating Bitcoin Transactions into Relational Databases for IoT: Challenges and Solutions.- Effects of Response Length on User Search Experience in Spoken Conversational Search.- Fair Proportional Top-k Ranking.- PAID: Power-efficient AI-optimized Databases.- On the Costs and Benefits of Learned Indexing for Dynamic High-Dimensional Data.- A Bayesian Reinforcement Learning Framework for Online Index Tuning.- Large language models (LLMs).- Explaining Recovery Trajectories of Older Adults Post Lower-Limb Fracture Using Modality-wise Multiview Clustering and Large Language Models.- Parameter Drift as a Signal for Membership Inference in Overfit-Tuned LLMs.- MicroSuggest: Kernel-Aware Microservice Decomposition.- TraceTune: Targeted Fine-Tuning of Attention Heads for Text-to-SQL.- Neural networks.- ONNYX: Optimized Neural Networks Yielding eXplainable insights from ECG signals-based data streams.- SpaPool: Soft Partition Assignment Pooling for Graph Neural Networks.- Prediction of iterative solvers' convergence using pretraining by natural images.- Local-aware Convolutional Modulation for Short-Term Sequential Recommendation.
£58.49
de Gruyter Oldenbourg Das Datenzentrische Unternehmen
£32.36
Springer International Publishing AG State of the Art Applications of Social Network Analysis
Book Synopsis
Social network analysis increasingly bridges the discovery of patterns in diverse areas of study as more data becomes available and more complex. Yet the construction of huge networks from large data often requires entirely different approaches for analysis, including graph theory, statistics, machine learning and data mining. This work covers frontier studies on social network analysis and mining from different perspectives such as social network sites, financial data, e-mails, forums, academic research funds, XML technology, blog content, community detection and clique finding, prediction of users' behavior, privacy in social network analysis, mobility from a spatio-temporal point of view, agent technology and political parties in parliament. These topics will be of interest to researchers and practitioners from different disciplines including, but not limited to, social sciences and engineering.
Table of Contents
A Randomized Approach for Structural and Message based Private Friend Recommendation in Online Social Networks; B. K. Samanthula, W. Jiang.- Context Based Semantic Relations in Tweets; O. Ozdikis et al.- Fast exact and approximate computation of betweenness centrality in social networks; M. Baglioni et al.- Network Simulation; E. Franchi.- Early Stage Conversation Catalysts on Entertainment-Based Web Forums; J. Lanagan et al.- Predicting Users Behaviours in Distributed Social Networks Using Community Analysis; B. Ngonmang et al.- What should we protect? Defining differential privacy for social network analysis; C. Task, C. Clifton.- Complex Network Analysis of Research Funding: A Case Study of NSF Grants; H. Kardes et al.- Community Evolutionary Events in Online Social Networks; M. Abulaish, S. Yousuf Bhat.- @Rank: Personalized Centrality Measure for Email Communication Networks; P. Lubarski, M. Morzy.- Twitter Sentiment Analysis: How To Hedge Your Bets In The Stock Markets; T. Rao, S. Srivastava.- The Impact of Measurement Time on Subgroup Detection in Online Communities; S. Zeini et al.- Spatial and Temporal Evaluation of Network-Based Analysis of Human Mobility; M. Coscia et al.- An Ant based Particle Swarm Optimization Algorithm for Maximum Clique Problem in Social networks; M. Soleimani-pouri et al.- XEngine: An XML Search Engine for Social Groups; K. Taha.- Size, diversity and components in the network around an entrepreneur: Shaped by culture and shaping embeddedness of firm relations; M. Cheraghi, T. Schott.- Content Mining of Microblogs; M.Ö. Cingiz, B. Diri.
£42.74
Springer International Publishing AG Advances in Big Data: Proceedings of the 2nd INNS
Book Synopsis
The book offers a timely snapshot of neural network technologies as a significant component of big data analytics platforms. It promotes new advances and research directions in efficient and innovative algorithmic approaches to analyzing big data (e.g. deep networks, nature-inspired and brain-inspired algorithms); implementations on different computing platforms (e.g. neuromorphic, graphics processing units (GPUs), clouds, clusters); and big data analytics applications to solve real-world problems (e.g. weather prediction, transportation, energy management). The book, which reports on the second edition of the INNS Conference on Big Data, held on October 23–25, 2016, in Thessaloniki, Greece, depicts an interesting collaborative adventure of neural networks with big data and other learning technologies.
Table of Contents
Predicting human behavior based on web search activity: Greek referendum of 2015.- Compact Video Description and Representation for Automated Summarization of Human Activities.- Attribute Learning for Network Intrusion Detection.- A Fast Deep Convolutional Neural Network for face detection in Big Visual Data.- Learning Symbols by Neural Network.- Designing HMMs models in the age of Big Data.- Extended Formulations for Online Action Selection on Big Action Sets.- Multi-Task Deep Neural Networks for Automated Extraction of Primary Site and Laterality Information from Cancer Pathology Reports.- An infrastructure and approach for inferring knowledge over Big Data in the Vehicle Insurance Industry.- Unified Retrieval Model of Big Data.- Adaptive Elitist Differential Evolution Extreme Learning Machines on Big Data: Intelligent Recognition of Invasive Species.
£116.99
Springer International Publishing AG Network Intelligence Meets User Centered Social
Book Synopsis
This edited volume presents advances in modeling and computational analysis techniques related to networks and online communities. It contains the best papers of notable scientists from the 4th European Network Intelligence Conference (ENIC 2017) that have been peer reviewed and expanded into the present format. The aim of this text is to share knowledge and experience as well as to present recent advances in the field. The book is a nice mix of basic research topics such as data-based centrality measures along with intriguing applied topics, for example, interaction decay patterns in online social communities. This book will appeal to students, professors, and researchers working in the fields of data science, computational social science, and social network analysis.
Table of Contents
Data-based centrality measures.- Extracting the Main Path of historic events from Wikipedia.- Simulating trade in economic networks with TrEcSim.- Community Aliveness: Discovering interaction decay patterns in online social communities.- Network Patterns of Direct and Indirect Reciprocity in edX MOOC Forums.- Targeting influential nodes for recovery in bootstrap percolation on hyperbolic networks.- Trump versus Clinton – Twitter communication during the US primaries.- Extended feature-driven graph model for Social Media Networks.- Market basket analysis using minimum spanning trees.- Behavior-based relevance estimation for social networks interaction relations.- Sponge walker: Community detection in large directed social networks using local structures and random walks.- Identifying promising research topics in Computer Science.- Identifying accelerators of information diffusion across social media channels.- Towards an ILP approach for learning privacy heuristics from users' regrets.- Strength of nations: A case study on estimating the influence of leading countries using social media analysis.- Incremental learning in dynamic networks for node classification.
£33.74
Springer-Verlag Berlin and Heidelberg GmbH & Co. KG Advances in Data Mining: Applications and
Book Synopsis
This book constitutes the refereed proceedings of the 13th Industrial Conference on Data Mining, ICDM 2013, held in New York, NY, in July 2013. The 22 revised full papers presented were carefully reviewed and selected from 112 submissions. The topics range from theoretical aspects of data mining to applications of data mining, such as in multimedia data, in marketing, finance and telecommunication, in medicine and agriculture, and in process control, industry and society.
Table of Contents
Theoretical aspects of data mining; applications of data mining in multimedia data.- Applications of data mining in marketing and in finance.- Applications of data mining in telecommunication.- Applications of data mining in medicine and agriculture.- Applications of data mining in process control, industry and society.
£39.99
Springer-Verlag New York Inc. Machine Learning in Cyber Trust
Trade Review
From the reviews: "This is a useful book on machine learning for cyber security applications. It will be helpful to researchers and graduate students who are looking for an introduction to a specific topic in the field. All of the topics covered are well researched. The book consists of 12 chapters, grouped into four parts." (Imad H. Elhajj, ACM Computing Reviews, October 2009)
Table of Contents
Cyber System.- Cyber-Physical Systems: A New Frontier.- Security.- Misleading Learners: Co-opting Your Spam Filter.- Survey of Machine Learning Methods for Database Security.- Identifying Threats Using Graph-based Anomaly Detection.- On the Performance of Online Learning Methods for Detecting Malicious Executables.- Efficient Mining and Detection of Sequential Intrusion Patterns for Network Intrusion Detection Systems.- A Non-Intrusive Approach to Enhance Legacy Embedded Control Systems with Cyber Protection Features.- Image Encryption and Chaotic Cellular Neural Network.- Privacy.- From Data Privacy to Location Privacy.- Privacy Preserving Nearest Neighbor Search.- Reliability.- High-Confidence Compositional Reliability Assessment of SOA-Based Systems Using Machine Learning Techniques.- Model, Properties, and Applications of Context-Aware Web Services.
£125.99
John Wiley & Sons Inc Unsupervised Learning
Book Synopsis
A new approach to unsupervised learning. Evolving technologies have brought about an explosion of information in recent years, but the question of how such information might be effectively harvested, archived, and analyzed remains a monumental challenge, for the processing of such information is often fraught with the need for conceptual interpretation: a relatively simple task for humans, yet an arduous one for computers. Inspired by the relative success of existing popular research on self-organizing neural networks for data clustering and feature extraction, Unsupervised Learning: A Dynamic Approach presents information within the family of generative, self-organizing maps, such as the self-organizing tree map (SOTM) and the more advanced self-organizing hierarchical variance map (SOHVM). It covers a series of pertinent, real-world applications with regard to the processing of multimedia data, from its role in generic image processing techniques, such as...
Table of Contents
Acknowledgments.
1 Introduction. 1.1 Part I: The Self-Organizing Method. 1.2 Part II: Dynamic Self-Organization for Image Filtering and Multimedia Retrieval. 1.3 Part III: Dynamic Self-Organization for Image Segmentation and Visualization. 1.4 Future Directions.
2 Unsupervised Learning. 2.1 Introduction. 2.2 Unsupervised Clustering. 2.3 Distance Metrics for Unsupervised Clustering. 2.4 Unsupervised Learning Approaches. 2.4.1 Partitioning and Cluster Membership. 2.4.2 Iterative Mean-Squared Error Approaches. 2.4.3 Mixture Decomposition Approaches. 2.4.4 Agglomerative Hierarchical Approaches. 2.4.5 Graph-Theoretic Approaches. 2.4.6 Evolutionary Approaches. 2.4.7 Neural Network Approaches. 2.5 Assessing Cluster Quality and Validity. 2.5.1 Cost Function–Based Cluster Validity Indices. 2.5.2 Density-Based Cluster Validity Indices. 2.5.3 Geometric-Based Cluster Validity Indices.
3 Self-Organization. 3.1 Introduction. 3.2 Principles of Self-Organization. 3.2.1 Synaptic Self-Amplification and Competition. 3.2.2 Cooperation. 3.2.3 Knowledge Through Redundancy. 3.3 Fundamental Architectures. 3.3.1 Adaptive Resonance Theory. 3.3.2 Self-Organizing Map. 3.4 Other Fixed Architectures for Self-Organization. 3.4.1 Neural Gas. 3.4.2 Hierarchical Feature Map. 3.5 Emerging Architectures for Self-Organization. 3.5.1 Dynamic Hierarchical Architectures. 3.5.2 Nonstationary Architectures. 3.5.3 Hybrid Architectures. 3.6 Conclusion.
4 Self-Organizing Tree Map. 4.1 Introduction. 4.2 Architecture. 4.3 Competitive Learning. 4.4 Algorithm. 4.5 Evolution. 4.5.1 Dynamic Topology. 4.5.2 Classification Capability. 4.6 Practical Considerations, Extensions, and Refinements. 4.6.1 The Hierarchical Control Function. 4.6.2 Learning, Timing, and Convergence. 4.6.3 Feature Normalization. 4.6.4 Stop Criteria. 4.7 Conclusions.
5 Self-Organization in Impulse Noise Removal. 5.1 Introduction. 5.2 Review of Traditional Median-Type Filters. 5.3 The Noise-Exclusive Adaptive Filtering. 5.3.1 Feature Selection and Impulse Detection. 5.3.2 Noise Removal Filters. 5.4 Experimental Results. 5.5 Detection-Guided Restoration and Real-Time Processing. 5.5.1 Introduction. 5.5.2 Iterative Filtering. 5.5.3 Recursive Filtering. 5.5.4 Real-Time Processing of Impulse Corrupted TV Pictures. 5.5.5 Analysis of the Processing Time. 5.6 Conclusions.
6 Self-Organization in Image Retrieval. 6.1 Retrieval of Visual Information. 6.2 Visual Feature Descriptor. 6.2.1 Color Histogram and Color Moment Descriptors. 6.2.2 Wavelet Moment and Gabor Texture Descriptors. 6.2.3 Fourier and Moment-based Shape Descriptors. 6.2.4 Feature Normalization and Selection. 6.3 User-Assisted Retrieval. 6.3.1 Radial Basis Function Method. 6.4 Self-Organization for Pseudo Relevance Feedback. 6.5 Directed Self-Organization. 6.5.1 Algorithm. 6.6 Optimizing Self-Organization for Retrieval. 6.6.1 Genetic Principles. 6.6.2 System Architecture. 6.6.3 Genetic Algorithm for Feature Weight Detection. 6.7 Retrieval Performance. 6.7.1 Directed Self-Organization. 6.7.2 Genetic Algorithm Weight Detection. 6.8 Summary.
7 The Self-Organizing Hierarchical Variance Map. 7.1 An Intuitive Basis. 7.2 Model Formulation and Breakdown. 7.2.1 Topology Extraction via Competitive Hebbian Learning. 7.2.2 Local Variance via Hebbian Maximal Eigenfilters. 7.2.3 Global and Local Variance Interplay for Map Growth and Termination. 7.3 Algorithm. 7.3.1 Initialization, Continuation, and Presentation. 7.3.2 Updating Network Parameters. 7.3.3 Vigilance Evaluation and Map Growth. 7.3.4 Topology Adaptation. 7.3.5 Node Adaptation. 7.3.6 Optional Tuning Stage. 7.4 Simulations and Evaluation. 7.4.1 Observations of Evolution and Partitioning. 7.4.2 Visual Comparisons with Popular Mean-Squared Error Architectures. 7.4.3 Visual Comparison Against Growing Neural Gas. 7.4.4 Comparing Hierarchical with Tree-Based Methods. 7.5 Tests on Self-Determination and the Optional Tuning Stage. 7.6 Cluster Validity Analysis on Synthetic and UCI Data. 7.6.1 Performance vs. Popular Clustering Methods. 7.6.2 IRIS Dataset. 7.6.3 WINE Dataset. 7.7 Summary.
8 Microbiological Image Analysis Using Self-Organization. 8.1 Image Analysis in the Biosciences. 8.1.1 Segmentation: The Common Denominator. 8.1.2 Semi-supervised versus Unsupervised Analysis. 8.1.3 Confocal Microscopy and Its Modalities. 8.2 Image Analysis Tasks Considered. 8.2.1 Visualising Chromosomes During Mitosis. 8.2.2 Segmenting Heterogeneous Biofilms. 8.3 Microbiological Image Segmentation. 8.3.1 Effects of Feature Space Definition. 8.3.2 Fixed Weighting of Feature Space. 8.3.3 Dynamic Feature Fusion During Learning. 8.4 Image Segmentation Using Hierarchical Self-Organization. 8.4.1 Gray-Level Segmentation of Chromosomes. 8.4.2 Automated Multilevel Thresholding of Biofilm. 8.4.3 Multidimensional Feature Segmentation. 8.5 Harvesting Topologies to Facilitate Visualization. 8.5.1 Topology Aware Opacity and Gray-Level Assignment. 8.5.2 Visualization of Chromosomes During Mitosis. 8.6 Summary.
9 Closing Remarks and Future Directions. 9.1 Summary of Main Findings. 9.1.1 Dynamic Self-Organization: Effective Models for Efficient Feature Space Parsing. 9.1.2 Improved Stability, Integrity, and Efficiency. 9.1.3 Adaptive Topologies Promote Consistency and Uncover Relationships. 9.1.4 Online Selection of Class Number. 9.1.5 Topologies Represent a Useful Backbone for Visualization or Analysis. 9.2 Future Directions. 9.2.1 Dynamic Navigation for Information Repositories. 9.2.2 Interactive Knowledge-Assisted Visualization. 9.2.3 Temporal Data Analysis Using Trajectories.
Appendix A. A.1 Global and Local Consistency Error.
References.
Index.
£100.76
John Wiley & Sons Inc Making Sense of Data III
Book Synopsis: As third in the series, this book focuses on a style of data analysis that makes graphics central to exploration. Making Sense of Data III explains how to implement decision support systems and presents an interactive approach to data analysis that lets users see, manipulate, explore, and mine data, and share the results with colleagues.
Trade Review: “It is an essential book for understanding the principal role that graphics play in data visualization.” (Zentralblatt MATH, 1 April 2015)
Table of Contents: Preface. 1. Introduction. 1.1 Overview. 1.2 Visual Perception. 1.3 Visualization. 1.4 Designing for High-throughput Data Exploration. 1.5 Summary. 1.6 Further reading. 2. The Cognitive and Visual Systems. 2.1 External Representation. 2.2 The Cognitive System. 2.3 Visual Perception. 2.4 Influencing Visual Perception. 2.5 Summary. 2.6 Further reading. 3. Graphic Representations. 3.1 Jacques Bertin: Semiology of Graphics. 3.2 Wilkinson: Grammar of Graphics. 3.3 Wickham: ggplot2. 3.4 Bostock and Heer: Protovis. 3.5 Summary. 3.6 Further reading. 4. Designing Visual Interactions. 4.1 Designing for Complexity. 4.2 The Process of Design. 4.3 Visual Interaction Design. 5. Hands-on: Creating Interactive Visualizations with Protovis. 5.1 Using Protovis. 5.2 Creating Code using the Protovis Graphical Framework. 5.3 Basic Protovis Marks. 5.4 Creating Customized Plots. 5.5 Creating Basic Plots. 5.6 Data Analysis Graphs. 5.7 Composite Plots. 5.8 Interactive Plots. 5.9 Protovis Summary. 5.10 Further Reading. Appendix. A Exercise Code Examples. Bibliography. Index.
£81.86
John Wiley & Sons Inc Data Mining Techniques
Book Synopsis: The leading introductory book on data mining, fully updated and revised! When Berry and Linoff wrote the first edition of Data Mining Techniques in the late 1990s, data mining was just starting to move out of the lab and into the office; it has since grown into an indispensable tool of modern business.
Table of Contents: Introduction xxxvii Chapter 1 What Is Data Mining and Why Do It? 1 What Is Data Mining? 2 Data Mining Is a Business Process 2 Large Amounts of Data 3 Meaningful Patterns and Rules 3 Data Mining and Customer Relationship Management 4 Why Now? 6 Data Is Being Produced 6 Data Is Being Warehoused 6 Computing Power Is Affordable 7 Interest in Customer Relationship Management Is Strong 7 Commercial Data Mining Software Products Have Become Available 8 Skills for the Data Miner 9 The Virtuous Cycle of Data Mining 9 A Case Study in Business Data Mining 11 Identifying BofA’s Business Challenge 12 Applying Data Mining 12 Acting on the Results 13 Measuring the Effects of Data Mining 14 Steps of the Virtuous Cycle 15 Identify Business Opportunities 16 Transform Data into Information 17 Act on the Information 19 Measure the Results 20 Data Mining in the Context of the Virtuous Cycle 23 Lessons Learned 26 Chapter 2 Data Mining Applications in Marketing and Customer Relationship Management 27 Two Customer Lifecycles 27 The Customer’s Lifecycle 28 The Customer Lifecycle 28 Subscription Relationships versus Event-Based Relationships 30 Organize Business Processes Around the Customer Lifecycle 32 Customer Acquisition 33 Customer Activation 36 Customer Relationship Management 37 Winback 38 Data Mining Applications for Customer Acquisition 38 Identifying Good Prospects 39 Choosing a Communication Channel 39 Picking Appropriate Messages 40 A Data Mining Example: Choosing the Right Place to Advertise 40 Who Fits the Profile?
41 Measuring Fitness for Groups of Readers 44 Data Mining to Improve Direct Marketing Campaigns 45 Response Modeling 46 Optimizing Response for a Fixed Budget 47 Optimizing Campaign Profitability 49 Reaching the People Most Influenced by the Message 53 Using Current Customers to Learn About Prospects 54 Start Tracking Customers Before They Become “Customers” 55 Gather Information from New Customers 55 Acquisition-Time Variables Can Predict Future Outcomes 56 Data Mining Applications for Customer Relationship Management 56 Matching Campaigns to Customers 56 Reducing Exposure to Credit Risk 58 Determining Customer Value 59 Cross-selling, Up-selling, and Making Recommendations 60 Retention 60 Recognizing Attrition 60 Why Attrition Matters 61 Different Kinds of Attrition 62 Different Kinds of Attrition Model 63 Beyond the Customer Lifecycle 64 Lessons Learned 65 Chapter 3 The Data Mining Process 67 What Can Go Wrong? 68 Learning Things That Aren’t True 68 Learning Things That Are True, but Not Useful 73 Data Mining Styles 74 Hypothesis Testing 75 Directed Data Mining 81 Undirected Data Mining 81 Goals, Tasks, and Techniques 82 Data Mining Business Goals 82 Data Mining Tasks 83 Data Mining Techniques 88 Formulating Data Mining Problems: From Goals to Tasks to Techniques 88 What Techniques for Which Tasks? 95 Is There a Target or Targets? 96 What Is the Target Data Like? 96 What Is the Input Data Like? 96 How Important Is Ease of Use? 97 How Important Is Model Explicability? 
97 Lessons Learned 98 Chapter 4 Statistics 101: What You Should Know About Data 101 Occam’s Razor 103 Skepticism and Simpson’s Paradox 103 The Null Hypothesis 104 P-Values 105 Looking At and Measuring Data 106 Categorical Values 106 Numeric Variables 117 A Couple More Statistical Ideas 120 Measuring Response 120 Standard Error of a Proportion 121 Comparing Results Using Confidence Bounds 123 Comparing Results Using Difference of Proportions 124 Size of Sample 125 What the Confidence Interval Really Means 126 Size of Test and Control for an Experiment 127 Multiple Comparisons 129 The Confidence Level with Multiple Comparisons 129 Bonferroni’s Correction 129 Chi-Square Test 130 Expected Values 130 Chi-Square Value 132 Comparison of Chi-Square to Difference of Proportions 134 An Example: Chi-Square for Regions and Starts 134 Case Study: Comparing Two Recommendation Systems with an A/B Test 138 First Metric: Participating Sessions 140 Data Mining and Statistics 144 Lessons Learned 148 Chapter 5 Descriptions and Prediction: Profiling and Predictive Modeling 151 Directed Data Mining Models 152 Defining the Model Structure and Target 152 Incremental Response Modeling 154 Model Stability 156 Time-Frames in the Model Set 157 Directed Data Mining Methodology 159 Step 1: Translate the Business Problem into a Data Mining Problem 161 How Will Results Be Used? 163 How Will Results Be Delivered? 163 The Role of Domain Experts and Information Technology 164 Step 2: Select Appropriate Data 165 What Data Is Available? 166 How Much Data Is Enough? 167 How Much History Is Required? 167 How Many Variables? 168 What Must the Data Contain? 
168 Step 3: Get to Know the Data 169 Examine Distributions 169 Compare Values with Descriptions 170 Validate Assumptions 170 Ask Lots of Questions 171 Step 4: Create a Model Set 172 Assembling Customer Signatures 172 Creating a Balanced Sample 172 Including Multiple Timeframes 174 Creating a Model Set for Prediction 174 Creating a Model Set for Profiling 176 Partitioning the Model Set 176 Step 5: Fix Problems with the Data 177 Categorical Variables with Too Many Values 177 Numeric Variables with Skewed Distributions and Outliers 178 Missing Values 178 Values with Meanings That Change over Time 179 Inconsistent Data Encoding 179 Step 6: Transform Data to Bring Information to the Surface 180 Step 7: Build Models 180 Step 8: Assess Models 180 Assessing Binary Response Models and Classifiers 181 Assessing Binary Response Models Using Lift 182 Assessing Binary Response Model Scores Using Lift Charts 184 Assessing Binary Response Model Scores Using Profitability Models 185 Assessing Binary Response Models Using ROC Charts 186 Assessing Estimators 188 Assessing Estimators Using Score Rankings 189 Step 9: Deploy Models 190 Practical Issues in Deploying Models 190 Optimizing Models for Deployment 191 Step 10: Assess Results 191 Step 11: Begin Again 193 Lessons Learned 193 Chapter 6 Data Mining Using Classic Statistical Techniques 195 Similarity Models 196 Similarity and Distance 196 Example: A Similarity Model for Product Penetration 197 Table Lookup Models 203 Choosing Dimensions 204 Partitioning the Dimensions 205 From Training Data to Scores 205 Handling Sparse and Missing Data by Removing Dimensions 205 RFM: A Widely Used Lookup Model 206 RFM Cell Migration 207 RFM and the Test-and-Measure Methodology 208 RFM and Incremental Response Modeling 209 Naïve Bayesian Models 210 Some Ideas from Probability 210 The Naïve Bayesian Calculation 212 Comparison with Table Lookup Models 213 Linear Regression 213 The Best-fit Line 215 Goodness of Fit 217 Multiple Regression 220 The 
Equation 220 The Range of the Target Variable 221 Interpreting Coefficients of Linear Regression Equations 221 Capturing Local Effects with Linear Regression 223 Additional Considerations with Multiple Regression 224 Variable Selection for Multiple Regression 225 Logistic Regression 227 Modeling Binary Outcomes 227 The Logistic Function 229 Fixed Effects and Hierarchical Effects 231 Hierarchical Effects 232 Within and Between Effects 232 Fixed Effects 233 Lessons Learned 234 Chapter 7 Decision Trees 237 What Is a Decision Tree and How Is It Used? 238 A Typical Decision Tree 238 Using the Tree to Learn About Churn 240 Using the Tree to Learn About Data and Select Variables 241 Using the Tree to Produce Rankings 243 Using the Tree to Estimate Class Probabilities 243 Using the Tree to Classify Records 244 Using the Tree to Estimate Numeric Values 244 Decision Trees Are Local Models 245 Growing Decision Trees 247 Finding the Initial Split 248 Growing the Full Tree 251 Finding the Best Split 252 Gini (Population Diversity) as a Splitting Criterion 253 Entropy Reduction or Information Gain as a Splitting Criterion 254 Information Gain Ratio 256 Chi-Square Test as a Splitting Criterion 256 Incremental Response as a Splitting Criterion 258 Reduction in Variance as a Splitting Criterion for Numeric Targets 259 F Test 262 Pruning 262 The CART Pruning Algorithm 263 Pessimistic Pruning: The C5.0 Pruning Algorithm 267 Stability-Based Pruning 268 Extracting Rules from Trees 269 Decision Tree Variations 270 Multiway Splits 270 Splitting on More Than One Field at a Time 271 Creating Nonrectangular Boxes 271 Assessing the Quality of a Decision Tree 275 When Are Decision Trees Appropriate? 
276 Case Study: Process Control in a Coffee Roasting Plant 277 Goals for the Simulator 277 Building a Roaster Simulation 278 Evaluation of the Roaster Simulation 278 Lessons Learned 279 Chapter 8 Artificial Neural Networks 281 A Bit of History 282 The Biological Model 283 The Biological Neuron 285 The Biological Input Layer 286 The Biological Output Layer 287 Neural Networks and Artificial Intelligence 287 Artificial Neural Networks 288 The Artificial Neuron 288 The Multi-Layer Perceptron 291 A Network Example 292 Network Topologies 293 A Sample Application: Real Estate Appraisal 295 Training Neural Networks 299 How Does a Neural Network Learn Using Back Propagation? 299 Pruning a Neural Network 300 Radial Basis Function Networks 303 Overview of RBF Networks 303 Choosing the Locations of the Radial Basis Functions 305 Universal Approximators 305 Neural Networks in Practice 308 Choosing the Training Set 309 Coverage of Values for All Features 309 Number of Features 310 Size of Training Set 310 Number and Range of Outputs 310 Rules of Thumb for Using MLPs 310 Preparing the Data 311 Interpreting the Output from a Neural Network 313 Neural Networks for Time Series 315 Time Series Modeling 315 A Neural Network Time Series Example 316 Can Neural Network Models Be Explained? 
317 Sensitivity Analysis 318 Using Rules to Describe the Scores 318 Lessons Learned 319 Chapter 9 Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering 321 Memory-Based Reasoning 322 Look-Alike Models 323 Example: Using MBR to Estimate Rents in Tuxedo, New York 324 Challenges of MBR 327 Choosing a Balanced Set of Historical Records 328 Representing the Training Data 328 Determining the Distance Function, Combination Function, and Number of Neighbors 331 Case Study: Using MBR for Classifying Anomalies in Mammograms 331 The Business Problem: Identifying Abnormal Mammograms 332 Applying MBR to the Problem 332 The Total Solution 334 Measuring Distance and Similarity 335 What Is a Distance Function? 335 Building a Distance Function One Field at a Time 337 Distance Functions for Other Data Types 340 When a Distance Metric Already Exists 341 The Combination Function: Asking the Neighbors for Advice 342 The Simplest Approach: One Neighbor 342 The Basic Approach for Categorical Targets: Democracy 342 Weighted Voting for Categorical Targets 344 Numeric Targets 344 Case Study: Shazam — Finding Nearest Neighbors for Audio Files 345 Why This Feat Is Challenging 346 The Audio Signature 347 Measuring Similarity 348 Collaborative Filtering: A Nearest-Neighbor Approach to Making Recommendations 351 Building Profiles 352 Comparing Profiles 352 Making Predictions 353 Lessons Learned 354 Chapter 10 Knowing When to Worry: Using Survival Analysis to Understand Customers 357 Customer Survival 360 What Survival Curves Reveal 360 Finding the Average Tenure from a Survival Curve 362 Customer Retention Using Survival 364 Looking at Survival as Decay 365 Hazard Probabilities 367 The Basic Idea 368 Examples of Hazard Functions 369 Censoring 371 The Hazard Calculation 372 Other Types of Censoring 375 From Hazards to Survival 376 Retention 376 Survival 378 Comparison of Retention and Survival 378 Proportional Hazards 380 Examples of Proportional Hazards 381 
Stratification: Measuring Initial Effects on Survival 382 Cox Proportional Hazards 382 Survival Analysis in Practice 385 Handling Different Types of Attrition 385 When Will a Customer Come Back? 387 Understanding Customer Value 389 Forecasting 392 Hazards Changing over Time 393 Lessons Learned 394 Chapter 11 Genetic Algorithms and Swarm Intelligence 397 Optimization 398 What Is an Optimization Problem? 398 An Optimization Problem in Ant World 399 E Pluribus Unum 400 A Smarter Ant 401 Genetic Algorithms 403 A Bit of History 404 Genetics on Computers 404 Representing the Genome 413 Schemata: The Building Blocks of Genetic Algorithms 414 Beyond the Simple Algorithm 417 The Traveling Salesman Problem 418 Exhaustive Search 419 A Simple Greedy Algorithm 419 The Genetic Algorithms Approach 419 The Swarm Intelligence Approach 420 Case Study: Using Genetic Algorithms for Resource Optimization 421 Case Study: Evolving a Solution for Classifying Complaints 423 Business Context 424 Data 425 The Comment Signature 425 The Genomes 426 The Fitness Function 427 The Results 427 Lessons Learned 427 Chapter 12 Tell Me Something New: Pattern Discovery and Data Mining 429 Undirected Techniques, Undirected Data Mining 431 Undirected versus Directed Techniques 431 Undirected versus Directed Data Mining 431 Case Study: Undirected Data Mining Using Directed Techniques 432 What is Undirected Data Mining? 
435 Data Exploration 435 Segmentation and Clustering 436 Target Variable Definition, When the Target Is Not Explicit 438 Simulation, Forecasting, and Agent-Based Modeling 443 Methodology for Undirected Data Mining 455 There Is No Methodology 456 Things to Keep in Mind 456 Lessons Learned 457 Chapter 13 Finding Islands of Similarity: Automatic Cluster Detection 459 Searching for Islands of Simplicity 461 Customer Segmentation and Clustering 461 Similarity Clusters 463 Tracking Campaigns by Cluster-Based Segments 464 Clustering Reveals an Overlooked Market Segment 466 Fitting the Troops 467 The K-Means Clustering Algorithm 468 Two Steps of the K-Means Algorithm 468 Voronoi Diagrams and K-Means Clusters 471 Choosing the Cluster Seeds 473 Choosing K 473 Using K-Means to Detect Outliers 474 Semi-Directed Clustering 475 Interpreting Clusters 475 Characterizing Clusters by Their Centroids 476 Characterizing Clusters by What Differentiates Them 477 Using Decision Trees to Describe Clusters 478 Evaluating Clusters 479 Cluster Measurements and Terminology 480 Cluster Silhouettes 480 Limiting Cluster Diameter for Scoring 483 Case Study: Clustering Towns 484 Creating Town Signatures 484 Creating Clusters 486 Determining the Right Number of Clusters 486 Evaluating the Clusters 487 Using Demographic Clusters to Adjust Zone Boundaries 488 Business Success 490 Variations on K-Means 490 K-Medians, K-Medoids, and K-Modes 490 The Soft Side of K-Means 494 Data Preparation for Clustering 495 Scaling for Consistency 496 Use Weights to Encode Outside Information 496 Selecting Variables for Clustering 497 Lessons Learned 497 Chapter 14 Alternative Approaches to Cluster Detection 499 Shortcomings of K-Means 500 Reasonableness 500 An Intuitive Example 501 Fixing the Problem by Changing the Scales 503 What This Means in Practice 504 Gaussian Mixture Models 505 Adding “Gaussians” to K-Means 505 Back to Gaussian Mixture Models 508 Scoring GMMs 510 Applying GMMs 511 Divisive Clustering 513 A
Decision Tree–Like Method for Clustering 513 Scoring Divisive Clusters 515 Clusters and Trees 515 Agglomerative (Hierarchical) Clustering 516 Overview of Agglomerative Clustering Methods 516 Clustering People by Age: An Example of An Agglomerative Clustering Algorithm 520 Scoring Agglomerative Clusters 522 Limitations of Agglomerative Clustering 523 Agglomerative Clustering in Practice 525 Combining Agglomerative Clustering and K-Means 526 Self-Organizing Maps 527 What Is a Self-Organizing Map? 527 Training an SOM 530 Scoring an SOM 531 The Search Continues for Islands of Simplicity 532 Lessons Learned 533 Chapter 15 Market Basket Analysis and Association Rules 535 Defining Market Basket Analysis 536 Four Levels of Market Basket Data 537 The Foundation of Market Basket Analysis: Basic Measures 539 Order Characteristics 540 Item (Product) Popularity 541 Tracking Marketing Interventions 542 Case Study: Spanish or English 543 The Business Problem 543 The Data 544 Defining “Hispanicity” Preference 545 The Solution 546 Association Analysis 547 Rules Are Not Always Useful 548 Item Sets to Association Rules 551 How Good Is an Association Rule? 553 Building Association Rules 555 Choosing the Right Set of Items 556 Anonymous Versus Identified 561 Generating Rules from All This Data 561 Overcoming Practical Limits 565 The Problem of Big Data 567 Extending the Ideas 569 Different Items on the Right- and Left-Hand Sides 569 Using Association Rules to Compare Stores 570 Association Rules and Cross-Selling 572 A Typical Cross-Sell Model 572 A More Confident Approach to Product Propensities 573 Results from Using Confidence 574 Sequential Pattern Analysis 574 Finding the Sequences 575 Sequential Association Rules 578 Sequential Analysis Using Other Data Mining Techniques 579 Lessons Learned 579 Chapter 16 Link Analysis 581 Basic Graph Theory 582 What Is a Graph? 
582 Directed Graphs 584 Weighted Graphs 585 Seven Bridges of Königsberg 585 Detecting Cycles in a Graph 588 The Traveling Salesman Problem Revisited 589 Social Network Analysis 593 Six Degrees of Separation 593 What Your Friends Say About You 595 Finding Childcare Benefits Fraud 596 Who Responds to Whom on Dating Sites 597 Social Marketing 598 Mining Call Graphs 598 Case Study: Tracking Down the Leader of the Pack 601 The Business Goal 601 The Data Processing Challenge 601 Finding Social Networks in Call Data 602 How the Results Are Used for Marketing 602 Estimating Customer Age 603 Case Study: Who Is Using Fax Machines from Home? 604 Why Finding Fax Machines Is Useful 604 How Do Fax Machines Behave? 604 A Graph Coloring Algorithm 605 “Coloring” the Graph to Identify Fax Machines 606 How Google Came to Rule the World 607 Hubs and Authorities 608 The Details 609 Hubs and Authorities in Practice 611 Lessons Learned 612 Chapter 17 Data Warehousing, OLAP, Analytic Sandboxes, and Data Mining 613 The Architecture of Data 615 Transaction Data, the Base Level 616 Operational Summary Data 617 Decision-Support Summary Data 617 Database Schema/Data Models 618 Metadata 623 Business Rules 623 A General Architecture for Data Warehousing 624 Source Systems 624 Extraction, Transformation, and Load 626 Central Repository 627 Metadata Repository 630 Data Marts 630 Operational Feedback 631 Users and Desktop Tools 631 Analytic Sandboxes 633 Why Are Analytic Sandboxes Needed? 634 Technology to Support Analytic Sandboxes 636 Where Does OLAP Fit In? 639 What’s in a Cube? 641 Star Schema 646 OLAP and Data Mining 648 Where Data Mining Fits in with Data Warehousing 650 Lots of Data 651 Consistent, Clean Data 651 Hypothesis Testing and Measurement 652 Scalable Hardware and RDBMS Support 653 Lessons Learned 653 Chapter 18 Building Customer Signatures 655 Finding Customers in Data 656 What Is a Customer? 657 Accounts? Customers? Households? 
658 Anonymous Transactions 658 Transactions Linked to a Card 659 Transactions Linked to a Cookie 659 Transactions Linked to an Account 660 Transactions Linked to a Customer 661 Designing Signatures 661 Is a Customer Signature Necessary? 666 What Does a Row Represent? 666 Will the Signature Be Used for Predictive Modeling? 671 Has a Target Been Defined? 672 Are There Constraints Imposed by the Particular Data Mining Techniques to be Employed? 672 Which Customers Will Be Included? 673 What Might Be Interesting to Know About Customers? 673 What a Signature Looks Like 674 Process for Creating Signatures 677 Some Data Is Already at the Right Level of Granularity 678 Pivoting a Regular Time Series 679 Aggregating Time-Stamped Transactions 680 Dealing with Missing Values 685 Missing Values in Source Data 685 Unknown or Non-Existent? 687 What Not to Do 687 Things to Consider 689 Lessons Learned 691 Chapter 19 Derived Variables: Making the Data Mean More 693 Handset Churn Rate as a Predictor of Churn 694 Single-Variable Transformations 696 Standardizing Numeric Variables 696 Turning Numeric Values into Percentiles 697 Turning Counts into Rates 698 Relative Measures 699 Replacing Categorical Variables with Numeric Ones 700 Combining Variables 707 Classic Combinations 707 Combining Highly Correlated Variables 710 Rent to Home Value 712 Extracting Features from Time Series 718 Trend 719 Seasonality 721 Extracting Features from Geography 722 Geocoding 722 Mapping 723 Using Geography to Create Relative Measures 724 Using Past Values of the Target Variable 725 Using Model Scores as Inputs 725 Handling Sparse Data 726 Account Set Patterns 726 Binning Sparse Values 727 Capturing Customer Behavior from Transactions 727 Widening Narrow Data 728 Sphere of Influence as a Predictor of Good Customers 728 An Example: Ratings to Rater Profile 730 Sample Fields from the Rater Signature 730 The Rating Signature and Derived Variables 732 Lessons Learned 733 Chapter 20 Too Much of a Good 
Thing? Techniques for Reducing the Number of Variables 735 Problems with Too Many Variables 736 Risk of Correlation Among Input Variables 736 Risk of Overfitting 738 The Sparse Data Problem 738 Visualizing Sparseness 739 Independence 740 Exhaustive Feature Selection 743 Flavors of Variable Reduction Techniques 744 Using the Target 744 Original versus New Variables 744 Sequential Selection of Features 745 The Traditional Forward Selection Methodology 745 Forward Selection Using a Validation Set 747 Stepwise Selection 748 Forward Selection Using Non-Regression Techniques 748 Backward Selection 748 Undirected Forward Selection 749 Other Directed Variable Selection Methods 749 Using Decision Trees to Select Variables 750 Variable Reduction Using Neural Networks 752 Principal Components 753 What Are Principal Components? 753 Principal Components Example 758 Principal Component Analysis 763 Factor Analysis 767 Variable Clustering 768 Example of Variable Clusters 768 Using Variable Clusters 770 Hierarchical Variable Clustering 770 Divisive Variable Clustering 773 Lessons Learned 774 Chapter 21 Listen Carefully to What Your Customers Say: Text Mining 775 What Is Text Mining? 776 Text Mining for Derived Columns 776 Beyond Derived Features 777 Text Analysis Applications 778 Working with Text Data 781 Sources of Text 781 Language Effects 782 Basic Approaches to Representing Documents 783 Representing Documents in Practice 784 Documents and the Corpus 786 Case Study: Ad Hoc Text Mining 786 The Boycott 787 Business as Usual 787 Combining Text Mining and Hypothesis Testing 787 The Results 788 Classifying News Stories Using MBR 789 What Are the Codes? 
789 Applying MBR 790 The Results 793 From Text to Numbers 794 Starting with a “Bag of Words” 794 Term-Document Matrix 796 Corpus Effects 797 Singular Value Decomposition (SVD) 798 Text Mining and Naïve Bayesian Models 800 Naïve Bayesian in the Text World 801 Identifying Spam Using Naïve Bayesian 801 Sentiment Analysis 806 DIRECTV: A Case Study in Customer Service 809 Background 809 Applying Text Mining 811 Taking the Technical Approach 814 Not an Iterative Process 818 Continuing to Benefit 818 Lessons Learned 819 Index 821
£37.05
John Wiley & Sons Inc Graphical Models
Book Synopsis: Graphical models are of increasing importance in applied statistics, and in particular in data mining. Providing a self-contained introduction to, and overview of, learning relational, probabilistic, and possibilistic networks from data, this second edition of Graphical Models has been thoroughly updated to cover the latest research in this burgeoning field and adds a new chapter on visualization. The text provides graduate students and researchers with all the necessary background material, including modelling under uncertainty, decomposition of distributions, graphical representation of distributions, and applications relating to graphical models and problems for further research.
Trade Review: “The text provides graduate students, and researchers with all the necessary background material, including modelling under uncertainty, decomposition of distributions, graphical representation of distributions, and applications relating to graphical models and problems for further research.” (Zentralblatt Math, 1 August 2013) "All of the necessary background is provided, with material on modeling under uncertainty and imprecision modeling, decomposition of distributions, graphical representation of distributions, applications relating to graphical models, and problems for further research." (Book News, December 2009)
Table of Contents: Preface. 1 Introduction. 1.1 Data and Knowledge. 1.2 Knowledge Discovery and Data Mining. 1.3 Graphical Models. 1.4 Outline of this Book. 2 Imprecision and Uncertainty. 2.1 Modeling Inferences. 2.2 Imprecision and Relational Algebra. 2.3 Uncertainty and Probability Theory. 2.4 Possibility Theory and the Context Model. 3 Decomposition. 3.1 Decomposition and Reasoning. 3.2 Relational Decomposition. 3.3 Probabilistic Decomposition. 3.4 Possibilistic Decomposition. 3.5 Possibility versus Probability. 4 Graphical Representation. 4.1 Conditional Independence Graphs. 4.2 Evidence Propagation in Graphs. 5 Computing Projections.
5.1 Databases of Sample Cases. 5.2 Relational and Sum Projections. 5.3 Expectation Maximization. 5.4 Maximum Projections. 6 Naive Classifiers. 6.1 Naive Bayes Classifiers. 6.2 A Naive Possibilistic Classifier. 6.3 Classifier Simplification. 6.4 Experimental Evaluation. 7 Learning Global Structure. 7.1 Principles of Learning Global Structure. 7.2 Evaluation Measures. 7.3 Search Methods. 7.4 Experimental Evaluation. 8 Learning Local Structure. 8.1 Local Network Structure. 8.2 Learning Local Structure. 8.3 Experimental Evaluation. 9 Inductive Causation. 9.1 Correlation and Causation. 9.2 Causal and Probabilistic Structure. 9.3 Faithfulness and Latent Variables. 9.4 The Inductive Causation Algorithm. 9.5 Critique of the Underlying Assumptions. 9.6 Evaluation. 10 Visualization. 10.1 Potentials. 10.2 Association Rules. 11 Applications. 11.1 Diagnosis of Electrical Circuits. 11.2 Application in Telecommunications. 11.3 Application at Volkswagen. 11.4 Application at DaimlerChrysler. A Proofs of Theorems. A.1 Proof of Theorem 4.1.2. A.2 Proof of Theorem 4.1.18. A.3 Proof of Theorem 4.1.20. A.4 Proof of Theorem 4.1.26. A.5 Proof of Theorem 4.1.28. A.6 Proof of Theorem 4.1.30. A.7 Proof of Theorem 4.1.31. A.8 Proof of Theorem 5.4.8. A.9 Proof of Lemma .2.2. A.10 Proof of Lemma .2.4. A.11 Proof of Lemma .2.6. A.12 Proof of Theorem 7.3.1. A.13 Proof of Theorem 7.3.2. A.14 Proof of Theorem 7.3.3. A.15 Proof of Theorem 7.3.5. A.16 Proof of Theorem 7.3.7. B Software Tools. Bibliography. Index.
£97.95
John Wiley & Sons Inc Data Mining Techniques in CRM Inside Customer
Book Synopsis: This is an applied handbook on using data mining techniques within the CRM framework. It combines technical and business perspectives to meet the needs of business users who are looking for a practical guide to data mining.
Trade Review: "The book is written in a language that is easily accessible to business users who are not fluent in statistical methods and who have no prior exposure to the data mining or customer segmentation domain . . . This book is poised to become a standard reference, and I unconditionally recommend it to anyone working in this field." (Computing Reviews, 23 June 2011) "This is an excellent book for any data miner or anybody involved in CRM. The text is clear and pictures are well done and funny which is rare enough to be mentioned. From basic to advanced topics, the book is a very pleasant journey inside data mining with a clear focus on customer segmentation. Really advised if you're not a fan of formulas." (Data Mining Research, 18 March 2011)
Table of Contents: Acknowledgements. 1. Data Mining in CRM. The CRM Strategy. What Can Data Mining Do? The Data Mining Methodology. Data Mining and Business Domain Expertise. Summary. 2. An Overview of Data Mining Techniques. Supervised Modeling. Unsupervised Modeling Techniques. Machine Learning/Artificial Intelligence vs. Statistical Techniques. Summary. 3. Data Mining Techniques for Segmentation. Segmenting Customers with Data Mining Techniques. Principal Components Analysis. Clustering Techniques. Examining and Evaluating the Cluster Solution. Understanding the Clusters through Profiling. Selecting the Optimal Cluster Solution. Cluster Profiling and Scoring with Supervised Models. An Introduction to Decision Tree Models. Summary. 4. The Mining Data Mart. Designing the Mining Data Mart. The Time Frame Covered by the Mining Data Mart. The Mining Data Mart for Retail Banking. The Mining Data Mart for Mobile Telephony Consumer (Residential) Customers.
The Mining Data Mart for Retailers. Summary. 5. Customer Segmentation. An Introduction to Customer Segmentation. Segmentation Types in Consumer Markets. Segmentation in Business Markets. A Guide for Behavioral Segmentation. Segmentation Management Strategy. A Guide for Value-Based Segmentation. Designing Differentiated Strategies for the Value Segments. Summary. 6. Segmentation Applications in Banking. Segmentation for Credit Card Holders. Segmentation in Retail Banking. The Marketing Process. Segmentation in Retail Banking; A Summary. 7. Segmentation Applications in Telecommunications. Mobile Telephony. The Fixed Telephony Case. Summary. 8. Segmentation for Retailers. Segmentation in the Retail Industry. The RFM Analysis. Grouping Customers According to the Products They Buy. Summary. Further Reading. Index.
£64.55
University of California Press Data Mining for the Social Sciences
Book Synopsis: We live in a world of big data: the amount of information collected on human behavior each day is staggering, and exponentially greater than at any time in the past. Providing an introduction to data mining, the authors discuss how data mining differs substantially from the conventional statistical modeling familiar to most social scientists.
Table of Contents: PART 1. CONCEPTS 1. What Is Data Mining? 2. Contrasts with the Conventional Statistical Approach 3. Some General Strategies Used in Data Mining 4. Important Stages in a Data Mining Project PART 2. WORKED EXAMPLES 5. Preparing Training and Test Datasets 6. Variable Selection Tools 7. Creating New Variables Using Binning and Trees 8. Extracting Variables 9. Classifiers 10. Classification Trees 11. Neural Networks 12. Clustering 13. Latent Class Analysis and Mixture Models 14. Association Rules Conclusion Bibliography Notes Index
£28.90
Princeton University Press The Silicon Jungle
Book Synopsis: What happens when a naive intern is granted unfettered access to people's most private thoughts and actions? Stephen Thorpe lands a coveted internship at Ubatoo, an Internet empire that provides its users with popular online services, from a search engine and e-mail to social networking. When Stephen's boss asks him to work on a project with the A…
Trade Review: Co-Winner of the 2012 Mary Shelley Award for Outstanding Fictional Work, Media Ecology Association. "Baluja's clever, cynical debut explores the frightening possibilities of data mining... A nod to Upton Sinclair's muckraking The Jungle, which scared its readers into regulating the meat-packing industry, this lively if depressing novel suggests that computer snooping is too seductive to control, despite the consequences."--Publishers Weekly "[F]righteningly convincing... The read is quick, the questions will linger, and the ideas are so intriguing... Baluja simplifies the abstract world of tech-speak for the rest of us while aiming to do for the Internet what Upton Sinclair's The Jungle did for the meat industry: make readers reconsider its safety. For fans of intelligent thrillers."--Stephen Morrow, Library Journal "In the era of the ubiquitous web company, The Silicon Jungle provides ample food for thought."--Zena Iovino, New Scientist "[T]his cautionary tale is fascinating for its exploration of technology as a conduit for crime."--Michele Leber, Booklist Online "The book's central message is fascinating. A company like Google, Baluja points out, has far more information on U.S. citizens than does the FBI and far fewer restrictions on how to use it.
It's a chilling message in a fun package."--Kathleen Offenholley, Mathematics Teacher
Table of Contents: Preface xi Endings 1 Anklets 3 Anthropologists in the Midst 10 Mollycoddle 13 Touchpoints 19 Checking In 26 Working 9 to 4 28 Predicting the Future and 38 Needles 33 Contact 39 Two Geeks in a Pod 47 An Understatement 53 Euphoria and Diet Pills 61 To Better Days 70 Marathon 75 The Life and Soul of an Intern 81 Candid Cameras 85 Episodes 89 Liberal Food and Even More Liberal Activism 92 Subjects 100 Newsworthy 105 Patience 110 Hypergrowth 113 Little Pink Houses 117 Truth, Lies, and Algorithms 122 Negotiations and Herding Cats 129 The JENNY Discovery 133 I Dream of JENNY 138 A Five-Step Program: Hallucinations and Archetypes 143 Over-Deliver 150 A Life Changed in Four Phone Calls 154 Giving Thanks 160 A Drive through the Country 166 Control 171 A Tale of Two Tenures 178 Prelude to Pie 183 The Yuri Effect 188 Apple Pie 195 Thoughts Like Butterflies 201 Core-Relations 207 Collide 212 Control, Revisited 220 Fables of the Deconstruction 223 Control, Foregone 232 Foundations 236 One Way 241 Sebastin's Friends 244 A Tinker by Any Other Name 251 When It Rains 262 I Am a Heartbeat 267 What I Did This Summer 273 A Permanent Position 280 For Adam 284 Faith 288 Counting by Two 291 Disconnect 298 Sahim 304 Epilogue: Beginnings 309 Acknowledgments 313 Know More 315 Privacy Policy of a Few Organizations 317 References 319
£25.20
Springer-Verlag New York Inc. Data Mining Techniques for the Life Sciences
Book Synopsis: This third edition details new and updated methods and protocols on important databases and data mining tools. Chapters guide readers through archives of macromolecular sequences and three-dimensional structures, databases of protein-protein interactions, and methods for predicting conformational disorder, mutant thermodynamic stability, aggregation, and drug response. The quality of structural data and their release, soft mechanics applications in biology, and protein flexibility are also considered, together with pan-genome analyses, rational drug combination screening, and Omics Deep Mining. Written in the format of the highly successful Methods in Molecular Biology series, each chapter includes an introduction to the topic, lists the necessary materials, and provides step-by-step, readily reproducible protocols. Authoritative and cutting-edge, Data Mining Techniques for the Life Sciences, Third Edition aims to be a practical guide for researchers to help further their study in this field.
Table of Contents: Part I: DATABASES 1 EBI data resources Rolf Apweiler and Amonida Zadissa 2 IMEx databases: displaying molecular interactions into a single, standards-compliant dataset Pablo Porras, Sandra Orchard and Luana Licata 3 Protein Three-dimensional Structure Databases Vaishali P. Waman, Christine Orengo, Gerard J. Kleywegt and Arthur M. Lesk Part II: PREDICTION METHODS 4 Predicting protein conformational disorder and disordered binding sites Ketty Tamburrini, Giulia Pesce, Juliet Nilsson, Frank Gondelaud, Andrey V. Kajava, Jean-Guy Berrin and Sonia Longhi 5 Profiles of natural and designed protein-like sequences effectively bridge protein sequence gaps: Implications in distant homology detection Gayatri Kumar, Narayanaswamy Srinivasa and Sankaran Sandhya 6 Turning failures into applications: the problem of protein ΔΔG prediction Rita Casadio, Castrense Savojardo, Piero Fariselli, Emidio Capriotti and Pier Luigi Martelli 7 Dissecting the genome for drug response prediction Gerardo Pepe, Chiara Carrino, Luca Parca, Manuela Helmer-Citterich 8 Prediction of the effect of pH on the aggregation and conditional folding of intrinsically disordered proteins with SolupHred and DispHred Valentín Iglesias, Carlos Pintado-Grima, Jaime Santos, Marc Fornt and Salvador Ventura 9 Extracting the dynamic motion of proteins using Normal Mode Analysis Jacob A. Bauer and Vladena Bauerová Part III: DATA QUALITY 10 Pre- and Post- Publication Verification for Reproducible Data Mining in Macromolecular Crystallography John R Helliwell 11 Soft Statistical Mechanics for Biology Mariano Bizzarri, Alessandro Giuliani 12 Uses and abuses of the atomic displacement parameters in structural biology Oliviero Carugo 13 Optimizing the Parametrization of Homologue Classification in the Pan-Genome Computation for a Bacterial Species: Case Study Streptococcus pyogenes Erwin Tantoso, Birgit Eisenhaber and Frank Eisenhaber Part IV: BIG DATA 14 Computational pipeline for rational drug combination screening in patient-derived cells Paschalis Athanasiadis, Aleksandr Ianevski, Sigrid Skånland and Tero Aittokallio 15 Deep Mining from Omics Data Abeer Alzubaidi and Jonathan Tepper
£151.99
John Wiley & Sons Inc Data Mining Algorithms
Book Synopsis: Data Mining Algorithms is a practical, technically-oriented guide to data mining algorithms that covers the most important algorithms for building classification, regression, and clustering models, as well as techniques used for attribute selection and transformation, model quality evaluation, and creating model ensembles. The author presents many of the important topics and methodologies widely used in data mining, whilst demonstrating the internal operation and usage of data mining algorithms using examples in R.
Table of Contents: Acknowledgements xix Preface xxi References xxxi Part I Preliminaries 1 1 Tasks 3 1.1 Introduction 3 1.2 Inductive learning tasks 5 1.3 Classification 9 1.4 Regression 14 1.5 Clustering 16 1.6 Practical issues 19 1.7 Conclusion 20 1.8 Further readings 21 References 22 2 Basic statistics 23 2.1 Introduction 23 2.2 Notational conventions 24 2.3 Basic statistics as modeling 24 2.4 Distribution description 25 2.5 Relationship detection 47 2.6 Visualization 62 2.7 Conclusion 65 2.8 Further readings 66 References 67 Part II Classification 69 3 Decision trees 71 3.1 Introduction 71 3.2 Decision tree model 72 3.3 Growing 76 3.4 Pruning 90 3.5 Prediction 103 3.6 Weighted instances 105 3.7 Missing value handling 106 3.8 Conclusion 114 3.9 Further readings 114 References 116 4 Naïve Bayes classifier 118 4.1 Introduction 118 4.2 Bayes rule 118 4.3 Classification by Bayesian inference 120 4.4 Practical issues 125 4.5 Conclusion 131 4.6 Further readings 131 References 132 5 Linear classification 134 5.1 Introduction 134 5.2 Linear representation 136 5.3 Parameter estimation 145 5.4 Discrete attributes 154 5.5 Conclusion 155 5.6 Further readings 156 References 157 6 Misclassification costs 159 6.1 Introduction 159 6.2 Cost representation 161 6.3 Incorporating misclassification costs 164 6.4 Effects of cost incorporation 176 6.5 Experimental procedure 180 6.6 Conclusion 184 6.7 Further readings 185 References 187 7 Classification model 
evaluation 189 7.1 Introduction 189 7.2 Performance measures 190 7.3 Evaluation procedures 213 7.4 Conclusion 231 7.5 Further readings 232 References 233 Part III Regression 235 8 Linear regression 237 8.1 Introduction 237 8.2 Linear representation 238 8.3 Parameter estimation 242 8.4 Discrete attributes 250 8.5 Advantages of linear models 251 8.6 Beyond linearity 252 8.7 Conclusion 258 8.8 Further readings 258 References 259 9 Regression trees 261 9.1 Introduction 261 9.2 Regression tree model 262 9.3 Growing 263 9.4 Pruning 274 9.5 Prediction 277 9.6 Weighted instances 278 9.7 Missing value handling 279 9.8 Piecewise linear regression 284 9.9 Conclusion 292 9.10 Further readings 292 References 293 10 Regression model evaluation 295 10.1 Introduction 295 10.2 Performance measures 296 10.3 Evaluation procedures 303 10.4 Conclusion 309 10.5 Further readings 309 References 310 Part IV Clustering 311 11 (Dis)similarity measures 313 11.1 Introduction 313 11.2 Measuring dissimilarity and similarity 313 11.3 Difference-based dissimilarity 314 11.4 Correlation-based similarity 321 11.5 Missing attribute values 324 11.6 Conclusion 325 11.7 Further readings 325 References 326 12 k-Centers clustering 328 12.1 Introduction 328 12.2 Algorithm scheme 330 12.3 k-Means 334 12.4 Beyond means 338 12.5 Beyond (fixed) k 342 12.6 Explicit cluster modeling 343 12.7 Conclusion 345 12.8 Further readings 345 References 347 13 Hierarchical clustering 349 13.1 Introduction 349 13.2 Cluster hierarchies 351 13.3 Agglomerative clustering 353 13.4 Divisive clustering 361 13.5 Hierarchical clustering visualization 364 13.6 Hierarchical clustering prediction 366 13.7 Conclusion 369 13.8 Further readings 370 References 371 14 Clustering model evaluation 373 14.1 Introduction 373 14.2 Per-cluster quality measures 376 14.3 Overall quality measures 385 14.4 External quality measures 393 14.5 Using quality measures 397 14.6 Conclusion 398 14.7 Further readings 398 References 399 Part V Getting Better 
Models 401 15 Model ensembles 403 15.1 Introduction 403 15.2 Model committees 404 15.3 Base models 406 15.4 Model aggregation 420 15.5 Specific ensemble modeling algorithms 431 15.6 Quality of ensemble predictions 448 15.7 Conclusion 449 15.8 Further readings 450 References 451 16 Kernel methods 454 16.1 Introduction 454 16.2 Support vector machines 457 16.3 Support vector regression 473 16.4 Kernel trick 482 16.5 Kernel functions 484 16.6 Kernel prediction 487 16.7 Kernel-based algorithms 489 16.8 Conclusion 494 16.9 Further readings 495 References 496 17 Attribute transformation 498 17.1 Introduction 498 17.2 Attribute transformation task 499 17.3 Simple transformations 504 17.4 Multiclass encoding 510 17.5 Conclusion 521 17.6 Further readings 521 References 522 18 Discretization 524 18.1 Introduction 524 18.2 Discretization task 525 18.3 Unsupervised discretization 530 18.4 Supervised discretization 533 18.5 Effects of discretization 551 18.6 Conclusion 553 18.7 Further readings 553 References 556 19 Attribute selection 558 19.1 Introduction 558 19.2 Attribute selection task 559 19.3 Attribute subset search 562 19.4 Attribute selection filters 568 19.5 Attribute selection wrappers 588 19.6 Effects of attribute selection 593 19.7 Conclusion 598 19.8 Further readings 599 References 600 20 Case studies 602 20.1 Introduction 602 20.2 Census income 605 20.3 Communities and crime 631 20.4 Cover type 640 20.5 Conclusion 654 20.6 Further readings 655 References 655 Closing 657 A Notation 659 A.1 Attribute values 659 A.2 Data subsets 659 A.3 Probabilities 660 B R packages 661 B.1 CRAN packages 661 B.2 DMR packages 662 B.3 Installing packages 663 References 664 C Datasets 666 Index 667
£59.80
John Wiley & Sons Inc Data Mining and Business Analytics with R
Book Synopsis: Collecting, analyzing, and extracting valuable information from a large amount of data requires easily accessible, robust, computational and analytical tools. Data Mining and Business Analytics with R utilizes the open source software R for the analysis, exploration, and simplification of large high-dimensional data sets.
Trade Review: "I first taught a Ph.D. level course in business applications of data mining 10 years ago. I regularly search the web, looking for business-oriented data mining books, and this is the first one I have found that is suitable for an MS in business analytics. I plan to use it. Anyone who teaches such a class and is inclined toward R should consider this text." (Journal of the American Statistical Association, 1 January 2014)
Table of Contents: Preface ix Acknowledgments xi 1. Introduction 1 Reference 6 2. Processing the Information and Getting to Know Your Data 7 2.1 Example 1: 2006 Birth Data 7 2.2 Example 2: Alumni Donations 17 2.3 Example 3: Orange Juice 31 References 39 3. Standard Linear Regression 40 3.1 Estimation in R 43 3.2 Example 1: Fuel Efficiency of Automobiles 43 3.3 Example 2: Toyota Used-Car Prices 47 Appendix 3.A The Effects of Model Overfitting on the Average Mean Square Error of the Regression Prediction 53 References 54 4. Local Polynomial Regression: a Nonparametric Regression Approach 55 4.1 Model Selection 56 4.2 Application to Density Estimation and the Smoothing of Histograms 58 4.3 Extension to the Multiple Regression Model 58 4.4 Examples and Software 58 References 65 5. Importance of Parsimony in Statistical Modeling 67 5.1 How Do We Guard Against False Discovery 67 References 70 6. Penalty-Based Variable Selection in Regression Models with Many Parameters (LASSO) 71 6.1 Example 1: Prostate Cancer 74 6.2 Example 2: Orange Juice 78 References 82 7. 
Logistic Regression 83 7.1 Building a Linear Model for Binary Response Data 83 7.2 Interpretation of the Regression Coefficients in a Logistic Regression Model 85 7.3 Statistical Inference 85 7.4 Classification of New Cases 86 7.5 Estimation in R 87 7.6 Example 1: Death Penalty Data 87 7.7 Example 2: Delayed Airplanes 92 7.8 Example 3: Loan Acceptance 100 7.9 Example 4: German Credit Data 103 References 107 8. Binary Classification, Probabilities, and Evaluating Classification Performance 108 8.1 Binary Classification 108 8.2 Using Probabilities to Make Decisions 108 8.3 Sensitivity and Specificity 109 8.4 Example: German Credit Data 109 9. Classification Using a Nearest Neighbor Analysis 115 9.1 The k-Nearest Neighbor Algorithm 116 9.2 Example 1: Forensic Glass 117 9.3 Example 2: German Credit Data 122 Reference 125 10. The Naïve Bayesian Analysis: a Model for Predicting a Categorical Response from Mostly Categorical Predictor Variables 126 10.1 Example: Delayed Airplanes 127 Reference 131 11. Multinomial Logistic Regression 132 11.1 Computer Software 134 11.2 Example 1: Forensic Glass 134 11.3 Example 2: Forensic Glass Revisited 141 Appendix 11.A Specification of a Simple Triplet Matrix 147 References 149 12. More on Classification and a Discussion on Discriminant Analysis 150 12.1 Fisher’s Linear Discriminant Function 153 12.2 Example 1: German Credit Data 154 12.3 Example 2: Fisher Iris Data 156 12.4 Example 3: Forensic Glass Data 157 12.5 Example 4: MBA Admission Data 159 Reference 160 13. Decision Trees 161 13.1 Example 1: Prostate Cancer 167 13.2 Example 2: Motorcycle Acceleration 179 13.3 Example 3: Fisher Iris Data Revisited 182 14. 
Further Discussion on Regression and Classification Trees, Computer Software, and Other Useful Classification Methods 185 14.1 R Packages for Tree Construction 185 14.2 Chi-Square Automatic Interaction Detection (CHAID) 186 14.3 Ensemble Methods: Bagging, Boosting, and Random Forests 188 14.4 Support Vector Machines (SVM) 192 14.5 Neural Networks 192 14.6 The R Package Rattle: A Useful Graphical User Interface for Data Mining 193 References 195 15. Clustering 196 15.1 k-Means Clustering 196 15.2 Another Way to Look at Clustering: Applying the Expectation-Maximization (EM) Algorithm to Mixtures of Normal Distributions 204 15.3 Hierarchical Clustering Procedures 212 References 219 16. Market Basket Analysis: Association Rules and Lift 220 16.1 Example 1: Online Radio 222 16.2 Example 2: Predicting Income 227 References 234 17. Dimension Reduction: Factor Models and Principal Components 235 17.1 Example 1: European Protein Consumption 238 17.2 Example 2: Monthly US Unemployment Rates 243 18. Reducing the Dimension in Regressions with Multicollinear Inputs: Principal Components Regression and Partial Least Squares 247 18.1 Three Examples 249 References 257 19. Text as Data: Text Mining and Sentiment Analysis 258 19.1 Inverse Multinomial Logistic Regression 259 19.2 Example 1: Restaurant Reviews 261 19.3 Example 2: Political Sentiment 266 Appendix 19.A Relationship Between the Gentzkow Shapiro Estimate of “Slant” and Partial Least Squares 268 References 271 20. Network Data 272 20.1 Example 1: Marriage and Power in Fifteenth Century Florence 274 20.2 Example 2: Connections in a Friendship Network 278 References 292 Appendix A: Exercises 293 Exercise 1 294 Exercise 2 294 Exercise 3 296 Exercise 4 298 Exercise 5 299 Exercise 6 300 Exercise 7 301 Appendix B: References 338 Index 341
£98.06
John Wiley & Sons Inc Effective CRM using Predictive Analytics
Book Synopsis: A step-by-step guide to data mining applications in CRM. Following a handbook approach, this book bridges the gap between analytics and their use in everyday marketing, providing guidance on solving real business problems using data mining techniques. The book is organized into three parts.
Table of Contents: Preface xiii Acknowledgments xv 1 An overview of data mining: The applications, the methodology, the algorithms, and the data 1 1.1 The applications 1 1.2 The methodology 4 1.3 The algorithms 6 1.3.1 Supervised models 6 1.3.1.1 Classification models 7 1.3.1.2 Estimation (regression) models 9 1.3.1.3 Feature selection (field screening) 10 1.3.2 Unsupervised models 10 1.3.2.1 Cluster models 11 1.3.2.2 Association (affinity) and sequence models 12 1.3.2.3 Dimensionality reduction models 14 1.3.2.4 Record screening models 14 1.4 The data 15 1.4.1 The mining datamart 16 1.4.2 The required data per industry 16 1.4.3 The customer “signature”: from the mining datamart to the enriched, marketing reference table 16 1.5 Summary 20 Part I The Methodology 21 2 Classification modeling methodology 23 2.1 An overview of the methodology for classification modeling 23 2.2 Business understanding and design of the process 24 2.2.1 Definition of the business objective 24 2.2.2 Definition of the mining approach and of the data model 26 2.2.3 Design of the modeling process 27 2.2.3.1 Defining the modeling population 27 2.2.3.2 Determining the modeling (analysis) level 28 2.2.3.3 Definition of the target event and population 28 2.2.3.4 Deciding on time frames 29 2.3 Data understanding, preparation, and enrichment 33 2.3.1 Investigation of data sources 34 2.3.2 Selecting the data sources to be used 34 2.3.3 Data integration and aggregation 35 2.3.4 Data exploration, validation, and cleaning 35 2.3.5 Data transformations and enrichment 38 2.3.6 Applying a validation technique 40 2.3.6.1 Split or Holdout validation 40 2.3.6.2 Cross or n‐fold validation 45 2.3.6.3 Bootstrap 
validation 47 2.3.7 Dealing with imbalanced and rare outcomes 48 2.3.7.1 Balancing 48 2.3.7.2 Applying class weights 53 2.4 Classification modeling 57 2.4.1 Trying different models and parameter settings 57 2.4.2 Combining models 60 2.4.2.1 Bagging 61 2.4.2.2 Boosting 62 2.4.2.3 Random Forests 63 2.5 Model evaluation 64 2.5.1 Thorough evaluation of the model accuracy 65 2.5.1.1 Accuracy measures and confusion matrices 66 2.5.1.2 Gains, Response, and Lift charts 70 2.5.1.3 ROC curve 78 2.5.1.4 Profit/ROI charts 81 2.5.2 Evaluating a deployed model with test–control groups 85 2.6 Model deployment 88 2.6.1 Scoring customers to roll the marketing campaign 88 2.6.1.1 Building propensity segments 93 2.6.2 Designing a deployment procedure and disseminating the results 94 2.7 Using classification models in direct marketing campaigns 94 2.8 Acquisition modeling 95 2.8.1.1 Pilot campaign 95 2.8.1.2 Profiling of high‐value customers 96 2.9 Cross‐selling modeling 97 2.9.1.1 Pilot campaign 98 2.9.1.2 Product uptake 98 2.9.1.3 Profiling of owners 99 2.10 Offer optimization with next best product campaigns 100 2.11 Deep‐selling modeling 102 2.11.1.1 Pilot campaign 102 2.11.1.2 Usage increase 103 2.11.1.3 Profiling of customers with heavy product usage 104 2.12 Up‐selling modeling 105 2.12.1.1 Pilot campaign 105 2.12.1.2 Product upgrade 107 2.12.1.3 Profiling of “premium” product owners 107 2.13 Voluntary churn modeling 108 2.14 Summary of what we’ve learned so far: it’s not about the tool or the modeling algorithm. 
It’s about the methodology and the design of the process 111 3 Behavioral segmentation methodology 112 3.1 An introduction to customer segmentation 112 3.2 An overview of the behavioral segmentation methodology 113 3.3 Business understanding and design of the segmentation process 115 3.3.1 Definition of the business objective 115 3.3.2 Design of the modeling process 115 3.3.2.1 Selecting the segmentation population 115 3.3.2.2 Selection of the appropriate segmentation criteria 116 3.3.2.3 Determining the segmentation level 116 3.3.2.4 Selecting the observation window 116 3.4 Data understanding, preparation, and enrichment 117 3.4.1 Investigation of data sources 117 3.4.2 Selecting the data to be used 117 3.4.3 Data integration and aggregation 118 3.4.4 Data exploration, validation, and cleaning 118 3.4.5 Data transformations and enrichment 122 3.4.6 Input set reduction 124 3.5 Identification of the segments with cluster modeling 126 3.6 Evaluation and profiling of the revealed segments 128 3.6.1 “Technical” evaluation of the clustering solution 128 3.6.2 Profiling of the revealed segments 132 3.6.3 Using marketing research information to evaluate the clusters and enrich their profiles 138 3.6.4 Selecting the optimal cluster solution and labeling the segments 139 3.7 Deployment of the segmentation solution, design and delivery of differentiated strategies 139 3.7.1 Building the customer scoring model for updating the segments 140 3.7.1.1 Building a Decision Tree for scoring: fine‐tuning the segments 141 3.7.2 Distribution of the segmentation information 141 3.7.3 Design and delivery of differentiated strategies 142 3.8 Summary 142 Part II The Algorithms 143 4 Classification algorithms 145 4.1 Data mining algorithms for classification 145 4.2 An overview of Decision Trees 146 4.3 The main steps of Decision Tree algorithms 146 4.3.1 Handling of predictors by Decision Tree models 148 4.3.2 Using terminating criteria to prevent trivial tree growing 149 4.3.3 Tree 
pruning 150 4.4 CART, C5.0/C4.5, and CHAID and their attribute selection measures 150 4.4.1 The Gini index used by CART 151 4.4.2 The Information Gain Ratio index used by C5.0/C4.5 155 4.4.3 The chi‐square test used by CHAID 158 4.5 Bayesian networks 170 4.6 Naive Bayesian networks 172 4.7 Bayesian belief networks 176 4.8 Support vector machines 184 4.8.1 Linearly separable data 184 4.8.2 Linearly inseparable data 187 4.9 Summary 191 5 Segmentation algorithms 192 5.1 Segmenting customers with data mining algorithms 192 5.2 Principal components analysis 192 5.2.1 How many components to extract? 194 5.2.1.1 The eigenvalue (or latent root) criterion 196 5.2.1.2 The percentage of variance criterion 197 5.2.1.3 The scree test criterion 198 5.2.1.4 The interpretability and business meaning of the components 198 5.2.2 What is the meaning of each component? 199 5.2.3 Moving along with the component scores 201 5.3 Clustering algorithms 203 5.3.1 Clustering with K‐means 204 5.3.2 Clustering with TwoStep 211 5.4 Summary 213 Part III The Case Studies 215 6 A voluntary churn propensity model for credit card holders 217 6.1 The business objective 217 6.2 The mining approach 218 6.2.1 Designing the churn propensity model process 218 6.2.1.1 Selecting the data sources and the predictors 218 6.2.1.2 Modeling population and level of data 218 6.2.1.3 Target population and churn definition 218 6.2.1.4 Time periods and historical information required 219 6.3 The data dictionary 219 6.4 The data preparation procedure 221 6.4.1 From cards to customers: aggregating card‐level data 221 6.4.2 Enriching customer data 225 6.4.3 Defining the modeling population and the target field 228 6.5 Derived fields: the final data dictionary 232 6.6 The modeling procedure 232 6.6.1 Applying a Split (Holdout) validation: splitting the modelling dataset for evaluation purposes 232 6.6.2 Balancing the distribution of the target field 232 6.6.3 Setting the role of the fields in the model 239 6.6.4 Training 
the churn model 239 6.7 Understanding and evaluating the models 241 6.8 Model deployment: using churn propensities to target the retention campaign 248 6.9 The voluntary churn model revisited using RapidMiner 251 6.9.1 Loading the data and setting the roles of the attributes 251 6.9.2 Applying a Split (Holdout) validation and adjusting the imbalance of the target field’s distribution 252 6.9.3 Developing a Naïve Bayes model for identifying potential churners 252 6.9.4 Evaluating the performance of the model and deploying it to calculate churn propensities 253 6.10 Developing the churn model with Data Mining for Excel 254 6.10.1 Building the model using the Classify Wizard 256 6.10.2 Selecting the classification algorithm and its parameters 257 6.10.3 Applying a Split (Holdout) validation 257 6.10.4 Browsing the Decision Tree model 259 6.10.5 Validation of the model performance 259 6.10.6 Model deployment 263 6.11 Summary 266 7 Value segmentation and cross‐selling in retail 267 7.1 The business background and objective 267 7.2 An outline of the data preparation procedure 268 7.3 The data dictionary 272 7.4 The data preparation procedure 272 7.4.1 Pivoting and aggregating transactional data at a customer level 272 7.4.2 Enriching customer data and building the customer signature 276 7.5 The data dictionary of the modeling file 279 7.6 Value segmentation 285 7.6.1 Grouping customers according to their value 285 7.6.2 Value segments: exploration and marketing usage 287 7.7 The recency, frequency, and monetary (RFM) analysis 290 7.7.1 RFM basics 290 7.8 The RFM cell segmentation procedure 293 7.9 Setting up a cross‐selling model 295 7.10 The mining approach 295 7.10.1 Designing the cross‐selling model process 296 7.10.1.1 The data and the predictors 296 7.10.1.2 Modeling population and level of data 296 7.10.1.3 Target population and definition of target attribute 296 7.10.1.4 Time periods and historical information required 296 7.11 The modeling procedure 296 7.11.1 
Preparing the test campaign and loading the campaign responses for modeling 298 7.11.2 Applying a Split (Holdout) validation: splitting the modelling dataset for evaluation purposes 298 7.11.3 Setting the roles of the attributes 299 7.11.4 Training the cross‐sell model 300 7.12 Browsing the model results and assessing the predictive accuracy of the classifiers 301 7.13 Deploying the model and preparing the cross‐selling campaign list 308 7.14 The retail case study using RapidMiner 309 7.14.1 Value segmentation and RFM cells analysis 310 7.14.2 Developing the cross‐selling model 312 7.14.3 Applying a Split (Holdout) validation 313 7.14.4 Developing a Decision Tree model with Bagging 314 7.14.5 Evaluating the performance of the model 317 7.14.6 Deploying the model and scoring customers 317 7.15 Building the cross‐selling model with Data Mining for Excel 319 7.15.1 Using the Classify Wizard to develop the model 319 7.15.2 Selecting a classification algorithm and setting the parameters 320 7.15.3 Applying a Split (Holdout) validation 322 7.15.4 Browsing the Decision Tree model 322 7.15.5 Validation of the model performance 325 7.15.6 Model deployment 329 7.16 Summary 331 8 Segmentation application in telecommunications 332 8.1 Mobile telephony: the business background and objective 332 8.2 The segmentation procedure 333 8.2.1 Selecting the segmentation population: the mobile telephony core segments 333 8.2.2 Deciding the segmentation level 335 8.2.3 Selecting the segmentation dimensions 335 8.2.4 Time frames and historical information analyzed 335 8.3 The data preparation procedure 335 8.4 The data dictionary and the segmentation fields 336 8.5 The modeling procedure 336 8.5.1 Preparing data for clustering: combining fields into data components 340 8.5.2 Identifying the segments with a cluster model 342 8.5.3 Profiling and understanding the clusters 344 8.5.4 Segmentation deployment 354 8.6 Segmentation using RapidMiner and K‐means cluster 354 8.6.1 Clustering with the 
K‐means algorithm 354 8.7 Summary 359 Bibliography 360 Index 362
£43.65
John Wiley & Sons Inc Statistical Data Analytics
Book Synopsis: Solutions Manual to accompany Statistical Data Analytics: Foundations for Data Mining, Informatics, and Knowledge Discovery. A comprehensive introduction to statistical methods for data mining and knowledge discovery. Extensive solutions using actual data (with sample R programming code) are provided, illustrating diverse informatic sources in genomics, biomedicine, ecological remote sensing, astronomy, socioeconomics, marketing, advertising and finance, among many others.
Table of Contents: Preface vii 1 Data analytics and data mining 1 2 Basic probability and statistical distributions 3 3 Data manipulation 14 4 Data visualization and statistical graphics 28 5 Statistical inference 45 6 Techniques for supervised learning: simple linear regression 65 7 Techniques for supervised learning: multiple linear regression 90 8 Supervised learning: generalized linear models 134 9 Supervised learning: classification 154 10 Techniques for unsupervised learning: dimension reduction 185 11 Techniques for unsupervised learning: clustering and association 200 References 216
£16.95
John Wiley & Sons Inc Text Mining in Practice with R
Book Synopsis: A reliable, cost-effective approach to extracting priceless business information from all sources of text. Excavating actionable business insights from data is a complex undertaking, and that complexity is magnified by an order of magnitude when the focus is on documents and other text information.
Table of Contents: Foreword Chapter 1: What is Text Mining? 1.1 What is it? 1.1.1 What is text mining in practice? 1.1.2 Where does text mining fit? 1.2 Why we care about text mining? 1.2.1 What are the consequences of ignoring text? 1.2.2 What are the benefits of text mining? 1.2.3 Setting Expectations: When text mining should (and should not) be used. 1.3 A basic workflow. How the process works. 1.4 What tools do I need to get started with this? 1.5 A Simple Example 1.6 A Real World Use Case 1.7 Summary Chapter 2: Basics of text mining 2.1 What is Text Mining in a practical sense? 2.2 Types of Text Mining: Bag of Words. 2.2.1 Types of Text Mining: Syntactic Parsing. 2.3 The text mining process in context 2.4 String Manipulation: Number of Characters & Substitutions 2.4.1 String Manipulations: Paste, Character Splits & Extractions 2.5 Keyword Scanning 2.6 String Packages stringr & stringi 2.7 Preprocessing Steps for Bag of Words Text Mining 2.8 Spell Check 2.9 Frequent Terms & Associations 2.9 Delta Assist Wrap Up 2.10 Summary Chapter 3: Common Text Mining Visualizations 3.1 A tale of two (or three) cultures 3.2 Simple Exploration: Term Frequency, Associations & Word Networks 3.2.1 Term Frequency 3.2.2 Word Associations 3.2.3 Word Networks 3.3 Simple Word Clusters: Hierarchical Dendrograms 3.4 Word Clouds: Overused but Effective 3.4.1 One Corpus Word Clouds 3.4.2 Comparing and Contrasting Corpora in Word Clouds 3.4.3 Polarized Tag Plot 3.5 Summary Chapter 4: Sentiment Scoring 4.1 What is Sentiment Analysis? 4.2 Sentiment Scoring: Parlor Trick or Insightful? 4.3 Polarity: Simple Sentiment Scoring 4.3.1 Subjectivity Lexicons 4.3.2 Qdap’s Scoring for positive and negative word choice 4.3.3 Revisiting Word Clouds…Sentiment Word Clouds 4.4 Emoticons :) Dealing with these perplexing clues 4.4.1 Symbol-Based Emoticons Native to R 4.4.2 Punctuation Based Emoticons 4.4.3 Emoji 4.5 R’s Archived Sentiment Scoring Library 4.5 Sentiment the tidytext way 4.6 Airbnb.com Boston Wrap Up 4.7 Summary Chapter 5: Hidden Structures: Clustering, String Distance, Text Vectors & Topic Modeling 5.1 What is clustering? 5.1.1 K Means Clustering 5.1.2 Spherical K Means Clustering 5.1.3 K Medoid Clustering 5.1.4 Evaluating the cluster approaches 5.2 Calculating & Exploring String Distance 5.2.1 What is string distance? 5.2.2 Fuzzy Matching: amatch, ain 5.2.3 Similarity Distances: stringdist, stringdistmatrix 5.3 LDA Topic Modeling Explained 5.3.2 Topic Modeling Case Study 5.3.2 LDA & LDAvis 5.4 Text to Vectors using “text2vec” 5.4.1 text2vec 5.5 Summary Chapter 6: Document Classification: Finding Clickbait from Headlines 6.1 What is document classification? 6.2 Clickbait Case Study 6.2.2 Session & Data Set Up 6.2.3 GLMNET Training 6.2.4 GLMNET Test Predictions 6.2.5 Test Set Evaluation 6.2.6 Finding the most impactful words 6.2.7 Case study Wrap Up: Model Accuracy & Improving Performance Recommendations 6.3 Summary Chapter 7: Predictive Modeling: Using text for classifying & predicting outcomes. 7.1 Classification Vs Prediction 7.2 Case Study I: Will this patient come back to the hospital? 7.2.2 Patient Readmission in the Text Mining Workflow 7.2.3 Session & Data Set Up 7.2.4 Patient Modeling 7.2.5 More Model KPI: AUC, Recall, Precision & F1 7.2.5.1 Additional Evaluation Metrics 7.2.6 Apply the model to new patients 7.2.7 Patient Readmission Conclusion 7.3 Case Study II: Predicting Box Office Success 7.3.2 Opening Weekend Revenue in the Text Mining Workflow 7.3.3 Session & Data Set Up 7.3.4 Opening Weekend Modeling 7.3.5 Model Evaluation 7.3.6 Apply the Model to new Movie Reviews 7.3.7 Movie Revenue Conclusion 7.4 Summary Chapter 8: The OpenNLP Project 8.1 What is the OpenNLP project? 8.2 R’s OpenNLP Package 8.3 Named Entities in Hillary Clinton’s Email 8.3.1 R Session Set-up 8.3.2 Minor Text Cleaning 8.3.3 Using OpenNLP on a single email 8.3.4 Using OpenNLP on multiple documents 8.3.5 Revisiting the Text Mining Workflow 8.4 Analyzing the Named Entities 8.4.1 Worldwide Map of Hillary Clinton’s Location Mentions 8.4.2 Mapping Only European Locations 8.4.3 Entities & Polarity: How does Hillary Clinton feel about an entity? 8.4.4 Stock Charts for Entities 8.4.5 Reach an Insight or Conclusion about Hillary Clinton’s Emails 8.5 Summary Chapter 9: Text Sources 9.1 Sourcing Text 9.2 Web Sources 9.2.1 Web Scraping a Single Page with rvest 9.2.2 Web Scraping Multiple Pages with rvest 9.2.3 Application Program Interfaces (APIs) 9.2.4 Newspaper Articles from The Guardian Newspaper 9.2.5 Tweets using the “twitteR” Package 9.2.6 Calling an API without a dedicated R package 9.2.7 Using jsonlite to access the New York Times 9.2.8 Using RCurl & XML to Parse Google News Feeds 9.2.9 The tm library Web-Mining Plugin 9.3 Getting Text from File Sources 9.3.1 Individual CSV, TXT and Microsoft Office Files 9.3.2 Reading multiple files quickly 9.3.2 Extracting Text from PDFs 9.3.3 Optical Character Recognition: Extracting Text from Images 9.4 Summary
£52.20
John Wiley & Sons Inc Data Science Strategy For Dummies
Book Synopsis: All the answers to your data science questions. Over half of all businesses are using data science to generate insights and value from big data. How are they doing it? Data Science Strategy For Dummies answers all your questions about how to build a data science capability from scratch, starting with the what and the why of data science and covering what it takes to lead and nurture a top-notch team of data scientists. With this book, you'll learn how to incorporate data science as a strategic function into any business, large or small. Find solutions to your real-life challenges as you uncover the stories and value hidden within data. Learn exactly what data science is and why it's important. Adopt a data-driven mindset as the foundation to success. Understand the processes and common roadblocks behind data science. Keep your data science program focused on generating business value. Nurture a top-quality data science team. In non-technical language, Data Science Strategy For Dummies outl…
Table of Contents: Foreword xv Introduction 1 About This Book 2 Foolish Assumptions 3 How This Book is Organized 3 Icons Used In This Book 4 Beyond The Book 4 Where To Go From Here 5 Part 1: Optimizing Your Data Science Investment 7 Chapter 1: Framing Data Science Strategy 9 Establishing the Data Science Narrative 10 Capture 11 Maintain 12 Process 13 Analyze 14 Communicate 16 Actuate 17 Sorting Out the Concept of a Data-driven Organization 19 Approaching data-driven 20 Being data obsessed 21 Sorting Out the Concept of Machine Learning 22 Defining and Scoping a Data Science Strategy 26 Objectives 26 Approach 27 Choices 27 Data 27 Legal 28 Ethics 28 Competence 28 Infrastructure 29 Governance and security 29 Commercial/business models 30 Measurements 30 Chapter 2: Considering the Inherent Complexity in Data Science 31 Diagnosing Complexity in Data Science 32 Recognizing Complexity as a Potential 33 Enrolling in Data Science Pitfalls 101 34 Believing that all data is needed 34
Thinking that investing in a data lake will solve all your problems 35 Focusing on AI when analytics is enough 36 Believing in the 1-tool approach 37 Investing only in certain areas 37 Leveraging the infrastructure for reporting rather than exploration 38 Underestimating the need for skilled data scientists 39 Navigating the Complexity 40 Chapter 3: Dealing with Difficult Challenges 41 Getting Data from There to Here 41 Handling dependencies on data owned by others 42 Managing data transfer and computation across country borders 43 Managing Data Consistency Across the Data Science Environment 44 Securing Explainability in AI 45 Dealing with the Difference between Machine Learning and Traditional Software Programming 47 Managing the Rapid AI Technology Evolution and Lack of Standardization 50 Chapter 4: Managing Change in Data Science 51 Understanding Change Management in Data Science 52 Approaching Change in Data Science 53 Recognizing what to avoid when driving change in data science 56 Using Data Science Techniques to Drive Successful Change 59 Using digital engagement tools 59 Applying social media analytics to identify stakeholder sentiment 60 Capturing reference data in change projects 61 Using data to select people for change roles 61 Automating change metrics 62 Getting Started 62 Part 2: Making Strategic Choices for Your Data 65 Chapter 5: Understanding the Past, Present, and Future of Data 67 Sorting Out the Basics of Data 68 Explaining traditional data versus big data 69 Knowing the value of data 71 Exploring Current Trends in Data 73 Data monetization 73 Responsible AI 74 Cloud-based data architectures 75 Computation and intelligence in the edge 75 Digital twins 77 Blockchain 78 Conversational platforms 79 Elaborating on Some Future Scenarios 80 Standardization for data science productivity 80 From data monetization scenarios to a data economy 82 An explosion of human/machine hybrid systems 82 Quantum computing will solve the unsolvable problems 83
Chapter 6: Knowing Your Data 85 Selecting Your Data 85 Describing Data 87 Exploring Data 89 Assessing Data Quality 93 Improving Data Quality 95 Chapter 7: Considering the Ethical Aspects of Data Science 97 Explaining AI Ethics 98 Addressing trustworthy artificial intelligence 99 Introducing Ethics by Design 101 Chapter 8: Becoming Data-driven 103 Understanding Why Data-Driven is a Must 103 Transitioning to a Data-Driven Model 105 Securing management buy-in and assigning a chief data officer (CDO) 106 Identifying the key business value aligned with the business maturity 107 Developing a Data Strategy 108 Caring for your data 109 Democratizing the data 109 Driving data standardization 110 Structuring the data strategy 110 Establishing a Data-Driven Culture and Mindset 111 Chapter 9: Evolving from Data-driven to Machine-driven 113 Digitizing the Data 114 Applying a Data-driven Approach 115 Automating Workflows 116 Introducing AI/ML capabilities 116 Part 3: Building a Successful Data Science Organization 119 Chapter 10: Building Successful Data Science Teams 121 Starting with the Data Science Team Leader 121 Adopting different leadership approaches 122 Approaching data science leadership 124 Finding the right data science leader or manager 124 Defining the Prerequisites for a Successful Team 125 Developing a team structure 125 Establishing an infrastructure 126 Ensuring data availability 126 Insisting on interesting projects 127 Promoting continuous learning 127 Encouraging research studies 128 Building the Team 128 Developing smart hiring processes 129 Letting your teams evolve organically 130 Connecting the Team to the Business Purpose 131 Chapter 11: Approaching a Data Science Organizational Setup 133 Finding the Right Organizational Design 134 Designing the data science function 134 Evaluating the benefits of a center of excellence for data science 136 Identifying success factors for a data science center of excellence 137 Applying a Common Data Science Function 138
Selecting a location 138 Approaching ways of working 139 Managing expectations 141 Selecting an execution approach 142 Chapter 12: Positioning the Role of the Chief Data Officer (CDO) 145 Scoping the Role of the Chief Data Officer (CDO) 146 Explaining Why a Chief Data Officer is Needed 149 Establishing the CDO Role 150 The Future of the CDO Role 152 Chapter 13: Acquiring Resources and Competencies 155 Identifying the Roles in a Data Science Team 156 Data scientist 157 Data engineer 157 Machine learning engineer 158 Data architect 159 Business analyst 159 Software engineer 159 Domain expert 160 Seeing What Makes a Great Data Scientist 160 Structuring a Data Science Team 163 Hiring and evaluating the data science talent you need 165 Retaining Competence in Data Science 167 Understanding what makes a data scientist leave 169 Part 4: Investing in the Right Infrastructure 173 Chapter 14: Developing a Data Architecture 175 Defining What Makes Up a Data Architecture 176 Describing traditional architectural approaches 176 Elements of a data architecture 177 Exploring the Characteristics of a Modern Data Architecture 178 Explaining Data Architecture Layers 181 Listing the Essential Technologies for a Modern Data Architecture 184 NoSQL databases 184 Real-time streaming platforms 185 Docker and containers 185 Container repositories 186 Container orchestration 187 Microservices 187 Function as a service 188 Creating a Modern Data Architecture 189 Chapter 15: Focusing Data Governance on the Right Aspects 193 Sorting Out Data Governance 194 Data governance for defense or offense 195 Objectives for data governance 196 Explaining Why Data Governance is Needed 197 Data governance saves money 197 Bad data governance is dangerous 198 Good data governance provides clarity 198 Establishing Data Stewardship to Enforce Data Governance Rules 198 Implementing a Structured Approach to Data Governance 199 Chapter 16: Managing Models During Development and Production 203
Unfolding the Fundamentals of Model Management 203 Working with many models 204 Making the case for efficient model management 206 Implementing Model Management 207 Pinpointing implementation challenges 208 Managing model risk 210 Measuring the risk level 211 Identifying suitable control mechanisms 211 Chapter 17: Exploring the Importance of Open Source 213 Exploring the Role of Open Source 213 Understanding the importance of open source in smaller companies 214 Understanding the trend 215 Describing the Context of Data Science Programming Languages 215 Unfolding Open Source Frameworks for AI/ML Models 218 TensorFlow 219 Theano 219 Torch 219 Caffe and Caffe2 220 The Microsoft Cognitive Toolkit (previously known as Microsoft CNTK) 220 Keras 220 Scikit-learn 221 Spark MLlib 221 Azure ML Studio 221 Amazon Machine Learning 221 Choosing Open Source or Not? 222 Chapter 18: Realizing the Infrastructure 223 Approaching Infrastructure Realization 223 Listing Key Infrastructure Considerations for AI and ML Support 226 Location 226 Capacity 227 Data center setup 227 End-to-end management 227 Network infrastructure 228 Security and ethics 228 Advisory and supporting services 229 Ecosystem fit 229 Automating Workflows in Your Data Infrastructure 229 Enabling an Efficient Workspace for Data Engineers and Data Scientists 230 Part 5: Data as a Business 233 Chapter 19: Investing in Data as a Business 235 Exploring How to Monetize Data 236 Approaching data monetization is about treating data as an asset 237 Data monetization in a data economy 238 Looking to the Future of the Data Economy 240 Chapter 20: Using Data for Insights or Commercial Opportunities 243 Focusing Your Data Science Investment 243 Determining the Drivers for Internal Business Insights 244 Recognizing data science categories for practical implementation 245 Applying data-science-driven internal business insights 247 Using Data for Commercial Opportunities 248 Defining a data product 249 Distinguishing between categories of data products 250
Balancing Strategic Objectives 252 Chapter 21: Engaging Differently with Your Customers 255 Understanding Your Customers 255 Step 1: Engage your customers 256 Step 2: Identify what drives your customers 257 Step 3: Apply analytics and machine learning to customer actions 258 Step 4: Predict and prepare for the next step 259 Step 5: Imagine your customer’s future 260 Keeping Your Customers Happy 261 Serving Customers More Efficiently 263 Predicting demand 263 Automating tasks 264 Making company applications predictive 264 Chapter 22: Introducing Data-driven Business Models 265 Defining Business Models 265 Exploring Data-driven Business Models 267 Creating data-centric businesses 268 Investigating different types of data-driven business models 268 Using a Framework for Data-driven Business Models 275 Creating a data-driven business model using a framework 276 Key resources 277 Key activities 277 Offering/value proposition 278 Customer segment 278 Revenue model 279 Cost structure 280 Putting it all together 280 Chapter 23: Handling New Delivery Models 281 Defining Delivery Models for Data Products and Services 282 Understanding and Adapting to New Delivery Models 282 Introducing New Ways to Deliver Data Products 284 Self-service analytics environments as a delivery model 285 Applications, websites, and product/service interfaces as delivery models 287 Existing products and services 289 Downloadable files 290 APIs 290 Cloud services 291 Online marketplaces 291 Downloadable licenses 292 Online services 293 Onsite services 293 Part 6: The Part of Tens 295 Chapter 24: Ten Reasons to Develop a Data Science Strategy 297 Expanding Your View on Data Science 297 Aligning the Company View 298 Creating a Solid Base for Execution 299 Realizing Priorities Early 299 Putting the Objective into Perspective 300 Creating an Excellent Base for Communication 300 Understanding Why Choices Matter 301 Identifying the Risks Early 301 Thoroughly Considering Your Data Need 302
Understanding the Change Impact 303 Chapter 25: Ten Mistakes to Avoid When Investing in Data Science 305 Don’t Tolerate Top Management’s Ignorance of Data Science 305 Don’t Believe That AI is Magic 306 Don’t Approach Data Science as a Race to the Death between Man and Machine 307 Don’t Underestimate the Potential of AI 308 Don’t Underestimate the Needed Data Science Skill Set 308 Don’t Think That a Dashboard is the End Objective 309 Don’t Forget about the Ethical Aspects of AI 310 Don’t Forget to Consider the Legal Rights to the Data 311 Don’t Ignore the Scale of Change Needed 312 Don’t Forget the Measurements Needed to Prove Value 313 Index 315
£22.09
John Wiley & Sons Inc Ontology-Based Information Retrieval for
Book Synopsis: With the advancement of the semantic web, ontology has become a crucial mechanism for representing concepts in various domains. For the research and dissemination of customized healthcare services, a major challenge is to efficiently retrieve and analyze individual patient data from a large volume of heterogeneous data over a long time span. This requirement demands effective ontology-based information retrieval approaches for clinical information systems so that the pertinent information can be mined from large amounts of distributed data. This unique and groundbreaking book highlights the key advances in ontology-based information retrieval techniques being applied in the healthcare domain and covers the following areas: semantic data integration in e-health care systems; keyword-based medical information retrieval; ontology-based query retrieval support for e-health implementation; ontologies as a database management system technology for medical…
Table of Contents: Preface xix Acknowledgment xxiii 1 Role of Ontology in Health Care 1 (Sonia Singla) 1.1 Introduction 2 1.2 Ontology in Diabetes 3 1.2.1 Ontology Process 4 1.2.2 Impediments of the Present Investigation 5 1.3 Role of Ontology in Cardiovascular Diseases 6 1.4 Role of Ontology in Parkinson Diseases 8 1.4.1 The Spread of Disease With Age and Onset of Disease 10 1.4.2 Cost of PD for Health Care, Household 11 1.4.3 Treatment and Medicines 11 1.5 Role of Ontology in Depression 13 1.6 Conclusion 15 1.7 Future Scope 15 References 15 2 A Study on Basal Ganglia Circuit and Its Relation With Movement Disorders 19 (Dinesh Bhatia) 2.1 Introduction 19 2.2 Anatomy and Functioning of Basal Ganglia 21 2.2.1 The Striatum: Major Entrance to Basal Ganglia Circuitry 22 2.2.2 Direct and Indirect Striatofugal Projections 23 2.2.3 The STN: Another Entrance to Basal Ganglia Circuitry 25 2.3 Movement Disorders 26 2.3.1 Parkinson Disease 26 2.3.2 Dyskinetic Disorder 27 2.3.3 Dystonia 28 2.4 Effect of Basal Ganglia Dysfunctioning on Movement Disorders 29 2.5 Conclusion and Future Scope 31 References 31 3 Extraction of Significant Association Rules Using Pre- and Post-Mining Techniques: An Analysis 37 (M. Nandhini and S. N. Sivanandam) 3.1 Introduction 38 3.2 Background 39 3.2.1 Interestingness Measures 39 3.2.2 Pre-Mining Techniques 40 3.2.2.1 Candidate Set Reduction Schemes 40 3.2.2.2 Optimal Threshold Computation Schemes 41 3.2.2.3 Weight-Based Mining Schemes 42 3.2.3 Post-Mining Techniques 42 3.2.3.1 Rule Pruning Schemes 43 3.2.3.2 Schemes Using Knowledge Base 43 3.3 Methodology 44 3.3.1 Data Preprocessing 44 3.3.2 Pre-Mining 46 3.3.2.1 Pre-Mining Technique 1: Optimal Support and Confidence Threshold Value Computation Using PSO 46 3.3.2.2 Pre-Mining Technique 2: Attribute Weight Computation Using IG Measure 48 3.3.3 Association Rule Generation 50 3.3.3.1 ARM Preliminaries 50 3.3.3.2 WARM Preliminaries 52 3.3.4 Post-Mining 56 3.3.4.1 Filters 56 3.3.4.2 Operators 58 3.3.4.3 Rule Schemas 58 3.4 Experiments and Results 59 3.4.1 Parameter Settings for PSO-Based Pre-Mining Technique 60 3.4.2 Parameter Settings for PAW-Based Pre-Mining Technique 60 3.5 Conclusions 63 References 65 4 Ontology in Medicine as a Database Management System 69 (Shobowale K. O.)
4.1 Introduction 70 4.1.1 Ontology Engineering and Development Methodology 72 4.2 Literature Review on Medical Data Processing 72 4.3 Information on Medical Ontology 75 4.3.1 Types of Medical Ontology 75 4.3.2 Knowledge Representation 76 4.3.3 Methodology of Developing Medical Ontology 76 4.3.4 Medical Ontology Standards 77 4.4 Ontologies as a Knowledge-Based System 78 4.4.1 Domain Ontology in Medicine 79 4.4.2 Brief Introduction of Some Medical Standards 81 4.4.2.1 Medical Subject Headings (MeSH) 81 4.4.2.2 Medical Dictionary for Regulatory Activities (MedDRA) 81 4.4.2.3 Medical Entities Dictionary (MED) 81 4.4.3 Reusing Medical Ontology 82 4.4.4 Ontology Evaluation 85 4.5 Conclusion 86 4.6 Future Scope 86 References 87 5 Using IoT and Semantic Web Technologies for Healthcare and Medical Sector 91 (Nikita Malik and Sanjay Kumar Malik) 5.1 Introduction 92 5.1.1 Significance of Healthcare and Medical Sector and Its Digitization 92 5.1.2 e-Health and m-Health 92 5.1.3 Internet of Things and Its Use 94 5.1.4 Semantic Web and Its Technologies 96 5.2 Use of IoT in Healthcare and Medical Domain 98 5.2.1 Scope of IoT in Healthcare and Medical Sector 98 5.2.2 Benefits of IoT in Healthcare and Medical Systems 100 5.2.3 IoT Healthcare Challenges and Open Issues 100 5.3 Role of SWTs in Healthcare Services 101 5.3.1 Scope and Benefits of Incorporating Semantics in Healthcare 101 5.3.2 Ontologies and Datasets for Healthcare and Medical Domain 103 5.3.3 Challenges in the Use of SWTs in Healthcare Sector 104 5.4 Incorporating IoT and/or SWTs in Healthcare and Medical Sector 106 5.4.1 Proposed Architecture or Framework or Model 106 5.4.2 Access Mechanisms or Approaches 108 5.4.3 Applications or Systems 109 5.5 Healthcare Data Analytics Using Data Mining and Machine Learning 110 5.6 Conclusion 112 5.7 Future Work 113 References 113 6 An Ontological Model, Design, and Implementation of CSPF for Healthcare 117 (Pooja Mohan) 6.1 Introduction 117 6.2 Related Work 119
6.3 Mathematical Representation of CSPF Model 122 6.3.1 Basic Sets of CSPF Model 123 6.3.2 Conditional Contextual Security and Privacy Constraints 123 6.3.3 CSPF Model States CsetofStates 124 6.3.4 Permission Cpermission 124 6.3.5 Security Evaluation Function (SEFcontexts) 124 6.3.6 Secure State 125 6.3.7 CSPF Model Operations 125 6.3.7.1 Administrative Operations 125 6.3.7.2 Users’ Operations 127 6.4 Ontological Model 127 6.4.1 Development of Class Hierarchy 127 6.4.1.1 Object Properties of Sensor Class 129 6.4.1.2 Data Properties 129 6.4.1.3 The Individuals 129 6.5 The Design of Context-Aware Security and Privacy Model for Wireless Sensor Network 129 6.6 Implementation 133 6.7 Analysis and Results 135 6.7.1 Inference Time/Latency/Query Response Time vs. No. of Policies 135 6.7.2 Average Inference Time vs. Contexts 136 6.8 Conclusion and Future Scope 137 References 138 7 Ontology-Based Query Retrieval Support for E-Health Implementation 143 (Aatif Ahmad Khan and Sanjay Kumar Malik) 7.1 Introduction 143 7.1.1 Health Care Record Management 144 7.1.1.1 Electronic Health Record 144 7.1.1.2 Electronic Medical Record 145 7.1.1.3 Picture Archiving and Communication System 145 7.1.1.4 Pharmacy Systems 145 7.1.2 Information Retrieval 145 7.1.3 Ontology 146 7.2 Ontology-Based Query Retrieval Support 146 7.3 E-Health 150 7.3.1 Objectives and Scope 150 7.3.2 Benefits of E-Health 151 7.3.3 E-Health Implementation 151 7.4 Ontology-Driven Information Retrieval for E-Health 154 7.4.1 Ontology for E-Health Implementation 155 7.4.2 Frameworks for Information Retrieval Using Ontology for E-Health 157 7.4.3 Applications of Ontology-Driven Information Retrieval in Health Care 158 7.4.4 Benefits and Limitations 160 7.5 Discussion 160 7.6 Conclusion 164 References 164 8 Ontology-Based Case Retrieval in an E-Mental Health Intelligent Information System 167 (Georgia Kaoura, Konstantinos Kovas and Basilis Boutsinas) 8.1 Introduction 167 8.2 Literature Survey 170 8.3 Problem Identified 173 8.4 Proposed Solution 174
8.4.1 The PAVEFS Ontology 174 8.4.2 Knowledge Base 179 8.4.3 Reasoning 180 8.4.4 User Interaction 182 8.5 Pros and Cons of Solution 183 8.5.1 Evaluation Methodology and Results 183 8.5.2 Evaluation Methodology 185 8.5.2.1 Evaluation Tools 186 8.5.2.2 Results 187 8.6 Conclusions 189 8.7 Future Scope 190 References 190 9 Ontology Engineering Applications in Medical Domain 193 (Mariam Gawich and Marco Alfonse) 9.1 Introduction 193 9.2 Ontology Activities 195 9.2.1 Ontology Learning 195 9.2.2 Ontology Matching 195 9.2.3 Ontology Merging (Unification) 195 9.2.4 Ontology Validation 196 9.2.5 Ontology Verification 196 9.2.6 Ontology Alignment 196 9.2.7 Ontology Annotation 196 9.2.8 Ontology Evaluation 196 9.2.9 Ontology Evolution 196 9.3 Ontology Development Methodologies 197 9.3.1 TOVE 197 9.3.2 Methontology 198 9.3.3 Brusa et al. Methodology 198 9.3.4 UPON Methodology 199 9.3.5 Uschold and King Methodology 200 9.4 Ontology Languages 203 9.4.1 RDF-RDF Schema 203 9.4.2 OWL 205 9.4.3 OWL 2 205 9.5 Ontology Tools 208 9.5.1 Apollo 208 9.5.2 NeON 209 9.5.3 Protégé 210 9.6 Ontology Engineering Applications in Medical Domain 212 9.6.1 Ontology-Based Decision Support System (DSS) 213 9.6.1.1 OntoDiabetic 213 9.6.1.2 Ontology-Based CDSS for Diabetes Diagnosis 214 9.6.1.3 Ontology-Based Medical DSS within E-Care Telemonitoring Platform 215 9.6.2 Medical Ontology in the Dynamic Healthcare Environment 216 9.6.3 Knowledge Management Systems 217 9.6.3.1 Ontology-Based System for Cancer Diseases 217 9.6.3.2 Personalized Care System for Chronic Patients at Home 218 9.7 Ontology Engineering Applications in Other Domains 219 9.7.1 Ontology Engineering Applications in E-Commerce 219 9.7.1.1 Automated Approach to Product Taxonomy Mapping in E-Commerce 219 9.7.1.2 LexOnt Matching Approach 221 9.7.2 Ontology Engineering Applications in Social Media Domain 222 9.7.2.1 Emotive Ontology Approach 222 9.7.2.2 Ontology-Based Approach for Social Media Analysis 224
9.7.2.3 Methodological Framework for Semantic Comparison of Emotional Values 225 References 226 10 Ontologies on Biomedical Informatics 233 (Marco Alfonse and Mariam Gawich) 10.1 Introduction 233 10.2 Defining Ontology 234 10.3 Biomedical Ontologies and Ontology-Based Systems 235 10.3.1 MetaMap 235 10.3.2 GALEN 236 10.3.3 NIH-CDE 236 10.3.4 LOINC 237 10.3.5 Current Procedural Terminology (CPT) 238 10.3.6 Medline Plus Connect 238 10.3.7 Gene Ontology 239 10.3.8 UMLS 240 10.3.9 SNOMED-CT 240 10.3.10 OBO Foundry 240 10.3.11 Textpresso 240 10.3.12 National Cancer Institute Thesaurus 241 References 241 11 Machine Learning Techniques Best for Large Data Prediction: A Case Study of Breast Cancer Categorical Data: k-Nearest Neighbors 245 (Yagyanath Rimal) 11.1 Introduction 246 11.2 R Programming 250 11.3 Conclusion 255 References 255 12 Need of Ontology-Based Systems in Healthcare System 257 (Tshepiso Larona Mokgetse) 12.1 Introduction 258 12.2 What is Ontology? 259 12.3 Need for Ontology in Healthcare Systems 260 12.3.1 Primary Healthcare 262 12.3.1.1 Semantic Web System 262 12.3.2 Emergency Services 263 12.3.2.1 Service-Oriented Architecture 263 12.3.2.2 IOT Ontology 264 12.3.3 Public Healthcare 265 12.3.3.1 IOT Data Model 265 12.3.4 Chronic Disease Healthcare 266 12.3.4.1 Clinical Reminder System 266 12.3.4.2 Chronic Care Model 267 12.3.5 Specialized Healthcare 268 12.3.5.1 E-Health Record System 268 12.3.5.2 Maternal and Child Health 269 12.3.6 Cardiovascular System 270 12.3.6.1 Distributed Healthcare System 270 12.3.6.2 Records Management System 270 12.3.7 Stroke Rehabilitation 271 12.3.7.1 Patient Information System 271 12.3.7.2 Toronto Virtual System 271 12.4 Conclusion 272 References 272 13 Exploration of Information Retrieval Approaches With Focus on Medical Information Retrieval 275 (Mamata Rath and Jyotir Moy Chatterjee) 13.1 Introduction 276 13.1.1 Machine Learning-Based Medical Information System 278 13.1.2 Cognitive Information Retrieval 278 13.2 Review of Literature 279 13.3 Cognitive Methods of IR 281
13.4 Cognitive and Interactive IR Systems 286 13.5 Conclusion 288 References 289 14 Ontology as a Tool to Enable Health Internet of Things Viable 5G Communication Networks 293 (Nidhi Sharma and R. K. Aggarwal) 14.1 Introduction 293 14.2 From Concept Representations to Medical Ontologies 295 14.2.1 Current Medical Research Trends 296 14.2.2 Ontology as a Paradigm Shift in Health Informatics 296 14.3 Primer Literature Review 297 14.3.1 Remote Health Monitoring 298 14.3.2 Collecting and Understanding Medical Data 298 14.3.3 Patient Monitoring 298 14.3.4 Tele-Health 299 14.3.5 Advanced Human Services Records Frameworks 299 14.3.6 Applied Autonomy and Healthcare Mechanization 300 14.3.7 IoT Powers the Preventive Healthcare 301 14.3.8 Hospital Statistics Control System (HSCS) 301 14.3.9 End-to-End Accessibility and Moderateness 301 14.3.10 Information Mixing and Assessment 302 14.3.11 Following and Alerts 302 14.3.12 Remote Remedial Assistance 302 14.4 Establishments of Health IoT 303 14.4.1 Technological Challenges 304 14.4.2 Probable Solutions 306 14.4.3 Bit-by-Bit Action Statements 307 14.5 Incubation of IoT in Health Industry 307 14.5.1 Hearables 308 14.5.2 Ingestible Sensors 308 14.5.3 Moodables 308 14.5.4 PC Vision Innovation 308 14.5.5 Social Insurance Outlining 308 14.6 Concluding Remarks 309 References 309 15 Tools and Techniques for Streaming Data: An Overview 313 (K. Saranya, S. Chellammal and Pethuru Raj Chelliah)
15.1 Introduction 314 15.2 Traditional Techniques 315 15.2.1 Random Sampling 315 15.2.2 Histograms 316 15.2.3 Sliding Window 316 15.2.4 Sketches 317 15.2.4.1 Bloom Filters 317 15.2.4.2 Count-Min Sketch 317 15.3 Data Mining Techniques 317 15.3.1 Clustering 318 15.3.1.1 STREAM 318 15.3.1.2 BIRCH 318 15.3.1.3 CLUSTREAM 319 15.3.2 Classification 319 15.3.2.1 Naïve Bayesian 319 15.3.2.2 Hoeffding 320 15.3.2.3 Very Fast Decision Tree 320 15.3.2.4 Concept Adaptive Very Fast Decision Tree 320 15.4 Big Data Platforms 320 15.4.1 Apache Storm 321 15.4.2 Apache Spark 321 15.4.2.1 Apache Spark Core 321 15.4.2.2 Spark SQL 322 15.4.2.3 Machine Learning Library 322 15.4.2.4 Streaming Data API 322 15.4.2.5 GraphX 323 15.4.3 Apache Flume 323 15.4.4 Apache Kafka 323 15.4.5 Apache Flink 326 15.5 Conclusion 327 References 328 16 An Ontology-Based IR for Health Care 331 (J. P. Patra, Gurudatta Verma and Sumitra Samal) 16.1 Introduction 331 16.2 General Definition of Information Retrieval Model 333 16.3 Information Retrieval Model Based on Ontology 334 16.4 Literature Survey 336 16.5 Methodology for IR 339 References 344
£164.66
John Wiley & Sons Inc Computation in BioInformatics
Book Synopsis: Bioinformatics is a platform between biology and information technology, and this book provides readers with an understanding of the use of bioinformatics tools in new drug design. The discovery of new solutions to pandemics is facilitated through the use of promising bioinformatics techniques and integrated approaches. This book covers a broad spectrum of the bioinformatics field, starting with the basic principles, concepts, and application areas. Also covered is the role of bioinformatics in drug design and discovery, including aspects of molecular modeling. Some of the chapters provide detailed information on bioinformatics-related topics, such as in silico design, protein modeling, DNA microarray analysis, DNA-RNA barcoding, and gene sequencing, all of which are currently needed in the industry. Also included are specialized topics, such as bioinformatics in cancer detection, genomics, and proteomics. Moreover, a few chapters expl…
Table of Contents: Preface xiii 1 Bioinformatics as a Tool in Drug Designing 1 (Rene Barbie Browne, Shiny C. Thomas and Jayanti Datta Roy) 1.1 Introduction 1 1.2 Steps Involved in Drug Designing 3 1.2.1 Identification of the Target Protein/Enzyme 5 1.2.2 Detection of Molecular Site (Active Site) in the Target Protein 6 1.2.3 Molecular Modeling 6 1.2.4 Virtual Screening 9 1.2.5 Molecular Docking 10 1.2.6 QSAR (Quantitative Structure-Activity Relationship) 12 1.2.7 Pharmacophore Modeling 14 1.2.8 Solubility of Molecule 14 1.2.9 Molecular Dynamic Simulation 14 1.2.10 ADME Prediction 15 1.3 Various Software Used in the Steps of Drug Designing 16 1.4 Applications 18 1.5 Conclusion 20 References 20 2 New Strategies in Drug Discovery 25 (Vivek Chavda, Yogita Thalkari and Swati Marwadi) 2.1 Introduction 26 2.2 Road Toward Advancement 27 2.3 Methodology 30 2.3.1 Target Identification 30 2.3.2 Docking-Based Virtual Screening 32 2.3.3 Conformation Sampling 33 2.3.4 Scoring Function 34 2.3.5 Molecular Similarity Methods 35 2.3.6 Virtual Library Construction 37 2.3.7 Sequence-Based Drug Design 37 2.4 Role of OMICS Technology 38 2.5 High-Throughput Screening and Its Tools 40 2.6 Chemoinformatics 44 2.6.1 Exploratory Data Analysis 45 2.6.2 Example Discovery 46 2.6.3 Pattern Explanation 46 2.6.4 New Technologies 46 2.7 Concluding Remarks and Future Prospects 46 References 48 3 Role of Bioinformatics in Early Drug Discovery: An Overview and Perspective 49 (Shasank S. Swain and Tahziba Hussain) 3.1 Introduction 50 3.2 Bioinformatics and Drug Discovery 51 3.2.1 Structure-Based Drug Design (SBDD) 52 3.2.2 Ligand-Based Drug Design (LBDD) 53 3.3 Bioinformatics Tools in Early Drug Discovery 54 3.3.1 Possible Biological Activity Prediction Tools 55 3.3.2 Possible Physicochemical and Drug-Likeness Properties Verification Tools 58 3.3.3 Possible Toxicity and ADME/T Profile Prediction Tools 60 3.4 Future Directions With Bioinformatics Tool 61 3.5 Conclusion 63 Acknowledgements 64 References 64 4 Role of Data Mining in Bioinformatics 69 (Vivek P. Chavda, Amit Sorathiya, Disha Valu and Swati Marwadi)
4.1 Introduction 70 4.2 Data Mining Methods/Techniques 71 4.2.1 Classification 71 4.2.1.1 Statistical Techniques 71 4.2.1.2 Clustering Technique 73 4.2.1.3 Visualization 74 4.2.1.4 Induction Decision Tree Technique 74 4.2.1.5 Neural Network 75 4.2.1.6 Association Rule Technique 75 4.2.1.7 Classification 75 4.3 DNA Data Analysis 77 4.4 RNA Data Analysis 79 4.5 Protein Data Analysis 79 4.6 Biomedical Data Analysis 80 4.7 Conclusion and Future Prospects 81 References 81 5 In Silico Protein Design and Virtual Screening 85 (Vivek P. Chavda, Zeel Patel, Yashti Parmar and Disha Chavda) 5.1 Introduction 86 5.2 Virtual Screening Process 88 5.2.1 Before Virtual Screening 90 5.2.2 General Process of Virtual Screening 90 5.2.2.1 Step 1 (The Establishment of the Receptor Model) 91 5.2.2.2 Step 2 (The Generation of Small-Molecule Libraries) 92 5.2.2.3 Step 3 (Molecular Docking) 92 5.2.2.4 Step 4 (Selection of Lead Protein Compounds) 94 5.3 Machine Learning and Scoring Functions 94 5.4 Conclusion and Future Prospects 95 References 96 6 New Bioinformatics Platform-Based Approach for Drug Design 101 (Vivek Chavda, Soham Sheta, Divyesh Changani and Disha Chavda) 6.1 Introduction 102 6.2 Platform-Based Approach and Regulatory Perspective 104 6.3 Bioinformatics Tools and Computer-Aided Drug Design 107 6.4 Target Identification 109 6.5 Target Validation 110 6.6 Lead Identification and Optimization 111 6.7 High-Throughput Methods (HTM) 112 6.8 Conclusion and Future Prospects 114 References 115 7 Bioinformatics and Its Application Areas 121 (Ragini Bhardwaj, Mohit Sharma and Nikhil Agrawal) 7.1 Introduction 121 7.2 Review of Bioinformatics 124 7.3 Bioinformatics Applications in Different Areas 126 7.3.1 Microbial Genome Application 126 7.3.2 Molecular Medicine 129 7.3.3 Agriculture 130 7.4 Conclusion 131 References 131 8 DNA Microarray Analysis: From Affymetrix CEL Files to Comparative Gene Expression 139 (Sandeep Kumar, Shruti Shandilya, Suman Kapila, Mohit Sharma and Nikhil Agrawal)
8.1 Introduction 140 8.2 Data Processing 140 8.2.1 Installation of Workflow 140 8.2.2 Importing the Raw Data for Processing 141 8.2.3 Retrieving Sample Annotation of the Data 142 8.2.4 Quality Control 143 8.2.4.1 Boxplot 144 8.2.4.2 Density Histogram 145 8.2.4.3 MA Plot 145 8.2.4.4 NUSE Plot 145 8.2.4.5 RLE Plot 145 8.2.4.6 RNA Degradation Plot 145 8.2.4.7 QCstat 148 8.3 Normalization of Microarray Data Using the RMA Method 148 8.3.1 Background Correction 148 8.3.2 Normalization 149 8.3.3 Summarization 149 8.4 Statistical Analysis for Differential Gene Expression 151 8.5 Conclusion 153 References 153 9 Machine Learning in Bioinformatics 155 (Rahul Yadav, Mohit Sharma and Nikhil Agrawal) 9.1 Introduction and Background 156 9.1.1 Bioinformatics 158 9.1.2 Text Mining 159 9.1.3 IoT Devices 159 9.2 Machine Learning Applications in Bioinformatics 159 9.3 Machine Learning Approaches 161 9.4 Conclusion and Closing Remarks 162 References 162 10 DNA-RNA Barcoding and Gene Sequencing 165 (Gifty Sawhney, Mohit Sharma and Nikhil Agrawal) 10.1 Introduction 166 10.2 RNA 169 10.3 DNA Barcoding 172 10.3.1 Introduction 172 10.3.2 DNA Barcoding and Molecular Phylogeny 177 10.3.3 Ribosomal DNA (rDNA) of the Nuclear Genome (nuDNA): ITS 178 10.3.4 Chloroplast DNA 180 10.3.5 Mitochondrial DNA 181 10.3.6 Molecular Phylogenetic Analysis 181 10.3.7 Metabarcoding 189 10.3.8 Materials for DNA Barcoding 190 10.4 Main Reasons of DNA Barcoding 191 10.5 Limitations/Restrictions of DNA Barcoding 192 10.6 RNA Barcoding 192 10.6.1 Overview of the Method 193 10.7 Methodology 194 10.7.1 Materials Required 195 10.7.2 Barcoded RNA Sequencing High-Level Mapping of Single-Neuron Projections 196 10.7.3 Using RNA to Trace Neurons 196 10.7.4 A Life Conservation Barcoder 198 10.7.5 Gene Sequencing 199 10.7.5.1 DNA Sequencing Methods 200 10.7.5.2 First-Generation Sequencing Techniques 204 10.7.5.3 Maxam’s and Gilbert’s Chemical Method 204 10.7.5.4 Sanger Sequencing 205 10.7.5.5
Automation in DNA Sequencing 206 10.7.5.6 Use of Fluorescent-Marked Primers and ddNTPs 206 10.7.5.7 Dye Terminator Sequencing 207 10.7.5.8 Using Capillary Electrophoresis 207 10.7.6 Developments and High-Throughput Methods in DNA Sequencing 208 10.7.7 Pyrosequencing Method 209 10.7.8 The Genome Sequencer 454 FLX System 210 10.7.9 Illumina/Solexa Genome Analyzer 210 10.7.10 Transition Sequencing Techniques 211 10.7.11 Ion-Torrent’s Semiconductor Sequencing 211 10.7.12 Helico’s Genetic Analysis Platform 211 10.7.13 Third-Generation Sequencing Techniques 212 10.8 Conclusion 212 Abbreviations 213 Acknowledgement 214 References 214 11 Bioinformatics in Cancer Detection 229Mohit Sharma, Umme Abiha, Parul Chugh, Balakumar Chandrasekaran and Nikhil Agrawal 11.1 Introduction 230 11.2 The Era of Bioinformatics in Cancer 230 11.3 Aid in Cancer Research via NCI 232 11.4 Application of Big Data in Developing Precision Medicine 233 11.5 Historical Perspective and Development 235 11.6 Bioinformatics-Based Approaches in the Study of Cancer 237 11.6.1 SLAMS 237 11.6.2 Module Maps 238 11.6.3 COPA 239 11.7 Conclusion and Future Challenges 240 References 240 12 Genomic Association of Polycystic Ovarian Syndrome: Single-Nucleotide Polymorphisms and Their Role in Disease Progression 245Gowtham Kumar Subbaraj and Sindhu Varghese 12.1 Introduction 246 12.2 FSHR Gene 252 12.3 IL-10 Gene 252 12.4 IRS-1 Gene 253 12.5 PCR Primers Used 254 12.6 Statistical Analysis 255 12.7 Conclusion 258 References 259 13 An Insight of Protein Structure Predictions Using Homology Modeling 265S. Muthumanickam, P. Boomi, R. Subashkumar, S. Palanisamy, A. Sudha, K. Anand, C. Balakumar, M. Saravanan, G. Poorani, Yao Wang, K. Vijayakumar and M. 
Syed Ali 13.1 Introduction 266 13.2 Homology Modeling Approach 268 13.2.1 Strategies for Homology Modeling 269 13.2.2 Procedure 269 13.3 Steps Involved in Homology Modeling 270 13.3.1 Template Identification 270 13.3.2 Sequence Alignment 271 13.3.3 Backbone Generation 271 13.3.4 Loop Modeling 271 13.3.5 Side Chain Modeling 272 13.3.6 Model Optimization 272 13.3.6.1 Model Validation 272 13.4 Tools Used for Homology Modeling 273 13.4.1 Robetta 273 13.4.2 M4T (Multiple Templates) 273 13.4.3 I-Tasser (Iterative Implementation of the Threading Assembly Refinement) 273 13.4.4 ModBase 274 13.4.5 Swiss Model 274 13.4.6 PHYRE2 (Protein Homology/Analogy Recognition Engine 2) 274 13.4.7 Modeller 274 13.4.8 Conclusion 275 Acknowledgement 275 References 275 14 Basic Concepts in Proteomics and Applications 279Jesudass Joseph Sahayarayan, A.S. Enogochitra and Murugesan Chandrasekaran 14.1 Introduction 280 14.2 Challenges on Proteomics 281 14.3 Proteomics Based on Gel 283 14.4 Non-Gel–Based Electrophoresis Method 284 14.5 Chromatography 284 14.6 Proteomics Based on Peptides 285 14.7 Stable Isotopic Labeling 286 14.8 Data Mining and Informatics 287 14.9 Applications of Proteomics 289 14.10 Future Scope 290 14.11 Conclusion 291 References 292 15 Prospects of Covalent Approaches in Drug Discovery: An Overview 295Balajee Ramachandran, Saravanan Muthupandian and Jeyakanthan Jeyaraman 15.1 Introduction 296 15.2 Covalent Inhibitors Against the Biological Target 297 15.3 Application of Physical Chemistry Concepts in Drug Designing 299 15.4 Docking Methodologies—An Overview 301 15.5 Importance of Covalent Targets 302 15.6 Recent Framework on the Existing Docking Protocols 303 15.7 SN2 Reactions in the Computational Approaches 304 15.8 Other Crucial Factors to Consider in the Covalent Docking 305 15.8.1 Role of Ionizable Residues 305 15.8.2 Charge Regulation 306 15.8.3 Charge-Charge Interactions 306 15.9 QM/MM Approaches 309 15.10 Conclusion and Remarks 310 Acknowledgements 311 References 
311 Index 321
£138.56
John Wiley & Sons Inc Machine Learning for Time Series Forecasting with Python
Book Synopsis: Learn how to apply the principles of machine learning to time series modeling with this indispensable resource.
Machine Learning for Time Series Forecasting with Python is an incisive and straightforward examination of one of the most crucial elements of decision-making in finance, marketing, education, and healthcare: time series modeling. Despite the centrality of time series forecasting, few business analysts are familiar with the power or utility of applying machine learning to time series modeling. Author Francesca Lazzeri, a distinguished machine learning scientist and economist, corrects that deficiency by giving readers a comprehensive and approachable treatment of how machine learning applies to time series forecasting. Written for readers who have little to no experience in time series forecasting or machine learning, the book comprehensively covers all the topics necessary to:
- Understand time series forecasting concepts, such as stationarity, horizon, trend, and seasonality
- Prepare time series data for modeling
- Evaluate time series forecasting models' performance and accuracy
- Understand when to use neural networks instead of traditional time series models in time series forecasting
Machine Learning for Time Series Forecasting with Python is full of real-world examples, resources, and concrete strategies to help readers explore and transform data and develop usable, practical time series forecasts. Perfect for entry-level data scientists, business analysts, developers, and researchers, this book is an invaluable guide to the fundamental and advanced concepts of machine learning applied to time series modeling.
Table of Contents
Acknowledgments vii
Introduction xv
Chapter 1 Overview of Time Series Forecasting 1: Flavors of Machine Learning for Time Series Forecasting 3 Supervised Learning for Time Series Forecasting 14 Python for Time Series Forecasting 21 Experimental Setup for Time Series Forecasting 24 Conclusion 26
Chapter 2 How to Design an End-to-End Time Series Forecasting Solution on the Cloud 29: Time Series Forecasting Template 31 Business Understanding and Performance Metrics 33 Data Ingestion 36 Data Exploration and Understanding 39 Data Pre-processing and Feature Engineering 40 Modeling Building and Selection 42 An Overview of Demand Forecasting Modeling Techniques 44 Model Evaluation 46 Model Deployment 48 Forecasting Solution Acceptance 53 Use Case: Demand Forecasting 54 Conclusion 58
Chapter 3 Time Series Data Preparation 61: Python for Time Series Data 62 Common Data Preparation Operations for Time Series 65 Time stamps vs. Periods 66 Converting to Timestamps 69 Providing a Format Argument 70 Indexing 71 Time/Date Components 76 Frequency Conversion 78 Time Series Exploration and Understanding 79 How to Get Started with Time Series Data Analysis 79 Data Cleaning of Missing Values in the Time Series 84 Time Series Data Normalization and Standardization 86 Time Series Feature Engineering 89 Date Time Features 90 Lag Features and Window Features 92 Rolling Window Statistics 95 Expanding Window Statistics 97 Conclusion 98
Chapter 4 Introduction to Autoregressive and Automated Methods for Time Series Forecasting 101: Autoregression 102 Moving Average 119 Autoregressive Moving Average 120 Autoregressive Integrated Moving Average 122 Automated Machine Learning 129 Conclusion 136
Chapter 5 Introduction to Neural Networks for Time Series Forecasting 137: Reasons to Add Deep Learning to Your Time Series Toolkit 138 Deep Learning Neural Networks Are Capable of Automatically Learning and Extracting Features from Raw and Imperfect Data 140 Deep Learning Supports Multiple Inputs and Outputs 142 Recurrent Neural Networks Are Good at Extracting Patterns from Input Data 143 Recurrent Neural Networks for Time Series Forecasting 144 Recurrent Neural Networks 145 Long Short-Term Memory 147 Gated Recurrent Unit 148 How to Prepare Time Series Data for LSTMs and GRUs 150 How to Develop GRUs and LSTMs for Time Series Forecasting 154 Keras 155 TensorFlow 156 Univariate Models 156 Multivariate Models 160 Conclusion 164
Chapter 6 Model Deployment for Time Series Forecasting 167: Experimental Set Up and Introduction to Azure Machine Learning SDK for Python 168 Workspace 169 Experiment 169 Run 169 Model 170 Compute Target, RunConfiguration, and ScriptRunConfig 171 Image and Webservice 172 Machine Learning Model Deployment 173 How to Select the Right Tools to Succeed with Model Deployment 175 Solution Architecture for Time Series Forecasting with Deployment Examples 177 Train and Deploy an ARIMA Model 179 Configure the Workspace 182 Create an Experiment 183 Create or Attach a Compute Cluster 184 Upload the Data to Azure 184 Create an Estimator 188 Submit the Job to the Remote Cluster 188 Register the Model 189 Deployment 189 Define Your Entry Script and Dependencies 190 Automatic Schema Generation 191 Conclusion 196
References 197
Index 199
£35.62
John Wiley & Sons Inc Smarter Data Science
Book Synopsis: Organizations can make data science a repeatable, predictable tool, which business professionals use to get more value from their data.
Enterprise data and AI projects are often scattershot, underbaked, siloed, and not adaptable to predictable business changes. As a result, the vast majority fail. These expensive quagmires can be avoided, and this book explains precisely how. Data science is emerging as a hands-on tool not just for data scientists but for business professionals as well. Managers, directors, IT leaders, and analysts must expand their use of data science capabilities for the organization to stay competitive. Smarter Data Science helps them achieve their enterprise-grade data projects and AI goals. It serves as a guide to building a robust and comprehensive information architecture program that enables sustainable and scalable AI deployments. When an organization manages its data effectively, its data science program becomes a fully scala…
Table of Contents
Foreword for Smarter Data Science xix
Epigraph xxi
Preamble xxiii
Chapter 1 Climbing the AI Ladder 1: Readying Data for AI 2 Technology Focus Areas 3 Taking the Ladder Rung by Rung 4 Constantly Adapt to Retain Organizational Relevance 8 Data-Based Reasoning is Part and Parcel in the Modern Business 10 Toward the AI-Centric Organization 14 Summary 16
Chapter 2 Framing Part I: Considerations for Organizations Using AI 17: Data-Driven Decision-Making 18 Using Interrogatives to Gain Insight 19 The Trust Matrix 20 The Importance of Metrics and Human Insight 22 Democratizing Data and Data Science 23 Aye, a Prerequisite: Organizing Data Must Be a Forethought 26 Preventing Design Pitfalls 27 Facilitating the Winds of Change: How Organized Data Facilitates Reaction Time 29 Quae Quaestio (Question Everything) 30 Summary 32
Chapter 3 Framing Part II: Considerations for Working with Data and AI 35: Personalizing the Data Experience for Every User 36 Context Counts: Choosing the Right Way to Display Data 38 Ethnography: Improving Understanding Through Specialized Data 42 Data Governance and Data Quality 43 The Value of Decomposing Data 43 Providing Structure Through Data Governance 43 Curating Data for Training 45 Additional Considerations for Creating Value 45 Ontologies: A Means for Encapsulating Knowledge 46 Fairness, Trust, and Transparency in AI Outcomes 49 Accessible, Accurate, Curated, and Organized 52 Summary 54
Chapter 4 A Look Back on Analytics: More Than One Hammer 57: Been Here Before: Reviewing the Enterprise Data Warehouse 57 Drawbacks of the Traditional Data Warehouse 64 Paradigm Shift 68 Modern Analytical Environments: The Data Lake 69 By Contrast 71 Indigenous Data 72 Attributes of Difference 73 Elements of the Data Lake 75 The New Normal: Big Data is Now Normal Data 77 Liberation from the Rigidity of a Single Data Model 78 Streaming Data 78 Suitable Tools for the Task 78 Easier Accessibility 79 Reducing Costs 79 Scalability 79 Data Management and Data Governance for AI 80 Schema-on-Read vs. Schema-on-Write 81 Summary 84
Chapter 5 A Look Forward on Analytics: Not Everything Can Be a Nail 87: A Need for Organization 87 The Staging Zone 90 The Raw Zone 91 The Discovery and Exploration Zone 92 The Aligned Zone 93 The Harmonized Zone 98 The Curated Zone 100 Data Topologies 100 Zone Map 103 Data Pipelines 104 Data Topography 105 Expanding, Adding, Moving, and Removing Zones 107 Enabling the Zones 108 Ingestion 108 Data Governance 111 Data Storage and Retention 112 Data Processing 114 Data Access 116 Management and Monitoring 117 Metadata 118 Summary 119
Chapter 6 Addressing Operational Disciplines on the AI Ladder 121: A Passage of Time 122 Create 128 Stability 128 Barriers 129 Complexity 129 Execute 130 Ingestion 131 Visibility 132 Compliance 132 Operate 133 Quality 134 Reliance 135 Reusability 135 The xOps Trifecta: DevOps/MLOps, DataOps, and AIOps 136 DevOps/MLOps 137 DataOps 139 AIOps 142 Summary 144
Chapter 7 Maximizing the Use of Your Data: Being Value Driven 147: Toward a Value Chain 148 Chaining Through Correlation 152 Enabling Action 154 Expanding the Means to Act 155 Curation 156 Data Governance 159 Integrated Data Management 162 Onboarding 163 Organizing 164 Cataloging 166 Metadata 167 Preparing 168 Provisioning 169 Multi-Tenancy 170 Summary 173
Chapter 8 Valuing Data with Statistical Analysis and Enabling Meaningful Access 175: Deriving Value: Managing Data as an Asset 175 An Inexact Science 180 Accessibility to Data: Not All Users are Equal 183 Providing Self-Service to Data 184 Access: The Importance of Adding Controls 186 Ranking Datasets Using a Bottom-Up Approach for Data Governance 187 How Various Industries Use Data and AI 188 Benefiting from Statistics 189 Summary 198
Chapter 9 Constructing for the Long-Term 199: The Need to Change Habits: Avoiding Hard-Coding 200 Overloading 201 Locked In 202 Ownership and Decomposition 204 Design to Avoid Change 204 Extending the Value of Data Through AI 206 Polyglot Persistence 208 Benefiting from Data Literacy 213 Understanding a Topic 215 Skillsets 216 It’s All Metadata 218 The Right Data, in the Right Context, with the Right Interface 219 Summary 221
Chapter 10 A Journey’s End: An IA for AI 223: Development Efforts for AI 224 Essential Elements: Cloud-Based Computing, Data, and Analytics 228 Intersections: Compute Capacity and Storage Capacity 234 Analytic Intensity 237 Interoperability Across the Elements 238 Data Pipeline Flight Paths: Preflight, Inflight, Postflight 242 Data Management for the Data Puddle, Data Pond, and Data Lake 243 Driving Action: Context, Content, and Decision-Makers 245 Keep It Simple 248 The Silo is Dead; Long Live the Silo 250 Taxonomy: Organizing Data Zones 252 Capabilities for an Open Platform 256 Summary 260
Appendix Glossary of Terms 263
Index 269
£30.39
John Wiley & Sons Inc Responsible Data Science
Book Synopsis: Explore the most serious and prevalent ethical issues in data science with this insightful new resource.
The increasing popularity of data science has resulted in numerous well-publicized cases of bias, injustice, and discrimination. The widespread deployment of black-box algorithms that are difficult or impossible to understand and explain, even for their developers, is a primary source of these unanticipated harms, making modern techniques and methods for manipulating large data sets seem sinister, even dangerous. When put in the hands of authoritarian governments, these algorithms have enabled suppression of political dissent and persecution of minorities. To prevent these harms, data scientists everywhere must come to understand how the algorithms that they build and deploy may harm certain groups or be unfair. Responsible Data Science delivers a comprehensive, practical treatment of how to implement data science solutions in an even-handed and ethical manner that minimizes the risk…
Table of Contents
Introduction xix
Part I Motivation for Ethical Data Science and Background Knowledge 1
Chapter 1 Responsible Data Science 3: The Optum Disaster 4 Jekyll and Hyde 5 Eugenics 7 Galton, Pearson, and Fisher 7 Ties between Eugenics and Statistics 7 Ethical Problems in Data Science Today 9 Predictive Models 10 From Explaining to Predicting 10 Predictive Modeling 11 Setting the Stage for Ethical Issues to Arise 12 Classic Statistical Models 12 Black-Box Methods 14 Important Concepts in Predictive Modeling 19 Feature Selection 19 Model-Centric vs. Data-Centric Models 20 Holdout Sample and Cross-Validation 20 Overfitting 21 Unsupervised Learning 22 The Ethical Challenge of Black Boxes 23 Two Opposing Forces 24 Pressure for More Powerful AI 24 Public Resistance and Anxiety 24 Summary 25
Chapter 2 Background: Modeling and the Black-Box Algorithm 27: Assessing Model Performance 27 Predicting Class Membership 28 The Rare Class Problem 28 Lift and Gains 28 Area Under the Curve 29 AUC vs. Lift (Gains) 31 Predicting Numeric Values 32 Goodness-of-Fit 32 Holdout Sets and Cross-Validation 33 Optimization and Loss Functions 34 Intrinsically Interpretable Models vs. Black-Box Models 35 Ethical Challenges with Interpretable Models 38 Black-Box Models 39 Ensembles 39 Nearest Neighbors 41 Clustering 41 Association Rules 42 Collaborative Filters 42 Artificial Neural Nets and Deep Neural Nets 43 Problems with Black-Box Predictive Models 45 Problems with Unsupervised Algorithms 47 Summary 48
Chapter 3 The Ways AI Goes Wrong, and the Legal Implications 49: AI and Intentional Consequences by Design 50 Deepfakes 50 Supporting State Surveillance and Suppression 51 Behavioral Manipulation 52 Automated Testing to Fine-Tune Targeting 53 AI and Unintended Consequences 55 Healthcare 56 Finance 57 Law Enforcement 58 Technology 60 The Legal and Regulatory Landscape around AI 61 Ignorance Is No Defense: AI in the Context of Existing Law and Policy 63 A Finger in the Dam: Data Rights, Data Privacy, and Consumer Protection Regulations 64 Trends in Emerging Law and Policy Related to AI 66 Summary 69
Part II The Ethical Data Science Process 71
Chapter 4 The Responsible Data Science Framework 73: Why We Keep Building Harmful AI 74 Misguided Need for Cutting-Edge Models 74 Excessive Focus on Predictive Performance 74 Ease of Access and the Curse of Simplicity 76 The Common Cause 76 The Face Thieves 78 An Anatomy of Modeling Harms 79 The World: Context Matters for Modeling 80 The Data: Representation Is Everything 83 The Model: Garbage In, Danger Out 85 Model Interpretability: Human Understanding for Superhuman Models 86 Efforts Toward a More Responsible Data Science 89 Principles Are the Focus 90 Nonmaleficence 90 Fairness 90 Transparency 91 Accountability 91 Privacy 92 Bridging the Gap Between Principles and Practice with the Responsible Data Science (RDS) Framework 92 Justification 94 Compilation 94 Preparation 95 Modeling 96 Auditing 96 Summary 97
Chapter 5 Model Interpretability: The What and the Why 99: The Sexist Résumé Screener 99 The Necessity of Model Interpretability 101 Connections Between Predictive Performance and Interpretability 103 Uniting (High) Model Performance and Model Interpretability 105 Categories of Interpretability Methods 107 Global Methods 107 Local Methods 113 Real-World Successes of Interpretability Methods 113 Facilitating Debugging and Audit 114 Leveraging the Improved Performance of Black-Box Models 116 Acquiring New Knowledge 116 Addressing Critiques of Interpretability Methods 117 Explanations Generated by Interpretability Methods Are Not Robust 118 Explanations Generated by Interpretability Methods Are Low Fidelity 120 The Forking Paths of Model Interpretability 121 The Four-Measure Baseline 122 Building Our Own Credit Scoring Model 124 Using Train-Test Splits 125 Feature Selection and Feature Engineering 125 Baseline Models 127 The Importance of Making Your Code Work for Everyone 129 Execution Variability 129 Addressing Execution Variability with Functionalized Code 130 Stochastic Variability 130 Addressing Stochastic Variability via Resampling 130 Summary 133
Part III EDS in Practice 135
Chapter 6 Beginning a Responsible Data Science Project 137: How the Responsible Data Science Framework Addresses the Common Cause 138 Datasets Used 140 Regression Datasets—Communities and Crime 140 Classification Datasets—COMPAS 140 Common Elements Across Our Analyses 141 Project Structure and Documentation 141 Project Structure for the Responsible Data Science Framework: Everything in Its Place 142 Documentation: The Responsible Thing to Do 145 Beginning a Responsible Data Science Project 151 Communities and Crime (Regression) 151 Justification 151 Compilation 154 Identifying Protected Classes 157 Preparation—Data Splitting and Feature Engineering 159 Datasheets 161 COMPAS (Classification) 164 Justification 164 Compilation 166 Identifying Protected Classes 168 Preparation 169 Summary 172
Chapter 7 Auditing a Responsible Data Science Project 173: Fairness and Data Science in Practice 175 The Many Different Conceptions of Fairness 175 Different Forms of Fairness Are Trade-Offs with Each Other 177 Quantifying Predictive Fairness Within a Data Science Project 179 Mitigating Bias to Improve Fairness 185 Preprocessing 185 In-processing 186 Postprocessing 186 Classification Example: COMPAS 187 Prework: Code Practices, Modeling, and Auditing 187 Justification, Compilation, and Preparation Review 189 Modeling 191 Auditing 200 Per-Group Metrics: Overall 200 Per-Group Metrics: Error 202 Fairness Metrics 204 Interpreting Our Models: Why Are They Unfair? 207 Analysis for Different Groups 209 Bias Mitigation 214 Preprocessing: Oversampling 214 Postprocessing: Optimizing Thresholds Automatically 218 Postprocessing: Optimizing Thresholds Manually 219 Summary 223
Chapter 8 Auditing for Neural Networks 225: Why Neural Networks Merit Their Own Chapter 227 Neural Networks Vary Greatly in Structure 227 Neural Networks Treat Features Differently 229 Neural Networks Repeat Themselves 231 A More Impenetrable Black Box 232 Baseline Methods 233 Representation Methods 233 Distillation Methods 234 Intrinsic Methods 235 Beginning a Responsible Neural Network Project 236 Justification 236 Moving Forward 239 Compilation 239 Tracking Experiments 241 Preparation 244 Modeling 245 Auditing 247 Per-Group Metrics: Overall 247 Per-Group Metrics: Unusual Definitions of “False Positive” 248 Fairness Metrics 249 Interpreting Our Models: Why Are They Unfair? 252 Bias Mitigation 253 Wrap-Up 255 Auditing Neural Networks for Natural Language Processing 258 Identifying and Addressing Sources of Bias in NLP 258 The Real World 259 Data 260 Models 261 Model Interpretability 262 Summary 262
Chapter 9 Conclusion 265: How Can We Do Better? 267 The Responsible Data Science Framework 267 Doing Better As Managers 269 Doing Better As Practitioners 270 A Better Future If We Can Keep It 271
Index 273
£24.79
John Wiley & Sons Inc Machine Learning for Business Analytics
Book Synopsis: Machine Learning for Business Analytics
Machine learning, also known as data mining or data analytics, is a fundamental part of data science. Organizations in a wide variety of arenas use it to turn raw data into actionable information. Machine Learning for Business Analytics: Concepts, Techniques and Applications in RapidMiner provides a comprehensive introduction to and overview of this methodology. This best-selling textbook covers both statistical and machine learning algorithms for prediction, classification, visualization, dimension reduction, rule mining, recommendations, clustering, text mining, experimentation, and network analytics. Along with hands-on exercises and real-life case studies, it also discusses managerial and ethical issues for the responsible use of machine learning techniques. This is the seventh edition of Machine Learning for Business Analytics, and the first using RapidMiner software. This edition also includes: A…
Table of Contents
Foreword by Ravi Bapna xxi
Preface to the RapidMiner Edition xxiii
Acknowledgments xxvii
PART I PRELIMINARIES
CHAPTER 1 Introduction 3: 1.1 What Is Business Analytics? 3 1.2 What Is Machine Learning? 5 1.3 Machine Learning, AI, and Related Terms 5 1.4 Big Data 7 1.5 Data Science 8 1.6 Why Are There So Many Different Methods? 9 1.7 Terminology and Notation 9 1.8 Road Maps to This Book 12 1.9 Using RapidMiner Studio 14
CHAPTER 2 Overview of the Machine Learning Process 19: 2.1 Introduction 19 2.2 Core Ideas in Machine Learning 20 2.3 The Steps in a Machine Learning Project 23 2.4 Preliminary Steps 25 2.5 Predictive Power and Overfitting 32 2.6 Building a Predictive Model with RapidMiner 37 2.7 Using RapidMiner for Machine Learning 45 2.8 Automating Machine Learning Solutions 47 2.9 Ethical Practice in Machine Learning 52
PART II DATA EXPLORATION AND DIMENSION REDUCTION
CHAPTER 3 Data Visualization 63: 3.1 Introduction 63 3.2 Data Examples 65 3.3 Basic Charts: Bar Charts, Line Charts, and Scatter Plots 66 3.4 Multidimensional Visualization 75 3.5 Specialized Visualizations 87 3.6 Summary: Major Visualizations and Operations, by Machine Learning Goal 92
CHAPTER 4 Dimension Reduction 97: 4.1 Introduction 97 4.2 Curse of Dimensionality 98 4.3 Practical Considerations 98 4.4 Data Summaries 100 4.5 Correlation Analysis 103 4.6 Reducing the Number of Categories in Categorical Attributes 105 4.7 Converting a Categorical Attribute to a Numerical Attribute 107 4.8 Principal Component Analysis 107 4.9 Dimension Reduction Using Regression Models 117 4.10 Dimension Reduction Using Classification and Regression Trees 119
PART III PERFORMANCE EVALUATION
CHAPTER 5 Evaluating Predictive Performance 125: 5.1 Introduction 125 5.2 Evaluating Predictive Performance 126 5.3 Judging Classifier Performance 131 5.4 Judging Ranking Performance 146 5.5 Oversampling 151
PART IV PREDICTION AND CLASSIFICATION METHODS
CHAPTER 6 Multiple Linear Regression 163: 6.1 Introduction 163 6.2 Explanatory vs. Predictive Modeling 164 6.3 Estimating the Regression Equation and Prediction 166 6.4 Variable Selection in Linear Regression 171
CHAPTER 7 k-Nearest Neighbors (k-NN) 189: 7.1 The k-NN Classifier (Categorical Label) 189 7.2 k-NN for a Numerical Label 200 7.3 Advantages and Shortcomings of k-NN Algorithms 202
CHAPTER 8 The Naive Bayes Classifier 209: 8.1 Introduction 209 8.2 Applying the Full (Exact) Bayesian Classifier 211 8.3 Solution: Naive Bayes 213 8.4 Advantages and Shortcomings of the Naive Bayes Classifier 223
CHAPTER 9 Classification and Regression Trees 229: 9.1 Introduction 229 9.2 Classification Trees 232 9.3 Evaluating the Performance of a Classification Tree 240 9.4 Avoiding Overfitting 245 9.5 Classification Rules from Trees 255 9.6 Classification Trees for More Than Two Classes 256 9.7 Regression Trees 256 9.8 Improving Prediction: Random Forests and Boosted Trees 259 9.9 Advantages and Weaknesses of a Tree 261
CHAPTER 10 Logistic Regression 269: 10.1 Introduction 269 10.2 The Logistic Regression Model 271 10.3 Example: Acceptance of Personal Loan 272 10.4 Logistic Regression for Multi-class Classification 283 10.5 Example of Complete Analysis: Predicting Delayed Flights 286
CHAPTER 11 Neural Networks 305: 11.1 Introduction 306 11.2 Concept and Structure of a Neural Network 306 11.3 Fitting a Network to Data 307 11.4 Required User Input 321 11.5 Exploring the Relationship Between Predictors and Target Attribute 322 11.6 Deep Learning 323 11.7 Advantages and Weaknesses of Neural Networks 334
CHAPTER 12 Discriminant Analysis 337: 12.1 Introduction 337 12.2 Distance of a Record from a Class 340 12.3 Fisher’s Linear Classification Functions 341 12.4 Classification Performance of Discriminant Analysis 346 12.5 Prior Probabilities 348 12.6 Unequal Misclassification Costs 348 12.7 Classifying More Than Two Classes 349 12.8 Advantages and Weaknesses 351
CHAPTER 13 Generating, Comparing, and Combining Multiple Models 359: 13.1 Automated Machine Learning (AutoML) 359 13.2 Explaining Model Predictions 367 13.3 Ensembles 373 13.4 Summary 381
PART V INTERVENTION AND USER FEEDBACK
CHAPTER 14 Interventions: Experiments, Uplift Models, and Reinforcement Learning 387: 14.1 A/B Testing 387 14.2 Uplift (Persuasion) Modeling 393 14.3 Reinforcement Learning 400 14.4 Summary 405
PART VI MINING RELATIONSHIPS AMONG RECORDS
CHAPTER 15 Association Rules and Collaborative Filtering 409: 15.1 Association Rules 409 15.2 Collaborative Filtering 424 15.3 Summary 438
CHAPTER 16 Cluster Analysis 445: 16.1 Introduction 445 16.2 Measuring Distance Between Two Records 449 16.3 Measuring Distance Between Two Clusters 455 16.4 Hierarchical (Agglomerative) Clustering 457 16.5 Non-Hierarchical Clustering: The k-Means Algorithm 466
PART VII FORECASTING TIME SERIES
CHAPTER 17 Handling Time Series 479: 17.1 Introduction 480 17.2 Descriptive vs. Predictive Modeling 481 17.3 Popular Forecasting Methods in Business 481 17.4 Time Series Components 482 17.5 Data Partitioning and Performance Evaluation 486
CHAPTER 18 Regression-Based Forecasting 497: 18.1 A Model with Trend 498 18.2 A Model with Seasonality 504 18.3 A Model with Trend and Seasonality 508 18.4 Autocorrelation and ARIMA Models 509
CHAPTER 19 Smoothing and Deep Learning Methods for Forecasting 533: 19.1 Smoothing Methods: Introduction 534 19.2 Moving Average 534 19.3 Simple Exponential Smoothing 541 19.4 Advanced Exponential Smoothing 545 19.5 Deep Learning for Forecasting 549
PART VIII DATA ANALYTICS
CHAPTER 20 Social Network Analytics 563: 20.1 Introduction 563 20.2 Directed vs. Undirected Networks 564 20.3 Visualizing and Analyzing Networks 567 20.4 Social Data Metrics and Taxonomy 571 20.5 Using Network Metrics in Prediction and Classification 577 20.6 Collecting Social Network Data with RapidMiner 584 20.7 Advantages and Disadvantages 584
CHAPTER 21 Text Mining 589: 21.1 Introduction 589 21.2 The Tabular Representation of Text: Term–Document Matrix and “Bag-of-Words” 590 21.3 Bag-of-Words vs. Meaning Extraction at Document Level 592 21.4 Preprocessing the Text 593 21.5 Implementing Machine Learning Methods 602 21.6 Example: Online Discussions on Autos and Electronics 602 21.7 Example: Sentiment Analysis of Movie Reviews 607 21.8 Summary 614
CHAPTER 22 Responsible Data Science 617: 22.1 Introduction 617 22.2 Unintentional Harm 618 22.3 Legal Considerations 620 22.4 Principles of Responsible Data Science 621 22.5 A Responsible Data Science Framework 624 22.6 Documentation Tools 628 22.7 Example: Applying the RDS Framework to the COMPAS Example 631 22.8 Summary 641
PART IX CASES
CHAPTER 23 Cases 647: 23.1 Charles Book Club 647 23.2 German Credit 653 23.3 Tayko Software Cataloger 658 23.4 Political Persuasion 662 23.5 Taxi Cancellations 665 23.6 Segmenting Consumers of Bath Soap 667 23.7 Direct-Mail Fundraising 670 23.8 Catalog Cross-Selling 672 23.9 Time Series Case: Forecasting Public Transportation Demand 673 23.10 Loan Approval 675
Index 685
£96.30
John Wiley & Sons Inc Operating AI
Book Synopsis: A holistic and real-world approach to operationalizing artificial intelligence in your company. In Operating AI, Director of Technology and Architecture at Ericsson AB, Ulrika Jägare, delivers an eye-opening new discussion of how to introduce your organization to artificial intelligence by balancing data engineering, model development, and AI operations. You'll learn the importance of embracing an AI operational mindset to successfully operate AI and lead AI initiatives through the entire lifecycle, including key areas such as data mesh, data fabric, aspects of security, data privacy, data rights, and IPR related to data and AI models. In the book, you'll also discover: how to reduce the risk of introducing bias into your artificial intelligence solutions and how to approach explainable AI (XAI); the importance of efficient and reproducible data pipelines, including how to manage your company's data; an operational perspective on the development of AI models using the MLOps (Machine Learning Operations) approach, including how to deploy, run, and monitor models and ML pipelines in production using CI/CD/CT techniques that generate value in the real world; key competences and toolsets in AI development, deployment, and operations; and what to consider when operating different types of AI business models. With a strong emphasis on the deployment and operation of trustworthy and reliable AI solutions that operate well in the real world and not just the lab, Operating AI is a must-read for business leaders looking for ways to operationalize an AI business model that actually makes money, from the concept phase to running in a live production environment. Table of Contents: Foreword xii Introduction xv Chapter 1 Balancing the AI Investment 1 Defining AI and Related Concepts 3 Operational Readiness and Why It Matters 8 Applying an Operational Mindset from the Start 12 The Operational Challenge 15 Strategy, People, and Technology Considerations 19 Strategic Success Factors in Operating AI 20 
People and Mindsets 23 The Technology Perspective 28 Chapter 2 Data Engineering Focused on AI 31 Know Your Data 32 Know the Data Structure 32 Know the Data Records 34 Know the Business Data Oddities 35 Know the Data Origin 36 Know the Data Collection Scope 37 The Data Pipeline 38 Types of Data Pipeline Solutions 41 Data Quality in Data Pipelines 44 The Data Quality Approach in AI/ML 45 Scaling Data for AI 49 Key Capabilities for Scaling Data 51 Introducing a Data Mesh 53 When You Have No Data 55 The Role of a Data Fabric 56 Why a Data Fabric Matters in AI/ML 58 Key Competences and Skillsets in Data Engineering 60 Chapter 3 Embracing MLOps 71 MLOps as a Concept 72 From ML Models to ML Pipelines 76 The ML Pipeline 78 Adopt a Continuous Learning Approach 84 The Maturity of Your AI/ML Capability 86 Level 0: Model Focus and No MLOps 88 Level 1: Pipelines Rather than Models 89 Level 2: Leveraging Continuous Learning 90 The Model Training Environment 91 Enabling ML Experimentation 92 Using a Simulator for Model Training 94 Environmental Impact of Training AI Models 96 Considering the AI/ML Functional Technology Stack 97 Key Competences and Toolsets in MLOps 103 Clarifying Similarities and Differences 106 MLOps Toolsets 107 Chapter 4 Deployment with AI Operations in Mind 115 Model Serving in Practice 117 Feature Stores 118 Deploying, Serving, and Inferencing Models at Scale 121 The ML Inference Pipeline 123 Model Serving Architecture Components 125 Considerations Regarding Toolsets for Model Serving 129 The Industrialization of AI 129 The Importance of a Cultural Shift 139 Chapter 5 Operating AI Is Different from Operating Software 143 Model Monitoring 144 Ensuring Efficient ML Model Monitoring 145 Model Scoring in Production 146 Retraining in Production Using Continuous Training 151 Data Aspects Related to Model Retraining 155 Understanding Different Retraining Techniques 156 Deployment after Retraining 159 Disadvantages of Retraining Models Frequently 159 Diagnosing 
and Managing Model Performance Issues in Operations 161 Issues with Data Processing 162 Issues with Data Schema Change 163 Data Loss at the Source 165 Models Are Broken Upstream 166 Monitoring Data Quality and Integrity 167 Monitoring the Model Calls 167 Monitoring the Data Schema 168 Detecting Any Missing Data 168 Validating the Feature Values 169 Monitor the Feature Processing 170 Model Monitoring for Stakeholders 171 Ensuring Stakeholder Collaboration for Model Success 173 Toolsets for Model Monitoring in Production 175 Chapter 6 AI Is All About Trust 181 Anonymizing Data 182 Data Anonymization Techniques 185 Pros and Cons of Data Anonymization 187 Explainable AI 189 Complex AI Models Are Harder to Understand 190 What Is Interpretability? 191 The Need for Interpretability in Different Phases 192 Reducing Bias in Practice 194 Rights to the Data and AI Models 199 Data Ownership 200 Who Owns What in a Trained AI Model? 202 Balancing the IP Approach for AI Models 205 The Role of AI Model Training 206 Addressing IP Ownership in AI Results 207 Legal Aspects of AI Techniques 208 Operational Governance of Data and AI 210 Chapter 7 Achieving Business Value from AI 215 The Challenge of Leveraging Value from AI 216 Productivity 216 Reliability 217 Risk 218 People 219 Top Management and AI Business Realization 219 Measuring AI Business Value 223 Measuring AI Value in Nonrevenue Terms 227 Operating Different AI Business Models 229 Operating Artificial Intelligence as a Service 230 Operating Embedded AI Solutions 236 Operating a Hybrid AI Business Model 239 Index 241
£24.79
John Wiley & Sons Inc Deep Learning
Book Synopsis: Deep Learning: a concise and practical exploration of key topics and applications in data science. In Deep Learning: From Big Data to Artificial Intelligence with R, expert researcher Dr. Stéphane Tufféry delivers an insightful discussion of the applications of deep learning and big data, focusing on practical instructions for various software tools and deep learning methods that rely on three major libraries: MXNet, PyTorch, and Keras-TensorFlow. In the book, numerous up-to-date examples are combined with key topics relevant to modern data scientists, including processing optimization, neural network applications, natural language processing, and image recognition. This is a thoroughly revised and updated edition of a book originally released in French, with new examples and methods included throughout. Classroom-tested and intuitively organized, Deep Learning: From Big Data to Artificial Intelligence with R offers complimentary access […] Table of Contents: Acknowledgements xiii Introduction xv 1 From Big Data to Deep Learning 1 1.1 Introduction 1 1.2 Examples of the Use of Big Data and Deep Learning 6 1.3 Big Data and Deep Learning for Companies and Organizations 9 1.3.1 Big Data in Finance 10 1.3.1.1 Google Trends 10 1.3.1.2 Google Trends and Stock Prices 11 1.3.1.3 The quantmod Package for Financial Analysis 11 1.3.1.4 Google Trends in R 13 1.3.1.5 Matching Data from quantmod and Google Trends 14 1.3.2 Big Data and Deep Learning in Insurance 18 1.3.3 Big Data and Deep Learning in Industry 18 1.3.4 Big Data and Deep Learning in Scientific Research and Education 20 1.3.4.1 Big Data in Physics and Astrophysics 20 1.3.4.2 Big Data in Climatology and Earth Sciences 21 1.3.4.3 Big Data in Education 21 1.4 Big Data and Deep Learning for Individuals 21 1.4.1 Big Data and Deep Learning in Healthcare 21 1.4.1.1 Connected Health and Telemedicine 21 1.4.1.2 Geolocation and Health 22 1.4.1.3 The Google Flu Trends 23 1.4.1.4 Research in Health and Medicine 26 1.4.2 Big 
Data and Deep Learning for Drivers 28 1.4.3 Big Data and Deep Learning for Citizens 29 1.4.4 Big Data and Deep Learning in the Police 30 1.5 Risks in Data Processing 32 1.5.1 Insufficient Quantity of Training Data 32 1.5.2 Poor Data Quality 32 1.5.3 Non-Representative Samples 33 1.5.4 Missing Values in the Data 33 1.5.5 Spurious Correlations 34 1.5.6 Overfitting 35 1.5.7 Lack of Explainability of Models 35 1.6 Protection of Personal Data 36 1.6.1 The Need for Data Protection 36 1.6.2 Data Anonymization 38 1.6.3 The General Data Protection Regulation 41 1.7 Open Data 43 Notes 44 2 Processing of Large Volumes of Data 49 2.1 Issues 49 2.2 The Search for a Parsimonious Model 50 2.3 Algorithmic Complexity 51 2.4 Parallel Computing 51 2.5 Distributed Computing 52 2.5.1 MapReduce 53 2.5.2 Hadoop 54 2.5.3 Computing Tools for Distributed Computing 55 2.5.4 Column-Oriented Databases 56 2.5.5 Distributed Architecture and “Analytics” 57 2.5.6 Spark 58 2.6 Computer Resources 60 2.6.1 Minimum Resources 60 2.6.2 Graphics Processing Units (GPU) and Tensor Processing Units (TPU) 61 2.6.3 Solutions in the Cloud 62 2.7 R and Python Software 62 2.8 Quantum Computing 67 Notes 68 3 Reminders of Machine Learning 71 3.1 General 71 3.2 The Optimization Algorithms 74 3.3 Complexity Reduction and Penalized Regression 85 3.4 Ensemble Methods 89 3.4.1 Bagging 89 3.4.2 Random Forests 89 3.4.3 Extra-Trees 91 3.4.4 Boosting 92 3.4.5 Gradient Boosting Methods 97 3.4.6 Synthesis of the Ensemble Methods 100 3.5 Support Vector Machines 100 3.6 Recommendation Systems 105 Notes 108 4 Natural Language Processing 111 4.1 From Lexical Statistics to Natural Language Processing 111 4.2 Uses of Text Mining and Natural Language Processing 113 4.3 The Operations of Textual Analysis 114 4.3.1 Textual Data Collection 115 4.3.2 Identification of the Language 115 4.3.3 Tokenization 116 4.3.4 Part-of-Speech Tagging 117 4.3.5 Named Entity Recognition 119 4.3.6 Coreference Resolution 124 4.3.7 Lemmatization 124 4.3.8 
Stemming 129 4.3.9 Simplifications 129 4.3.10 Removal of Stop Words 130 4.4 Vector Representation and Word Embedding 132 4.4.1 Vector Representation 132 4.4.2 Analysis on the Document-Term Matrix 133 4.4.3 TF-IDF Weighting 142 4.4.4 Latent Semantic Analysis 144 4.4.5 Latent Dirichlet Allocation 152 4.4.6 Word Frequency Analysis 160 4.4.7 Word2Vec Embedding 162 4.4.8 GloVe Embedding 174 4.4.9 FastText Embedding 176 4.5 Sentiment Analysis 180 Notes 184 5 Social Network Analysis 187 5.1 Social Networks 187 5.2 Characteristics of Graphs 188 5.3 Characterization of Social Networks 189 5.4 Measures of Influence in a Graph 190 5.5 Graphs with R 191 5.6 Community Detection 200 5.6.1 The Modularity of a Graph 201 5.6.2 Community Detection by Divisive Hierarchical Clustering 202 5.6.3 Community Detection by Agglomerative Hierarchical Clustering 203 5.6.4 Other Methods 204 5.6.5 Community Detection with R 205 5.7 Research and Analysis on Social Networks 208 5.8 The Business Model of Social Networks 209 5.9 Digital Advertising 211 5.10 Social Network Analysis with R 212 5.10.1 Collecting Tweets 213 5.10.2 Formatting the Corpus 215 5.10.3 Stemming and Lemmatization 216 5.10.4 Example 217 5.10.5 Clustering of Terms and Documents 225 5.10.6 Opinion Scoring 230 5.10.7 Graph of Terms with Their Connotation 231 Notes 234 6 Handwriting Recognition 237 6.1 Data 237 6.2 Issues 238 6.3 Data Processing 238 6.4 Linear and Quadratic Discriminant Analysis 243 6.5 Multinomial Logistic Regression 245 6.6 Random Forests 246 6.7 Extra-Trees 247 6.8 Gradient Boosting 249 6.9 Support Vector Machines 253 6.10 Single Hidden Layer Perceptron 258 6.11 H2O Neural Network 262 6.12 Synthesis of “Classical” Methods 267 Notes 268 7 Deep Learning 269 7.1 The Principles of Deep Learning 269 7.2 Overview of Deep Neural Networks 272 7.3 Recall on Neural Networks and Their Training 274 7.4 Difficulties of Gradient Backpropagation 284 7.5 The Structure of a Convolutional Neural Network 286 7.6 The Convolution 
Mechanism 288 7.7 The Convolution Parameters 290 7.8 Batch Normalization 292 7.9 Pooling 293 7.10 Dilated Convolution 295 7.11 Dropout and DropConnect 295 7.12 The Architecture of a Convolutional Neural Network 297 7.13 Principles of Deep Network Learning for Computer Vision 299 7.14 Adaptive Learning Algorithms 301 7.15 Progress in Image Recognition 304 7.16 Recurrent Neural Networks 312 7.17 Capsule Networks 317 7.18 Autoencoders 318 7.19 Generative Models 322 7.19.1 Generative Adversarial Networks 323 7.19.2 Variational Autoencoders 324 7.20 Other Applications of Deep Learning 326 7.20.1 Object Detection 326 7.20.2 Autonomous Vehicles 333 7.20.3 Analysis of Brain Activity 334 7.20.4 Analysis of the Style of a Pictorial Work 336 7.20.5 Go and Chess Games 338 7.20.6 Other Games 340 Notes 341 8 Deep Learning for Computer Vision 347 8.1 Deep Learning Libraries 347 8.2 MXNet 349 8.2.1 General Information about MXNet 349 8.2.2 Creating a Convolutional Network with MXNet 350 8.2.3 Model Management with MXNet 361 8.2.4 CIFAR-10 Image Recognition with MXNet 362 8.3 Keras and TensorFlow 367 8.3.1 General Information about Keras 370 8.3.2 Application of Keras to the MNIST Database 371 8.3.3 Application of Pre-Trained Models 375 8.3.4 Explain the Prediction of a Computer Vision Model 379 8.3.5 Application of Keras to CIFAR-10 Images 382 8.3.6 Classifying Cats and Dogs 393 8.4 Configuring a Machine’s GPU for Deep Learning 409 8.4.1 Checking the Compatibility of the Graphics Card 410 8.4.2 NVIDIA Driver Installation 410 8.4.3 Installation of Microsoft Visual Studio 411 8.4.4 NVIDIA CUDA Toolkit Installation 411 8.4.5 Installation of cuDNN 412 8.5 Computing in the Cloud 412 8.6 PyTorch 419 8.6.1 The Python PyTorch Package 419 8.6.2 The R torch Package 425 Notes 431 9 Deep Learning for Natural Language Processing 433 9.1 Neural Network Methods for Text Analysis 433 9.2 Text Generation Using a Recurrent Neural Network LSTM 434 9.3 Text Classification Using a LSTM or GRU 
Recurrent Neural Network 440 9.4 Text Classification Using a H2O Model 452 9.5 Application of Convolutional Neural Networks 456 9.6 Spam Detection Using a Recurrent Neural Network LSTM 460 9.7 Transformer Models, BERT, and Its Successors 461 Notes 479 10 Artificial Intelligence 481 10.1 The Beginnings of Artificial Intelligence 481 10.2 Human Intelligence and Artificial Intelligence 486 10.3 The Different Forms of Artificial Intelligence 488 10.4 Ethical and Societal Issues of Artificial Intelligence 493 10.5 Fears and Hopes of Artificial Intelligence 496 10.6 Some Dates of Artificial Intelligence 499 Notes 502 Conclusion 505 Note 506 Annotated Bibliography 507 On Big Data and High Dimensional Statistics 507 On Deep Learning 509 On Artificial Intelligence 511 On the Use of R and Python in Data Science and on Big Data 512 Index 515
£999.99
John Wiley & Sons Inc Fuzzy Computing in Data Science
Book Synopsis: Fuzzy Computing in Data Science. This book comprehensively explains how to use various fuzzy-based models to solve real-time industrial challenges. The book provides information about fundamental aspects of the field and explores the myriad applications of fuzzy logic techniques and methods. It presents basic conceptual considerations and case studies of applications of fuzzy computation. It covers the fundamental concepts and techniques for system modeling, information processing, intelligent system design, decision analysis, statistical analysis, pattern recognition, automated learning, system control, and identification. The book also discusses the combination of fuzzy computation techniques with other computational intelligence approaches such as neural and evolutionary computation. Audience: researchers and students in computer science, artificial intelligence, machine learning, big data analytics, and information and communication technology. Table of Contents: Preface xvii Acknowledgement xxi 1 Band Reduction of HSI Segmentation Using FCM 1 V. Saravana Kumar, S. Anantha Sivaprakasam, E.R. Naganathan, Sunil Bhutada and M. 
Kavitha 1.1 Introduction 2 1.2 Existing Method 3 1.2.1 K-Means Clustering Method 3 1.2.2 Fuzzy C-Means 3 1.2.3 Davies Bouldin Index 4 1.2.4 Data Set Description of HSI 4 1.3 Proposed Method 5 1.3.1 Hyperspectral Image Segmentation Using Enhanced Estimation of Centroid 5 1.3.2 Band Reduction Using K-Means Algorithm 6 1.3.3 Band Reduction Using Fuzzy C-Means 7 1.4 Experimental Results 8 1.4.1 DB Index Graph 8 1.4.2 K-Means–Based PSC (EEOC) 9 1.4.3 Fuzzy C-Means–Based PSC (EEOC) 10 1.5 Analysis of Results 12 1.6 Conclusions 16 References 17 2 A Fuzzy Approach to Face Mask Detection 21 Vatsal Mishra, Tavish Awasthi, Subham Kashyap, Minerva Brahma, Monideepa Roy and Sujoy Datta 2.1 Introduction 22 2.2 Existing Work 23 2.3 The Proposed Framework 26 2.4 Set-Up and Libraries Used 26 2.5 Implementation 27 2.6 Results and Analysis 29 2.7 Conclusion and Future Work 33 References 34 3 Application of Fuzzy Logic to the Healthcare Industry 37 Biswajeet Sahu, Lokanath Sarangi, Abhinadita Ghosh and Hemanta Kumar Palo 3.1 Introduction 38 3.2 Background 41 3.3 Fuzzy Logic 42 3.4 Fuzzy Logic in Healthcare 45 3.5 Conclusions 49 References 50 4 A Bibliometric Approach and Systematic Exploration of Global Research Activity on Fuzzy Logic in Scopus Database 55 Sugyanta Priyadarshini and Nisrutha Dulla 4.1 Introduction 56 4.2 Data Extraction and Interpretation 58 4.3 Results and Discussion 59 4.3.1 Per Year Publication and Citation Count 59 4.3.2 Prominent Affiliations Contributing Toward Fuzzy Logic 60 4.3.3 Top Journals Emerging in Fuzzy Logic in Major Subject Areas 61 4.3.4 Major Contributing Countries Toward Fuzzy Research Articles 63 4.3.5 Prominent Authors Contribution Toward the Fuzzy Logic Analysis 66 4.3.6 Coauthorship of Authors 67 4.3.7 Cocitation Analysis of Cited Authors 68 4.3.8 Cooccurrence of Author Keywords 68 4.4 Bibliographic Coupling of Documents, Sources, Authors, and Countries 70 4.4.1 Bibliographic Coupling of Documents 70 4.4.2 Bibliographic Coupling of Sources 71 
4.4.3 Bibliographic Coupling of Authors 72 4.4.4 Bibliographic Coupling of Countries 73 4.5 Conclusion 74 References 76 5 Fuzzy Decision Making in Predictive Analytics and Resource Scheduling 79 Rekha A. Kulkarni, Suhas H. Patil and Bithika Bishesh 5.1 Introduction 80 5.2 History of Fuzzy Logic and Its Applications 81 5.3 Approximate Reasoning 82 5.4 Fuzzy Sets vs Classical Sets 83 5.5 Fuzzy Inference System 84 5.5.1 Characteristics of FIS 85 5.5.2 Working of FIS 85 5.5.3 Methods of FIS 86 5.6 Fuzzy Decision Trees 86 5.6.1 Characteristics of Decision Trees 87 5.6.2 Construction of Fuzzy Decision Trees 87 5.7 Fuzzy Logic as Applied to Resource Scheduling in a Cloud Environment 88 5.8 Conclusion 90 References 91 6 Application of Fuzzy Logic and Machine Learning Concept in Sales Data Forecasting Decision Analytics Using ARIMA Model 93 S. Mala and V. Umadevi 6.1 Introduction 94 6.1.1 Aim and Scope 94 6.1.2 R-Tool 94 6.1.3 Application of Fuzzy Logic 94 6.1.4 Dataset 95 6.2 Model Study 96 6.2.1 Introduction to Machine Learning Method 96 6.2.2 Time Series Analysis 96 6.2.3 Components of a Time Series 97 6.2.4 Concepts of Stationary 99 6.2.5 Model Parsimony 100 6.3 Methodology 100 6.3.1 Exploratory Data Analysis 100 6.3.1.1 Seed Types—Analysis 101 6.3.1.2 Comparison of Location and Seeds 101 6.3.1.3 Comparison of Season (Month) and Seeds 103 6.3.2 Forecasting 103 6.3.2.1 Auto Regressive Integrated Moving Average (ARIMA) 103 6.3.2.2 Data Visualization 106 6.3.2.3 Implementation Model 108 6.4 Result Analysis 108 6.5 Conclusion 110 References 110 7 Modified m-Polar Fuzzy Set ELECTRE-I Approach 113 Madan Jagtap, Prasad Karande and Pravin Patil 7.1 Introduction 114 7.1.1 Objectives 114 7.2 Implementation of m-Polar Fuzzy ELECTRE-I Integrated Shannon’s Entropy Weight Calculations 115 7.2.1 The m-Polar Fuzzy ELECTRE-I Integrated Shannon’s Entropy Weight Calculation Method 115 7.3 Application to Industrial Problems 118 7.3.1 Cutting Fluid Selection Problem 118 7.3.2 Results 
Obtained From m-Polar Fuzzy ELECTRE-I for Cutting Fluid Selection Problem 122 7.3.3 FMS Selection Problem 125 7.3.4 Results Obtained From m-Polar Fuzzy ELECTRE-I for FMS Selection 130 7.4 Conclusions 143 References 143 8 Fuzzy Decision Making: Concept and Models 147 Bithika Bishesh 8.1 Introduction 148 8.2 Classical Set 149 8.3 Fuzzy Set 150 8.4 Properties of Fuzzy Set 151 8.5 Types of Decision Making 153 8.5.1 Individual Decision Making 153 8.5.2 Multiperson Decision Making 157 8.5.3 Multistage Decision Making 158 8.5.4 Multicriteria Decision Making 160 8.6 Methods of Multiattribute Decision Making (MADM) 162 8.6.1 Weighted Sum Method (WSM) 162 8.6.2 Weighted Product Method (WPM) 162 8.6.3 Weighted Aggregates Sum Product Assessment (WASPAS) 163 8.6.4 Technique for Order Preference by Similarity to Ideal Solutions (TOPSIS) 166 8.7 Applications of Fuzzy Logic 167 8.8 Conclusion 169 References 169 9 Use of Fuzzy Logic for Psychological Support to Migrant Workers of Southern Odisha (India) 173 Sanjaya Kumar Sahoo and Sukanta Chandra Swain 9.1 Introduction 174 9.2 Objectives and Methodology 175 9.2.1 Objectives 175 9.2.2 Methodology 176 9.3 Effect of COVID-19 on the Psychology and Emotion of Repatriated Migrants 176 9.3.1 Psychological Variables Identified 176 9.3.2 Fuzzy Logic for Solace to Migrants 176 9.4 Findings 178 9.5 Way Out for Strengthening the Psychological Strength of the Migrant Workers through Technological Aid 178 9.6 Conclusion 179 References 180 10 Fuzzy-Based Edge AI Approach: Smart Transformation of Healthcare for a Better Tomorrow 181 B. RaviKrishna, Sirisha Potluri, J. 
Rethna Virgil Jeny, Guna Sekhar Sajja and Katta Subba Rao 10.1 Significance of Machine Learning in Healthcare 182 10.2 Cloud-Based Artificial Intelligent Secure Models 183 10.3 Applications and Usage of Machine Learning in Healthcare 183 10.3.1 Detecting Diseases and Diagnosis 183 10.3.2 Drug Detection and Manufacturing 183 10.3.3 Medical Imaging Analysis and Diagnosis 184 10.3.4 Personalized/Adapted Medicine 185 10.3.5 Behavioral Modification 185 10.3.6 Maintenance of Smart Health Data 185 10.3.7 Clinical Trial and Study 185 10.3.8 Crowdsourced Information Discovery 185 10.3.9 Enhanced Radiotherapy 186 10.3.10 Outbreak/Epidemic Prediction 186 10.4 Edge AI: For Smart Transformation of Healthcare 186 10.4.1 Role of Edge in Reshaping Healthcare 186 10.4.2 How AI Powers the Edge 187 10.5 Edge AI-Modernizing Human Machine Interface 188 10.5.1 Rural Medicine 188 10.5.2 Autonomous Monitoring of Hospital Rooms—A Case Study 188 10.6 Significance of Fuzzy in Healthcare 189 10.6.1 Fuzzy Logic—Outline 189 10.6.2 Fuzzy Logic-Based Smart Healthcare 190 10.6.3 Medical Diagnosis Using Fuzzy Logic for Decision Support Systems 191 10.6.4 Applications of Fuzzy Logic in Healthcare 193 10.7 Conclusion and Discussions 193 References 194 11 Video Conferencing (VC) Software Selection Using Fuzzy TOPSIS 197 Rekha Gupta 11.1 Introduction 197 11.2 Video Conferencing Software and Its Major Features 199 11.2.1 Video Conferencing/Meeting Software (VC/MS) for Higher Education Institutes 199 11.3 Fuzzy TOPSIS 203 11.3.1 Extension of TOPSIS Algorithm: Fuzzy TOPSIS 203 11.4 Sample Numerical Illustration 207 11.5 Conclusions 213 References 213 12 Estimation of Nonperforming Assets of Indian Commercial Banks Using Fuzzy AHP and Goal Programming 215 Kandarp Vidyasagar and Rajiv Kr. 
Dwivedi 12.1 Introduction 216 12.1.1 Basic Concepts of Fuzzy AHP and Goal Programming 217 12.2 Research Model 221 12.2.1 Average Growth Rate Calculation 227 12.3 Result and Discussion 233 12.4 Conclusion 234 References 234 13 Evaluation of Ergonomic Design for the Visual Display Terminal Operator at Static Work Under FMCDM Environment 237 Bipradas Bairagi 13.1 Introduction 238 13.2 Proposed Algorithm 240 13.3 An Illustrative Example on Ergonomic Design Evaluation 245 13.4 Conclusions 249 References 249 14 Optimization of Energy Generated from Ocean Wave Energy Using Fuzzy Logic 253 S. B. Goyal, Pradeep Bedi, Jugnesh Kumar and Prasenjit Chatterjee 14.1 Introduction 254 14.2 Control Approach in Wave Energy Systems 255 14.3 Related Work 257 14.4 Mathematical Modeling for Energy Conversion from Ocean Waves 259 14.5 Proposed Methodology 260 14.5.1 Wave Parameters 261 14.5.2 Fuzzy-Optimizer 262 14.6 Conclusion 264 References 264 15 The m-Polar Fuzzy TOPSIS Method for NTM Selection 267 Madan Jagtap and Prasad Karande 15.1 Introduction 268 15.2 Literature Review 268 15.3 Methodology 270 15.3.1 Steps of the mFS TOPSIS 270 15.4 Case Study 272 15.4.1 Effect of Analytical Hierarchy Process (AHP) Weight Calculation on the mFS TOPSIS Method 273 15.4.2 Effect of Shannon’s Entropy Weight Calculation on the m-Polar Fuzzy Set TOPSIS Method 277 15.5 Results and Discussions 281 15.5.1 Result Validation 281 15.6 Conclusions and Future Scope 283 References 284 16 Comparative Analysis on Material Handling Device Selection Using Hybrid FMCDM Methodology 287 Bipradas Bairagi 16.1 Introduction 288 16.2 MCDM Techniques 289 16.2.1 FAHP 289 16.2.2 Entropy Method as Weights (Influence) Evaluation Technique 290 16.3 The Proposed Hybrid and Super Hybrid FMCDM Approaches 291 16.3.1 TOPSIS 291 16.3.2 FMOORA Method 292 16.3.3 FVIKOR 292 16.3.4 Fuzzy Grey Theory (FGT) 293 16.3.5 COPRAS-G 293 16.3.6 Super Hybrid Algorithm 294 16.4 Illustrative Example 295 16.5 Results and Discussions 298 16.5.1 
FTOPSIS 298 16.5.2 FMOORA 298 16.5.3 FVIKOR 298 16.5.4 Fuzzy Grey Theory (FGT) 299 16.5.5 COPRAS-G 299 16.5.6 Super Hybrid Approach (SHA) 299 16.6 Conclusions 302 References 302 17 Fuzzy MCDM on CCPM for Decision Making: A Case Study 305 Bimal K. Jena, Biswajit Das, Amarendra Baral and Sushanta Tripathy 17.1 Introduction 306 17.2 Literature Review 307 17.3 Objective of Research 308 17.4 Cluster Analysis 308 17.4.1 Hierarchical Clustering 309 17.4.2 Partitional Clustering 309 17.5 Clustering 310 17.6 Methodology 314 17.7 TOPSIS Method 316 17.8 Fuzzy TOPSIS Method 318 17.9 Conclusion 325 17.10 Scope of Future Study 326 References 326 Index 329
£133.20
John Wiley & Sons Inc A Practical Guide to Data Mining for Business and Industry
Book Synopsis: Presents data mining processes and methods, as well as commonly used methods for descriptive and exploratory statistics, using SAS and JMP. Trade Review: “A Practical Guide to Data Mining for Business and Industry gives practical tools on how information can be extracted from masses of data. The book is very well written, in a conversational tone that makes it enjoyable to read. The authors are excellent communicators. If you are interested in learning about data mining, learning to do a particular task in data mining, looking for a textbook to use in a data mining or analytics course, or have a problem or data analytic task you are working on, this book would be an excellent place to start.” (Mathematical Association of America, 23 August 2014) Table of Contents: Glossary of terms xii Part I Data Mining Concept 1 1 Introduction 3 1.1 Aims of the Book 3 1.2 Data Mining Context 5 1.2.1 Domain Knowledge 6 1.2.2 Words to Remember 7 1.2.3 Associated Concepts 7 1.3 Global Appeal 8 1.4 Example Datasets Used in This Book 8 1.5 Recipe Structure 11 1.6 Further Reading and Resources 13 2 Data Mining Definition 14 2.1 Types of Data Mining Questions 15 2.1.1 Population and Sample 15 2.1.2 Data Preparation 16 2.1.3 Supervised and Unsupervised Methods 16 2.1.4 Knowledge-Discovery Techniques 18 2.2 Data Mining Process 19 2.3 Business Task: Clarification of the Business Question behind the Problem 20 2.4 Data: Provision and Processing of the Required Data 21 2.4.1 Fixing the Analysis Period 22 2.4.2 Basic Unit of Interest 23 2.4.3 Target Variables 24 2.4.4 Input Variables/Explanatory Variables 24 2.5 Modelling: Analysis of the Data 25 2.6 Evaluation and Validation during the Analysis Stage 25 2.7 Application of Data Mining Results and Learning from the Experience 28 Part II Data Mining Practicalities 31 3 All about data 33 3.1 Some Basics 34 3.1.1 Data, Information, Knowledge and Wisdom 35 3.1.2 Sources and Quality of Data 36 3.1.3 Measurement Level and Types of Data 37 3.1.4 Measures 
of Magnitude and Dispersion
3.1.5 Data Distributions
3.2 Data Partition: Random Samples for Training, Testing and Validation
3.3 Types of Business Information Systems
3.3.1 Operational Systems Supporting Business Processes
3.3.2 Analysis-Based Information Systems
3.3.3 Importance of Information
3.4 Data Warehouses
3.4.1 Topic Orientation
3.4.2 Logical Integration and Homogenisation
3.4.3 Reference Period
3.4.4 Low Volatility
3.4.5 Using the Data Warehouse
3.5 Three Components of a Data Warehouse: DBMS, DB and DBCS
3.5.1 Database Management System (DBMS)
3.5.2 Database (DB)
3.5.3 Database Communication Systems (DBCS)
3.6 Data Marts
3.6.1 Regularly Filled Data Marts
3.6.2 Comparison between Data Marts and Data Warehouses
3.7 A Typical Example from the Online Marketing Area
3.8 Unique Data Marts
3.8.1 Permanent Data Marts
3.8.2 Data Marts Resulting from Complex Analysis
3.9 Data Mart: Do’s and Don’ts
3.9.1 Do’s and Don’ts for Processes
3.9.2 Do’s and Don’ts for Handling
3.9.3 Do’s and Don’ts for Coding/Programming
4 Data Preparation
4.1 Necessity of Data Preparation
4.2 From Small and Long to Short and Wide
4.3 Transformation of Variables
4.4 Missing Data and Imputation Strategies
4.5 Outliers
4.6 Dealing with the Vagaries of Data
4.6.1 Distributions
4.6.2 Tests for Normality
4.6.3 Data with Totally Different Scales
4.7 Adjusting the Data Distributions
4.7.1 Standardisation and Normalisation
4.7.2 Ranking
4.7.3 Box–Cox Transformation
4.8 Binning
4.8.1 Bucket Method
4.8.2 Analytical Binning for Nominal Variables
4.8.3 Quantiles
4.8.4 Binning in Practice
4.9 Timing Considerations
4.10 Operational Issues
5 Analytics
5.1 Introduction
5.2 Basis of Statistical Tests
5.2.1 Hypothesis Tests and P Values
5.2.2 Tolerance Intervals
5.2.3 Standard Errors and Confidence Intervals
5.3 Sampling
5.3.1 Methods
5.3.2 Sample Sizes
5.3.3 Sample Quality and Stability
5.4 Basic Statistics for Pre-analytics
5.4.1 Frequencies
5.4.2 Comparative Tests
5.4.3 Cross Tabulation and Contingency Tables
5.4.4 Correlations
5.4.5 Association Measures for Nominal Variables
5.4.6 Examples of Output from Comparative and Cross Tabulation Tests
5.5 Feature Selection/Reduction of Variables
5.5.1 Feature Reduction Using Domain Knowledge
5.5.2 Feature Selection Using Chi-Square
5.5.3 Principal Components Analysis and Factor Analysis
5.5.4 Canonical Correlation, PLS and SEM
5.5.5 Decision Trees
5.5.6 Random Forests
5.6 Time Series Analysis
6 Methods
6.1 Methods Overview
6.2 Supervised Learning
6.2.1 Introduction and Process Steps
6.2.2 Business Task
6.2.3 Provision and Processing of the Required Data
6.2.4 Analysis of the Data
6.2.5 Evaluation and Validation of the Results (during the Analysis)
6.2.6 Application of the Results
6.3 Multiple Linear Regression for use when Target is Continuous
6.3.1 Rationale of Multiple Linear Regression Modelling
6.3.2 Regression Coefficients
6.3.3 Assessment of the Quality of the Model
6.3.4 Example of Linear Regression in Practice
6.4 Regression when the Target is not Continuous
6.4.1 Logistic Regression
6.4.2 Example of Logistic Regression in Practice
6.4.3 Discriminant Analysis
6.4.4 Log-Linear Models and Poisson Regression
6.5 Decision Trees
6.5.1 Overview
6.5.2 Selection Procedures of the Relevant Input Variables
6.5.3 Splitting Criteria
6.5.4 Number of Splits (Branches of the Tree)
6.5.5 Symmetry/Asymmetry
6.5.6 Pruning
6.6 Neural Networks
6.7 Which Method Produces the Best Model? A Comparison of Regression, Decision Trees and Neural Networks
6.8 Unsupervised Learning
6.8.1 Introduction and Process Steps
6.8.2 Business Task
6.8.3 Provision and Processing of the Required Data
6.8.4 Analysis of the Data
6.8.5 Evaluation and Validation of the Results (during the Analysis)
6.8.6 Application of the Results
6.9 Cluster Analysis
6.9.1 Introduction
6.9.2 Hierarchical Cluster Analysis
6.9.3 K-Means Method of Cluster Analysis
6.9.4 Example of Cluster Analysis in Practice
6.10 Kohonen Networks and Self-Organising Maps
6.10.1 Description
6.10.2 Example of SOMs in Practice
6.11 Group Purchase Methods: Association and Sequence Analysis
6.11.1 Introduction
6.11.2 Analysis of the Data
6.11.3 Group Purchase Methods
6.11.4 Examples of Group Purchase Methods in Practice
7 Validation and Application
7.1 Introduction to Methods for Validation
7.2 Lift and Gain Charts
7.3 Model Stability
7.4 Sensitivity Analysis
7.5 Threshold Analytics and Confusion Matrix
7.6 ROC Curves
7.7 Cross-Validation and Robustness
7.8 Model Complexity
Part III Data Mining in Action
8 Marketing: Prediction
8.1 Recipe 1: Response Optimisation: to Find and Address the Right Number of Customers
8.2 Recipe 2: To Find the x% of Customers with the Highest Affinity to an Offer
8.3 Recipe 3: To Find the Right Number of Customers to Ignore
8.4 Recipe 4: To Find the x% of Customers with the Lowest Affinity to an Offer
8.5 Recipe 5: To Find the x% of Customers with the Highest Affinity to Buy
8.6 Recipe 6: To Find the x% of Customers with the Lowest Affinity to Buy
8.7 Recipe 7: To Find the x% of Customers with the Highest Affinity to a Single Purchase
8.8 Recipe 8: To Find the x% of Customers with the Highest Affinity to Sign a Long-Term Contract in Communication Areas
8.9 Recipe 9: To Find the x% of Customers with the Highest Affinity to Sign a Long-Term Contract in Insurance Areas
9 Intra-Customer Analysis
9.1 Recipe 10: To Find the Optimal Amount of Single Communication to Activate One Customer
9.2 Recipe 11: To Find the Optimal Communication Mix to Activate One Customer
9.3 Recipe 12: To Find and Describe Homogeneous Groups of Products
9.4 Recipe 13: To Find and Describe Groups of Customers with Homogeneous Usage
9.5 Recipe 14: To Predict the Order Size of Single Products or Product Groups
9.6 Recipe 15: Product Set Combination
9.7 Recipe 16: To Predict the Future Customer Lifetime Value of a Customer
10 Learning from a Small Testing Sample and Prediction
10.1 Recipe 17: To Predict Demographic Signs (Like Sex, Age, Education and Income)
10.2 Recipe 18: To Predict the Potential Customers of a Brand New Product or Service in Your Databases
10.3 Recipe 19: To Understand Operational Features and General Business Forecasting
11 Miscellaneous
11.1 Recipe 20: To Find Customers Who Will Potentially Churn
11.2 Recipe 21: Indirect Churn Based on a Discontinued Contract
11.3 Recipe 22: Social Media Target Group Descriptions
11.4 Recipe 23: Web Monitoring
11.5 Recipe 24: To Predict Who is Likely to Click on a Special Banner
12 Software and Tools: A Quick Guide
12.1 List of Requirements When Choosing a Data Mining Tool
12.2 Introduction to the Idea of Fully Automated Modelling (FAM)
12.2.1 Predictive Behavioural Targeting
12.2.2 Fully Automatic Predictive Targeting and Modelling Real-Time Online Behaviour
12.3 FAM Function
12.4 FAM Architecture
12.5 FAM Data Flows and Databases
12.6 FAM Modelling Aspects
12.7 FAM Challenges and Critical Success Factors
12.8 FAM Summary
13 Overviews
13.1 To Make Use of Official Statistics
13.2 How to Use Simple Maths to Make an Impression
13.2.1 Approximations
13.2.2 Absolute and Relative Values
13.2.3 % Change
13.2.4 Values in Context
13.2.5 Confidence Intervals
13.2.6 Rounding
13.2.7 Tables
13.2.8 Figures
13.3 Differences between Statistical Analysis and Data Mining
13.3.1 Assumptions
13.3.2 Values Missing Because ‘Nothing Happened’
13.3.3 Sample Sizes
13.3.4 Goodness-of-Fit Tests
13.3.5 Model Complexity
13.4 How to Use Data Mining in Different Industries
13.5 Future Views
Bibliography
Index
£999.99
Johns Hopkins University Press Big Data on Campus
Book Synopsis
How data-informed decision making can make colleges and universities more effective institutions. The continuing importance of data analytics is not lost on higher education leaders, who face a multitude of challenges, including increasing operating costs, dwindling state support, limits to tuition increases, and increased competition from the for-profit sector. To navigate these challenges, savvy leaders must leverage data to make sound decisions. In Big Data on Campus, leading data analytics experts and higher ed leaders show the role that analytics can play in the better administration of colleges and universities. Aimed at senior administrative leaders, practitioners of institutional research, technology professionals, and graduate students in higher education, the book opens with a conceptual discussion of the roles that data analytics can play in higher education administration. Subsequent chapters address recent developments in technology, the rapid accumulation of data assets…
Table of Contents
Foreword, by Christine M. Keller
Acknowledgments
Part I. Technology, Digitization, Big Data, and Analytics Maturity as the Enabling Conditions for Data-Informed Decision Making
Chapter 1. Data Analytics and the Imperatives for Data-Informed Decision Making in Higher Education, by Karen L. Webber and Henry Y. Zheng
Chapter 2. Big Data and the Transformation of Decision Making in Higher Education, by Braden J. Hosch
Chapter 3. Predictive Analytics and Its Uses in Higher Education, by Henry Y. Zheng and Ying Zhou
Part II. The Ethical, Cultural, and Managerial Imperatives of Data-Informed Decision Making in Higher Education
Chapter 4. Limitations in Data Analytics: Potential Misuse and Misunderstanding in Data Reports and Visualizations, by Karen L. Webber and Jillian N. Morn
Chapter 5. Guiding Your Organization's Data Strategy: The Roles of University Senior Leaders and Trustees in Strategic Analytics, by Gail B. Marsh and Rachit Thariani
Chapter 6. Data Governance, Data Stewardship, and the Building of an Analytics Organizational Culture, by Rana Glasgal and Valentina Nestor
Part III. The Application of Analytics in Higher Education Decision Making: Case Studies
Chapter 7. Data Analytics and Decision Making in Admissions and Enrollment Management, by Tom Gutman and Brian P. Hinote
Chapter 8. Predictive Analytics, Academic Advising, Early Alerts, and Student Success, by Timothy M. Renick
Chapter 9. Constituent Relationship Management and Student Engagement Lifecycle, by Cathy A. O'Bryan, Chris Tompkins, and Carrie Hancock Marcinkevage
Chapter 10. Learning Analytics for Learning Assessment: Complexities in Efficacy, Implementation, and Broad Use, by Carrie Klein, Jaime Lester, Huzefa Rangwala, and Aditya Johri
Chapter 11. Using Data Analytics to Support Institutional Financial and Operational Efficiency, by Lindsay K. Wayt, Susan M. Menditto, J. Michael Gower, and Charles Tegen
Part IV. Concluding Comments
Chapter 12. Data-Informed Decision Making and the Pursuit of Analytics Maturity in Higher Education, by Karen L. Webber and Henry Y. Zheng
Contributors
Index
£33.25
Johns Hopkins University Press How Colleges Use Data
Book Synopsis
What does a culture of evidence really look like in higher education? The use of big data and the rapid acceleration of storage and analytics tools have led to a revolution of data use in higher education. Institutions have moved from relying largely on historical trends and descriptive data to the more widespread adoption of predictive and prescriptive analytics. Despite this rapid evolution of data technology and analytics tools, universities and colleges still face a number of obstacles in their data use. In How Colleges Use Data, Jonathan S. Gagliardi presents college and university leaders with an important resource to help cultivate, implement, and sustain a culture of evidence through the ethical and responsible use and adoption of data and analytics. Gagliardi provides a broad context for data use among colleges, including key concepts and use cases related to data and analytics. He also addresses the different dimensions of data use and highlights the promise and perils of the…
Table of Contents
Preface
Acknowledgments
Chapter 1. The Evidence Imperative
Chapter 2. Demystifying Data and Analytics
Chapter 3. Defining an Institutional Aspiration Using Data
Chapter 4. Equity and Student Success
Chapter 5. Strategic Finance and Resource Optimization
Chapter 6. Academic Quality and Renewal
Chapter 7. Creating a Data Governance System
Chapter 8. The Promise and Peril of Data and Analytics
Chapter 9. Implementation and Planning
Chapter 10. Looking Ahead
Notes
Index
£21.60
Johns Hopkins University Press Because Data Can't Speak for Itself
Book Synopsis
£18.05
O'Reilly Media 21 Recipes for Mining Twitter
Book Synopsis
Millions of public Twitter streams harbor a wealth of data, and once you mine them, you can gain some valuable insights. This concise book offers a collection of recipes to help you extract nuggets of Twitter information using easy-to-learn Python tools.
£19.19
O'Reilly Media Spring Data: The Definitive Guide
Book Synopsis
Relational database technologies continue to be predominant in Java enterprise applications, but with newer technologies such as NoSQL databases and Hadoop available, RDBMS is no longer considered a "one size fits all" solution. This book shows you how to increase your options with Spring's data access framework.
£25.59