Data mining Books
Springer International Publishing AG Data Preprocessing in Data Mining
Book Synopsis
Data Preprocessing for Data Mining addresses one of the most important issues within the well-known Knowledge Discovery from Data process. Data taken directly from the source will likely have inconsistencies or errors or, most importantly, will not be ready to be considered for a data mining process. Furthermore, the increasing amount of data in recent science, industry and business applications calls for more complex tools to analyze it. Thanks to data preprocessing, it is possible to convert the impossible into the possible, adapting the data to fulfill the input demands of each data mining algorithm. Data preprocessing includes data reduction techniques, which aim at reducing the complexity of the data by detecting or removing irrelevant and noisy elements. This book is intended to review the tasks that fill the gap between the data acquisition from the source and the data mining process. It gives a comprehensive look from a practical point of view, including basic concepts and a survey of the techniques proposed in the specialized literature. Each chapter is a stand-alone guide to a particular data preprocessing topic, from basic concepts and detailed descriptions of classical algorithms to an exhaustive catalog of recent developments. The in-depth technical descriptions make this book suitable for technical professionals, researchers, senior undergraduate and graduate students in data science, computer science and engineering.
Trade Review
From the book reviews:
“This book is a comprehensive collection of data preprocessing techniques used in data mining. Any readers who practice data mining will find it beneficial … . This book is an excellent guideline in the topic of data preprocessing for data mining. It is suitable for both practitioners and researchers who would like to use datasets in their data mining projects.” (Xiannong Meng, Computing Reviews, December, 2014)
Table of Contents
Introduction.- Data Sets and Proper Statistical Analysis of Data Mining Techniques.- Data Preparation Basic Models.- Dealing with Missing Values.- Dealing with Noisy Data.- Data Reduction.- Feature Selection.- Instance Selection.- Discretization.- A Data Mining Software Package Including Data Preparation and Reduction: KEEL.
£151.99
Springer-Verlag Berlin and Heidelberg GmbH & Co. KG Data Matching: Concepts and Techniques for Record
Book Synopsis
Data matching (also known as record or data linkage, entity resolution, object identification, or field matching) is the task of identifying, matching and merging records that correspond to the same entities from several databases or even within one database. Based on research in various domains including applied statistics, health informatics, data mining, machine learning, artificial intelligence, database management, and digital libraries, significant advances have been achieved over the last decade in all aspects of the data matching process, especially on how to improve the accuracy of data matching, and its scalability to large databases.
Peter Christen’s book is divided into three parts: Part I, “Overview”, introduces the subject by presenting several sample applications and their special challenges, as well as a general overview of a generic data matching process. Part II, “Steps of the Data Matching Process”, then details its main steps like pre-processing, indexing, field and record comparison, classification, and quality evaluation. Lastly, Part III, “Further Topics”, deals with specific aspects like privacy, real-time matching, or matching unstructured data. Finally, it briefly describes the main features of many research and open source systems available today.
By providing the reader with a broad range of data matching concepts and techniques and touching on all aspects of the data matching process, this book helps researchers as well as students specializing in data quality or data matching aspects to familiarize themselves with recent research advances and to identify open research challenges in the area of data matching. To this end, each chapter of the book includes a final section that provides pointers to further background and research material. Practitioners will better understand the current state of the art in data matching as well as the internal workings and limitations of current systems. In particular, they will learn that it is often not feasible to simply implement an existing off-the-shelf data matching system without substantial adaptation and customization. Such practical considerations are discussed for each of the major steps in the data matching process.
Trade Review
"The book is very well organized and exceptionally well written. Because of the depth, amount, and quality of the material that is covered, I would expect this book to be one of the standard references in future years." William E. Winkler, U.S. Bureau of the Census, Washington, DC, USA
Table of Contents
Part I Overview.- Introduction.- The Data Matching Process.- Part II Steps of the Data Matching Process.- Data Pre-Processing.- Indexing.- Field and Record Comparison.- Classification.- Evaluation of Matching Quality and Complexity.- Part III Further Topics.- Privacy Aspects of Data Matching.- Further Topics and Research Directions.- Data Matching Systems.
£113.99
Mindspeaking People Skills for Analytical Thinkers
£14.99
Helsinki University Press Digital Histories: Emergent Approaches within the New Digital History
£36.38
Springer Computational Structural Behavior
£113.99
Springer Knowledge and Systems Sciences
Book Synopsis.- Complex Networks and Modeling ..- The Temporal Structural Pattern in Scientific Collaborative Behavior from the Perspective of Complex Network..- The Analysis of Innovation Network in China's Hydrogen Energy Industry from the Perspective of Patents..- Construction and Evaluation of a Subjects Synergistic Network for Cross-regional Major Infectious Disease Emergency Response..- Evolution of cumulative reciprocity in structured populations..- Opinion Dynamics. .- Simulating Social Network Dynamics with LLM Agents: An Analysis of Information Propagation and Echo Chambers..- How Public Opinion Risks in Social Hot Events Are Generated: A fsQCA Perspective..- Vulnerability Measurement of Social Media Users to Online Public Opinion in Emergency Context..- Analyzing Replies and Interactions among Users with Different Stances: A Case Study of the Russia Ukraine Conflict..- Explainable Machine Learning-based Research on Key Factors in the Formation of Public Opinion on Similar Events..- Knowledge Technologies and Systems Engineering..- UMCap: User Memory Augmented Method for Personalized Image Descriptions..- Research on the Construction Method of Island Chain Knowledge Graph..- A Knowledge and Data Driven Method for Air Combat Intention Recognition..- Model-Based Systems Engineering Supporting Architecture Modeling of Air Traffic Management System and Model Verifying Based on SMT..- Mission Modeling for the Perseverance Rover based on KARMA Language..- Social Media oriented Fake News Detection based on Social Context and Cascade Graph..- Data Augmentation Using Large Language Model for Fake Review Identification..- Knowledge Management..- Digital Transformation Mechanisms for Emergency Management in Chemical Enterprises: An Industrial Agglomeration Perspective..- Research on the Credibility Evaluation Method of Online Medical Community Answer Content Based on Domain Knowledge Graph..- A Comprehensive Framework for Sentiment Analysis and Cold-Start 
Recommendations in Vietnam Hospitality Sector..- Mining Complementary Relationships of Items for Diversified Recommendation..- A Novel Two-Stage Approach for Customer Satisfaction Analysis..- Predicting the decision-making performance based on self-attention and long-short term memory network..- A Domain Knowledge-based Railway Equipment Fault Diagnosis Framework.
£64.99
Springer Database Systems for Advanced Applications. DASFAA 2024 International Workshops
Book Synopsis.- BDMS..- Analysis of Kinematics and Dynamics for Mobile Robots under Nonholonomic Constraints..- Exploring the application of nonlinear partial differential equations in computer numerical simulation operating systems..- FAITH: A Fast, Accurate, and Lightweight Database-agnostic Learned Cost Model..- Fast Approximate Temporal Butterfly Counting on Bipartite Graphs via Edge Sampling..- Financial-ICS: Identifying Peer Firms via LongBERT from 10K Reports..- BDQM..- Establishing a Decentralized Diamond Quality Management System: Advancing Towards Global Standardization..- Co-Estimation of Data Types and Their Positional Distribution..- Enhancing Load Forecasting with VAE-GAN-based Data Cleaning for Electric Vehicle Charging Loads..- ERDSE..- Audio-guided Visual Knowledge Representation..- Boundary Point Detection Combining Gravity and Outlier Detection Methods..- A Meta-Learning Approach for Category-aware Sequential Recommendation on POIs..- Automatic Post-Editing of Speech Recognition System Output Using Large Language Models..- Comparative Analysis with Multiple Large-Scale Language Models for Automatic Generation of Funny Dialogues..- Effectiveness of the Programmed Visual Contents Comparison Method for Two Phase Collaborative Learning in Computer Programming Education: A Case Study..- Generating Achievement Relationship Graph between Actions for Alternative Solution Recommendation..- Generating News Headline Containing Specific Person Name..- Investigating Evidence in Sentence Similarity using MASK in BERT..- Acceleration of Synopsis Construction for Bounded Approximate Query Processing..- Query Expansion in Food Review Search with Synonymous Phrase Generation by LLM..- Question Answer Summary Generation from Unstructured Texts by Using LLMs..- Real Estate Information Exploration in VR with LoD Control by Physical Distance..- Voices of Asynchronous Learning Students: Revealing Learning Characteristics through Vocabulary Analysis of Notes Tagged in 
Videos..- Review Search Interface Based on Search Result Summarization Using Large Language Model..- Yes-No Flowchart Generation for Interactive Exploration of Personalized Health Improvement Actions..- GDMA..- Enhancing Link Prediction Based on Simple Path Graphs..- Construction of EMU fault knowledge graph based on large language model.
£59.99
Springer Neural Information Processing
Book Synopsis
LoTraNet: Locality-guided Transformer Network for Image Manipulation Localization.- Progressive EMD-based Trajectory Prediction: A Multistage Approach for Enhanced Human Trajectory Forecasting.- Dual-Level Contrastive Learning Framework.- DLAFormer: A Novel Approach to Image Super-Resolution with Comprehensive Attention Mechanisms.- Audio-Infused Automatic Image Colorization by Exploiting Audio Scene Semantics.- CoMISI: Multimodal Speaker Identification in Diverse Audio-Visual Conditions through Cross-Modal Interaction.- Multi-scale Spatial Feature Aggregation For Efficient Super Resolution.- SCANet: Split Coordinate Attention Network for Building Footprint Extraction.- XFusion: Cross-Attention Transformer for Multi-Focus Image Fusion.- Guided DiffusionDet: Guided Diffusion Model for Object Detection with Resample Mechanism.- Mutual Information-based Mixed Precision Quantization.- MLLM-Driven Semantic Enhancement and Alignment for Text-Based Person Search.- TFCM: Tuning-Free Facial Concept-Erasure in Text-to-Image Models through Attention and Sample Modulation.- Selecting the Best Sequential Transfer Path for Medical Image Segmentation with Limited Labeled Data.- Knowledge Distillation with Differentiable Optimal Transport on Graph Neural Networks.- Test-Time Intensity Consistency Adaptation for Shadow Detection.- Learning from Noisy Labels for Long-tailed Data via Optimal Transport.- LCRPS: Large-Capacity Residual Plane Steganography Based on Multiple Adversarial Networks.- Aesthetics-Guided Multi-scale Feature Fusion for Style Transfer.- BEVRoad: A Cross-Modal and Temporary-Recurrent 3D Object Detector for Infrastructure Perception.- Dilated Pyramid Attention in Hierarchical Vision Transformer for Texture Recognition.- Attention-based Domain Adaptive YOLO For Cross-domain Object Detection.- In-WSOD: Integrality Weakly Supervised Object Detection with Classification and Localization Consistency.- GLEGNet: Infrared and Visible Image Fusion Via 
Global-Local Feature Extraction and Edge-Gradient Preservation.- Mending of Spatio-Temporal Dependencies in Block Adjacency Matrix.- CaDT-Net: A Cascaded Deformable Transformer Network for Multiclass Breast Cancer Histopathological Image Classification.- DIFA: Deformable Implicit Feature Alignment for Roadside Cooperative Perception.- Transferring Teacher’s Invariance to Student Through Data Augmentation Optimization.- AARR-Net: An Attention Assistance Feature Fusion and Model Recursive Recovery Network for Category-level 6D Object Pose Estimation.- BRS-YOLO: A Balanced Optical Remote Sensing Object Detection Method.- HDKI: A Hierarchical Deep Koopman Framework for Spatio-Temporal Prediction with Image Observations.
£64.99
Springer Neural Information Processing
£64.99
Springer Neural Information Processing
£64.99
Springer Neural Information Processing
£59.99
Springer Nature Switzerland AG Neural Information Processing
£75.99
Springer Nature Switzerland AG Neural Information Processing
£75.99
Springer Nature Switzerland AG Neural Information Processing
£75.99
Springer Nature Switzerland AG Neural Information Processing
£75.99
Springer Nature Switzerland AG Neural Information Processing
£75.99
Springer Nature Switzerland AG Neural Information Processing
£75.99
Springer Nature Switzerland AG Neural Information Processing
£75.99
Springer Advances in Data Clustering
Book Synopsis Chapter 1 Classification of Gougerot-Sjögren syndrome Based on Artificial Intelligence.- Chapter 2 Deep learning Classification of Venous Thromboembolism based on Ultrasound imaging.- Chapter 3 Synchronization-Driven Community Detection: Dynamic Frequency Tuning Approach.- Chapter 4 Automatic Evolutionary Clustering for Human Activity Discovery.- Chapter 5 Identification of Correlated factors for Absenteeism of employees using Clustering techniques.- Chapter 6 Multi-view Data Clustering through Consensus Graph and Data Representation Learning.- Chapter 7 Uber's Contribution to Faster Deep Learning: A Case Study in Distributed Model Training.- Chapter 8 Auto-Weighted Multi-View Clustering with Unified Binary Representation and Deep Initialization.- Chapter 9 Clustering with Adaptive Unsupervised Graph Convolution Network.- Chapter 10 Graph-based Semi-supervised Learning for Multi-view Data Analysis.- Chapter 11 Advancements in Fuzzy Clustering Algorithms for Image Processing: A Comprehensive Review and Future Directions.- Chapter 12 Multiview Latent representation learning with feature diversity for clustering.
£142.49
Springer Data Science
Book Synopsis.- Novel methods or tools used in big data and its applications..- Quantitative Analysis of the Factors Influencing Macao's Tourism Industry: A Gray Correlation Approach..- Multi-AUV Hunting Strategy Based on the Lions Group Algorithm..- Study on the Coupled and Coordinated Relationships between Social Security and Economic and Social Development in Macao..- A Segmentation Network for Coastal Vegetation Guided by Category-Weighted Information from UAV Perspective Based on SAM..- GAN-Based Defogging and Multiscale Fusion Approach for UAV-Based Seagrass Bed Imagery Semantic segmentation in challenging marine environments..- SEBWatcher: Visual Analysis System for Subject, Environment and Behavior in Traffic Scenes..- Research on the Promotion and Enhancement Paths of Hainan Wenbifeng Pangu Cultural Tourist Areas Based on Network Text Analysis..- Based on Network Text Analysis: A Study on the Promotion Strategy of a Boundary Island for Lingshui Tourism Experience..- Deep Reinforcement Learning Based on Greed for the Critical Cross section Identification Problem..- Intrusion Detection Based on Feature Selection and Transformer BiGRU..- Discussion on the construction of power 3D design platform for nested virtualized hybrid cloud..- Applications of Data Science..- Application of Deep Learning Models Based on Chaos Modeling in Power Internet of Things Forecasting Tasks..- Study on compatibility of vegetable insulating oils and mineral insulating oils for transformer based on molecular simulation methods..- Collaborative optimization scheduling model for clean energy in microgrid clusters..- A Distributed Multi-Microgrid Intelligent Scheduling for New Power System..- Development and Application of Intelligent Calculation Analysis Platform For Large Power Grid..- A Deep Reinforcement Learning Control Strategy with Integrated Droop Control for Parallel DC-DC Buck Converters with CPLs..- MST: A Comprehensive Approach for Short-Term Power Load Forecasting Based on 
Data Decomposition, Local and Global Modeling..- An Approach to Identification Model for Electric Event Based on Clustering Analysis..- Research on Prediction Models and Optimization Methods for Electrical Current Consumption of Users..- Re-GNN: A New Model for Predicting Circuit Reliability Degradation..- TRANSFORMER OIL TEMPERATURE PREDICTION METHOD BASED ON CAUSAL DISCOVERY AND GNN-LSTM MODEL..- The Electronic Power Data Traceability Model Based on Hybrid Consensus and IPFS..- Research on complementation of new energy spot and medium to long-term transactions based on blockchain.- Knowledge-driven Calculating Method for Transmission Section Limit of Large Power Grid.
£76.49
Springer Data Science
Book Synopsis.- Education research, methods and materials for data science and engine..- An empirical study of the factors influencing the improvement of education quality within higher education institutions.- Study on the Intercultural Competence of Students in Hainan Vocational College..- Study on the Communicative Competence of Students of Tourism-related Majors in Hainan Vocational Colleges..- Research on the Learning Adaptability and Learning Effectiveness of College Students under the Background of Digital Education..- Research on the Adaptability of Vocational College Majors and Industry Empirical Study Based on 14 Vocational Colleges in Hainan, China..- Practice of the Campus Data Middle Platform Based on Lakehouse Integrated Architecture..- Data Security and Privacy..- Reversible Data Hiding for 3D Mesh Model Based on Block Modulus Encryption and Multi-MSB Prediction..- QR code digital watermarking algorithm based on GWO..- Fast CKKS Algorithm in the SEAL Library..- A Transformer-based Video Colorization Method Fusing Local Self-Attention and Bidirectional Optical Flow..- An NTRU Lattice-Based Chameleon Hash Scheme for Redactable Blockchain Applications..- Traceable Decentralized Policy-Based Chameleon Hash Scheme for Blockchain Rewriting.- SECURE IDENTITY AUTHENTICATION PROTOCOL BASED ON BLOCKCHAIN IN SMART HOME..- False Data Injection Attack Detection Method Based on Long Time Series Prediction..- A Hybrid Iris Recognition System Model based on Presentation Attack Detection and Traffic Monitoring Module on AIoT System..- Big Data Mining and Knowledge Management..- Leveraging Spatial Characteristics in Trajectory Compression: An Angle-based Bounded-error Method..- HENF: Hierarchical Entity Neighbor Multi-Relational Fusion Network for Knowledge Graph Completion..- TCB Intrusion Detection Method Based on Data Enhancement..- Multi-source Heterogeneous Data Joint Diagnosis Method for Transformers Based on D-S Evidence Theory..- Progressive Federated Learning Scheme Based 
on Model Pruning..- Privacy Protection Data Aggregation Scheme Against Quantum Attacks..- LOCATION DATA QUADTREE PARTITIONING ALGORITHM BASED ON DIFFERENTIAL PRIVACY..- RLART: An Adaptive Radix Tree Based on Deep Reinforcement Learning.
£76.49
Springer Data Science
Book Synopsis.- Infrastructure for Data Science..- Android malware detection method based on machine learning..- A lightweight edge network intrusion detection system based on MobileVit..- TRAFFICNET: A NOVEL NETWORK PERFORMANCE PREDICTION MODEL VIA AGGREGATOR-BASED ENHANCEMENT..- Social Media and Recommendation System..- Sentiment analysis for public opinion based on MapReduce and PSO-SVR..- Personalized Novel Recommendation System Based on Filtering and Sentiment Analysis..- Enhancing Relevance and Efficiency in Visual Question Generation through Redundant Object Filtering..- Chinese Named Entity Recognition Algorithm integrating Vocabulary Information..- WSDSum: Unsupervised Extractive Summarization Based..- IPFS-DKRM: an efficient keyword retrieval model of IPFS based on ART..- Multimedia Data Management and Analysis..- Multi-Modal Variable-Channel Spatial-Temporal Semantic Action Recognition Network..- Enhanced and pruned motion planning based on bird's-eye view..- CCU-NET: CBAM and Cascaded Edge Detection Optimization U-NET for Remote Sensing Image Segmentation..- Speech Emotion Recognition Using U-Net..- Non-Invasive Load Decomposition Model Based On Inception-SimAM-BiLSTM..- A Data-driven Coordinated Active And Reactive Dispatching Strategy For Photovoltaics..- PDTNet: An Image-Based Model for PV Panel Defect Detection..- SAMCNet: A Multi-Channel Face Anti-Spoofing Network Combined with Hyperspectral Images via Self-Attention Mechanism..- Image Tampering Detection Method Based on Hybrid Attention Mechanism..- ZhouStage-zero A Dynamic Ensemble method for Intrusion Detection in Industrial Control System..- High-precision Anime Conversion Model based on Generative Adversarial Networks..- Anomaly Segmentation in Foggy Weather for Autonomous Driving with Adaptive Learnable Filters..- Image tampering localization based on dual-stream feature fusion..- Multi-scale Image Tampering Detection Using Inception-UNet Network..- Fetal Congenital Heart Disease Diagnosis Based On 
CBAM-Enhanced ResNet-50..- Transformers for Single Object Tracking: Temporal Context Propagation and Frame Relationship Modeling..- SAFETY HELMET WEARING DETECTION BASED ON YOLOv7.
£76.49
Elsevier Science & Technology Analyzing Social Media Networks with NodeXL
Table of Contents
Part I. Getting Started with Analyzing Social Media Networks
1. Introduction to Social Media and Social Networks
2. Social Media: New Technologies of Collaboration
3. Social Network Analysis: Measuring, Mapping, and Modeling Collections of Connections
Part II. NodeXL Tutorial: Learning by Doing
4. Installation, Orientation, and Layout
5. Labeling and Visual Attributes
6. Calculating and Visualizing Network Metrics
7. Grouping and Filtering
8. Semantic Networks
Part III. Social Media Network Analysis Case Studies
9. Email: The Lifeblood of Modern Communication
10. Thread Networks: Mapping Message Boards and Email Lists
11. Twitter: Information Flows, Influencers, and Organic Communities
12. Facebook: Public Pages and Inter-Organizational Networks
13. YouTube: Exploring Video Networks
14. Wiki Networks: Connections of Culture and Collaboration
£37.00
Pearson Education Data Analytics with Spark Using Python
£31.34
Taylor & Francis Ltd Disk-Based Algorithms for Big Data
Book Synopsis
Disk-Based Algorithms for Big Data is a product of recent advances in the areas of big data, data analytics, and the underlying file systems and data management algorithms used to support the storage and analysis of massive data collections. The book discusses hard disks and their impact on data management, since hard disk drives continue to be common in large data clusters. It also explores ways to store and retrieve data through primary and secondary indices. This includes a review of different in-memory sorting and searching algorithms that build a foundation for more sophisticated on-disk approaches like mergesort, B-trees, and extendible hashing. Following this introduction, the book transitions to more recent topics, including advanced storage technologies like solid-state drives and holographic storage; peer-to-peer (P2P) communication; large file systems and query languages like Hadoop/HDFS, Hive, Cassandra, and Presto; and NoSQL databases like Neo4j for graph structur
Table of Contents
Foreword. Physical Disk Storage. File Management. Sorting. Searching. Disk-Based Sorting. Disk-Based Searching. Storage Technology. Large File Systems. NoSQL Storage. Appendix
£56.99
Taylor & Francis Inc A User's Guide to Business Analytics
Book Synopsis
A User's Guide to Business Analytics provides a comprehensive discussion of statistical methods useful to the business analyst. Methods are developed from a fairly basic level to accommodate readers who have limited training in the theory of statistics. A substantial number of case studies and numerical illustrations using the R software package are provided for the benefit of motivated beginners who want to get a head start in analytics as well as for experts on the job who will benefit by using this text as a reference book.
The book comprises 12 chapters. The first chapter focuses on business analytics, along with its emergence and application, and sets up a context for the whole book. The next three chapters introduce R and provide a comprehensive discussion on descriptive analytics, including numerical data summarization and visual analytics. Chapters five through seven discuss set theory, definitions and counting rules, probability, random 
Table of Contents
What Is Analytics? Introducing R: An Analytics Software. Reporting Data. Statistical Graphics and Visual Analytics. Probability. Random Variables and Probability Distributions. Continuous Random Variables. Statistical Inference. Regression for Predictive Model Building. Decision Trees. Data Mining and Multivariate Methods. Modeling Time Series Data for Forecasting.
£128.25
John Murray Press If Then: How One Data Company Invented the Future
Book Synopsis
Radio 4's Book of the Week
A Financial Times Book of the Year
Shortlisted for the 2020 Financial Times / McKinsey Business Book of the Year
Longlisted for the National Book Award
'The story of the original data science hucksters of the 1960s is hilarious, scathing and sobering - what you might get if you crossed Mad Men with Theranos' David Runciman
The Simulmatics Corporation, founded in 1959, mined data, targeted voters, accelerated news, manipulated consumers, destabilized politics, and disordered knowledge--decades before Facebook, Google, Amazon, and Cambridge Analytica. Silicon Valley likes to imagine it has no past but the scientists of Simulmatics are the long-dead grandfathers of Mark Zuckerberg and Elon Musk. Borrowing from psychological warfare, they used computers to predict and direct human behavior, deploying their "People Machine" from New York, Cambridge, and Saigon for clients that included John Kennedy's presidential campaign, the New York Times, Young & Rubicam, and, during the Vietnam War, the Department of Defence. In If Then, distinguished Harvard historian and New Yorker staff writer, Jill Lepore, unearths from the archives the almost unbelievable story of this long-vanished corporation, and of the women hidden behind it. In the 1950s and 1960s, Lepore argues, Simulmatics invented the future by building the machine in which the world now finds itself trapped and tormented, algorithm by algorithm.
'A person can't help but feel inspired by the riveting intelligence and joyful curiosity of Jill Lepore. Knowing that there is a mind like hers in the world is a hope-inducing thing' George Saunders, Man Booker Prize-winning author of Lincoln in the Bardo
'An authoritative account of the origins of data science, a compelling political narrative of America in the Sixties, a poignant collective biography of a generation of flawed men' David Kynaston
'If Then is simultaneously gripping and absolutely terrifying' Amanda Foreman
Trade Review
Lepore is a brilliant writer. It's a dream to read. -- Diane Coyle
If you're looking for beautiful writing and love history ... this is a lovely read that takes you through a history of American politics and campaigning, cold war intrigue and artificial intelligence. * Financial Times *
Jill Lepore is the pre-eminent historian of forgotten tales from America's past that throw startling light on the present. This brilliant book illuminates the future too. The story of the original data science hucksters of the 1960s is hilarious, scathing and sobering - what you might get if you crossed Mad Men with Theranos. -- David Runciman
Fascinating. * New York Times Book Review *
A person can't help but feel inspired by the riveting intelligence and joyful curiosity of Jill Lepore. Knowing that there is a mind like hers in the world is a hope-inducing thing. -- George Saunders
Jill Lepore writes history like a poet. In If Then she yet again binds lyrical story telling to meticulous archival research to tell a gigantic story from our past. She builds our present, and makes it feel so familiar and yet so contingent. -- Dan Snow
Two things make this tale worth reading. One is Lepore's brisk and confident depiction of the individuals involved...the other is her exploration of the growing power of computers to accumulate and analyse data, bringing marketing and politics into ever closer union. -- Frances Cairncross * The Literary Review *
Beautifully written and intellectually rigorous account of the origins of the science of predictive analytics and behavioral data science in the cold war era. * Financial Times *
Fascinating. -- Amol Rajan * Start the Week *
Everything Lepore writes is distinguished by intelligence, eloquence, and fresh insight. If Then is that, and even more: It's absolutely fascinating, excavating a piece of little-known American corporate history that reveals a huge amount about the way we live today and the companies that define the modern era. -- Susan Orlean
A wonderfully written history of long-forgotten computer group Simulmatics. * Financial Times *
£18.00
Springer International Publishing AG Feedback Control Systems: The MATLAB®/Simulink® Approach
Book Synopsis
Feedback control systems is an important course in aerospace engineering, chemical engineering, electrical engineering, mechanical engineering, and mechatronics engineering, to name just a few. Feedback control systems improve the system's behavior so the desired response can be achieved. The first course on control engineering deals with Continuous Time (CT) Linear Time Invariant (LTI) systems. Plenty of good textbooks on the subject are available on the market, so there is no need to add one more. This book does not focus on the control engineering theories as it is assumed that the reader is familiar with them, i.e., took/takes a course on control engineering, and now wants to learn the applications of MATLAB® in control engineering. The focus of this book is control engineering applications of MATLAB® for a first course on control engineering.
Table of Contents
Preface.- Acknowledgments.- Introduction to MATLAB®.- Commonly Used Commands in Analysis of Control Systems.- Introduction to Simulink®.- Controller Design in MATLAB®.- Introduction to System Identification Toolbox™.- References.- Authors' Biographies.
£62.99
Springer International Publishing AG Phrase Mining from Massive Text and Its
Book Synopsis
A lot of digital ink has been spilled on "big data" over the past few years. Most of this surge owes its origin to the various types of unstructured data in the wild, among which the proliferation of text-heavy data is particularly overwhelming, attributed to the daily use of web documents, business reviews, news, social posts, etc., by so many people worldwide.
A core challenge presents itself: How can one efficiently and effectively turn massive, unstructured text into structured representation so as to further lay the foundation for many other downstream text mining applications? In this book, we investigated one promising paradigm for representing unstructured text, that is, through automatically identifying high-quality phrases from innumerable documents. In contrast to a list of frequent n-grams without proper filtering, users are often more interested in results based on variable-length phrases with certain semantics such as scientific concepts, organizations, slogans, and so on. We propose new principles and powerful methodologies to achieve this goal, from the scenario where a user can provide meaningful guidance to a fully automated setting through distant learning. This book also introduces applications enabled by the mined phrases and points out some promising research directions.
Table of Contents
Acknowledgments.- Introduction.- Quality Phrase Mining with User Guidance.- Automated Quality Phrase Mining.- Phrase Mining Applications.- Bibliography.- Authors' Biographies.
£26.59
Springer Machine Learning for Networking
Book Synopsis.- Learning per-flow SD-WAN load-balancing policies..- Survey on Federated Learning in Smart Healthcare..- Complex Communication Networks Management with Distributed AI: Challenges and Open Issues..- A Framework for Global Trust and Reputation Management in 6G Networks..- DRL Framework for Minimizing Beam Switching Time and Maintaining QoS in 6G-V2X Base Stations..- Reducing BLE energy loss in busy 2.4GHz band..- Leveraging SHAP to advance the Robustness of Large Language Models..- Keyword-Driven Email Classification: Leveraging Machine Learning Techniques..- Predicting Intents: ARMA-Based Modeling..- Design and Evaluation of a Lightweight SDN Controller for Integrated Road and Rail Networks..- PiPS: An effective strategy and approach for Privacy in Public Surveillance..- A comprehensive review of deep learning approaches for tomato leaf diseases detection and classification in smart agriculture..- A review on advancement in PEM Fuel cell Diagnosis based on Machine learning techniques..- GPS Spoofing Attack against UAVs: a timeseries dataset case study.
£134.99
Springer Big Data Analytics and Knowledge Discovery
Book Synopsis.- Keynote Talk..- Sparse Matrix Algorithms for Evolving Neural Networks..- Invited Talk..- Data integration in the AI era: research trends and still open issues..- Tutorial..- Leveraging machine learning techniques for customer data deduplication-hard-won lessons from a real-world project in the financial industry,.- Data mining and knowledge discovery..- FairFES - Fast Exact Sampling for Fair Classification..- Autism Detection by Analyzing Handwriting Characteristics of Chinese Characters via Deep Learning Models..- FNoDe: Faulty Node Detection in Microservices Architecture..- An Enhanced FP-Growth Algorithm with Hybrid Adaptive Support Threshold for Association Rule Mining..- Sequential data analytics and recommendation systems..- Entity Resolution for Streaming Data with Embeddings..- Cross-Modal Sequential Point-of-Interest Recommendation with Lightweight Hybrid Fusion Strategy..- Alternatives to Shallow Autoencoders for Collaborative Filtering..- Accurate Concept Drift Detection without Updating Autoencoders..- Graph data processing and analytics..- Parallel and Distributed SQL/PGQ Query Processing for Property Graphs..- Graph Constraint Language for Industrial Knowledge Graphs and Machine Learning..- SemViSG : Semantic Enrichment and Visualization of Software Graphs..- Data management and Indices..- Certainty Attacks Using Explainability Preprocessing..- Integrating Bitcoin Transactions into Relational Databases for IoT: Challenges and Solutions..- Effects of Response Length on User Search Experience in Spoken Conversational Search..- Fair Proportional Top-k Ranking..- PAID: Power-efficient AI-optimized Databases..- On the Costs and Benefits of Learned Indexing for Dynamic High-Dimensional Data..- A Bayesian Reinforcement Learning Framework for Online Index Tuning..- Large language models (LLMs)..- Explaining Recovery Trajectories of Older Adults Post Lower-Limb Fracture Using Modality-wise Multiview Clustering and Large Language Models..- 
Parameter Drift as a Signal for Membership Inference in Overfit-Tuned LLMs..- MicroSuggest: Kernel-Aware Microservice Decomposition..- TraceTune: Targeted Fine-Tuning of Attention Heads for Text-to-SQL..- Neural networks..- ONNYX : Optimized Neural Networks Yielding eXplainable insights from ECG signals-based data streams..- SpaPool: Soft Partition Assignment Pooling for Graph Neural Networks..- Prediction of iterative solvers' convergence using pretraining by natural images..- Local-aware Convolutional Modulation for Short-Term Sequential Recommendation.
£58.49
de Gruyter Oldenbourg Das Datenzentrische Unternehmen
Book Synopsis
£32.36
Springer International Publishing AG State of the Art Applications of Social Network Analysis
Book SynopsisSocial network analysis increasingly bridges the discovery of patterns in diverse areas of study as data becomes more available and more complex. Yet the construction of huge networks from large data often requires entirely different approaches for analysis, including graph theory, statistics, machine learning and data mining. This work covers frontier studies on social network analysis and mining from different perspectives such as social network sites, financial data, e-mails, forums, academic research funds, XML technology, blog content, community detection and clique finding, prediction of user behavior, privacy in social network analysis, mobility from a spatio-temporal point of view, agent technology and political parties in parliament. These topics will be of interest to researchers and practitioners from different disciplines including, but not limited to, social sciences and engineering.Table of ContentsA Randomized Approach for Structural and Message based Private Friend Recommendation in Online Social Networks; B. K. Samanthula, W. Jiang.- Context Based Semantic Relations in Tweets; O. Ozdikis et al.- Fast exact and approximate computation of betweenness centrality in social networks; M. Baglioni et al.- Network Simulation; E. Franchi.- Early Stage Conversation Catalysts on Entertainment-Based Web Forums; J. Lanagan et al.- Predicting Users Behaviours in Distributed Social Networks Using Community Analysis; B. Ngonmang et al.- What should we protect? Defining differential privacy for social network analysis; C. Task, C. Clifton.- Complex Network Analysis of Research Funding: A Case Study of NSF Grants; H. Kardes et al.- Community Evolutionary Events in Online Social Networks; M. Abulaish, S. Yousuf Bhat.- @Rank: Personalized Centrality Measure for Email Communication Networks; P. Lubarski, M. Morzy.- Twitter Sentiment Analysis: How To Hedge Your Bets In The Stock Markets; T. Rao, S.
Srivastava.- The Impact of Measurement Time on Subgroup Detection in Online Communities; S. Zeini et al.- Spatial and Temporal Evaluation of Network-Based Analysis of Human Mobility; M. Coscia et al.- An Ant based Particle Swarm Optimization Algorithm for Maximum Clique Problem in Social networks; M. Soleimani-pouri et al.- XEngine: An XML Search Engine for Social Groups; K.Taha.- Size, diversity and components in the network around an entrepreneur: Shaped by culture and shaping embeddedness of firm relations; M. Cheraghi, T.Schott .- Content Mining of Microblogs; M.Ö. Cingiz, B. Diri.
£42.74
Springer International Publishing AG Advances in Big Data: Proceedings of the 2nd INNS
Book SynopsisThe book offers a timely snapshot of neural network technologies as a significant component of big data analytics platforms. It promotes new advances and research directions in efficient and innovative algorithmic approaches to analyzing big data (e.g. deep networks, nature-inspired and brain-inspired algorithms); implementations on different computing platforms (e.g. neuromorphic, graphics processing units (GPUs), clouds, clusters); and big data analytics applications to solve real-world problems (e.g. weather prediction, transportation, energy management). The book, which reports on the second edition of the INNS Conference on Big Data, held on October 23–25, 2016, in Thessaloniki, Greece, depicts an interesting collaborative adventure of neural networks with big data and other learning technologies.Table of ContentsPredicting human behavior based on web search activity: Greek referendum of 2015.- Compact Video Description and Representation for Automated Summarization of Human Activities.- Attribute Learning for Network Intrusion Detection.- A Fast Deep Convolutional Neural Network for face detection in Big Visual Data.- Learning Symbols by Neural Network.- Designing HMMs models in the age of Big Data.- Extended Formulations for Online Action Selection on Big Action Sets.- Multi-Task Deep Neural Networks for Automated Extraction of Primary Site and Laterality Information from Cancer Pathology Reports.- An infrastructure and approach for infering knowledge over Big Data in the Vehicle Insurance Industry.- Unified Retrieval Model of Big Data.- Adaptive Elitist Differential Evolution Extreme Learning Machines on Big Data: Intelligent Recognition of Invasive Species.
£116.99
Springer International Publishing AG Network Intelligence Meets User Centered Social
Book SynopsisThis edited volume presents advances in modeling and computational analysis techniques related to networks and online communities. It contains the best papers of notable scientists from the 4th European Network Intelligence Conference (ENIC 2017) that have been peer reviewed and expanded into the present format. The aim of this text is to share knowledge and experience as well as to present recent advances in the field. The book is a nice mix of basic research topics such as data-based centrality measures along with intriguing applied topics, for example, interaction decay patterns in online social communities. This book will appeal to students, professors, and researchers working in the fields of data science, computational social science, and social network analysis. Table of ContentsData-based centrality measures.- Extracting the Main Path of historic events from Wikipedia.- Simulating trade in economic networks with TrEcSim.- Community Aliveness: Discovering interaction decay patterns in online social communities.- Network Patterns of Direct and Indirect Reciprocity in edX MOOC Forums.- Targeting influential nodes for recovery in bootstrap percolation on hyperbolic networks.- Trump versus Clinton – Twitter communication during the US primaries.- Extended feature-driven graph model for Social Media Networks.- Market basket analysis using minimum spanning trees.- Behavior-based relevance estimation for social networks interaction relations.- Sponge walker: Community detection in large directed social networks using local structures and random walks.- Identifying promising research topics in Computer Science.- Identifying accelerators of information diffusion across social media channels .- Towards an ILP approach for learning privacy heuristics from users' regrets.- Strength of nations: A case study on estimating the influence of leading countries using social media analysis.- Incremental learning in dynamic networks for node classification.
£33.74
Springer-Verlag Berlin and Heidelberg GmbH & Co. KG Advances in Data Mining: Applications and
Book SynopsisThis book constitutes the refereed proceedings of the 13th Industrial Conference on Data Mining, ICDM 2013, held in New York, NY, in July 2013. The 22 revised full papers presented were carefully reviewed and selected from 112 submissions. The topics range from theoretical aspects of data mining to applications of data mining, such as in multimedia data, in marketing, finance and telecommunication, in medicine and agriculture, and in process control, industry and society.Table of ContentsTheoretical aspects of data mining; applications of data mining in multimedia data.- Applications of data mining in marketing and in finance.- Applications of data mining in telecommunication.- Applications of data mining in medicine and agriculture.- Applications of data mining in process control, industry and society.
£39.99
Springer-Verlag New York Inc. Machine Learning in Cyber Trust
Book SynopsisCyber System.- Cyber-Physical Systems: A New Frontier.- Security.- Misleading Learners: Co-opting Your Spam Filter.- Survey of Machine Learning Methods for Database Security.- Identifying Threats Using Graph-based Anomaly Detection.- On the Performance of Online Learning Methods for Detecting Malicious Executables.- Efficient Mining and Detection of Sequential Intrusion Patterns for Network Intrusion Detection Systems.- A Non-Intrusive Approach to Enhance Legacy Embedded Control Systems with Cyber Protection Features.- Image Encryption and Chaotic Cellular Neural Network.- Privacy.- From Data Privacy to Location Privacy.- Privacy Preserving Nearest Neighbor Search.- Reliability.- High-Confidence Compositional Reliability Assessment of SOA-Based Systems Using Machine Learning Techniques.- Model, Properties, and Applications of Context-Aware Web Services.Trade ReviewFrom the reviews: "This is a useful book on machine learning for cyber security applications. It will be helpful to researchers and graduate students who are looking for an introduction to a specific topic in the field. All of the topics covered are well researched. The book consists of 12 chapters, grouped into four parts." (Imad H. 
Elhajj, ACM Computing Reviews, October, 2009)
£125.99
John Wiley & Sons Inc Unsupervised Learning
Book SynopsisA new approach to unsupervised learning. Evolving technologies have brought about an explosion of information in recent years, but the question of how such information might be effectively harvested, archived, and analyzed remains a monumental challenge, for the processing of such information is often fraught with the need for conceptual interpretation: a relatively simple task for humans, yet an arduous one for computers. Inspired by the relative success of existing popular research on self-organizing neural networks for data clustering and feature extraction, Unsupervised Learning: A Dynamic Approach presents information within the family of generative, self-organizing maps, such as the self-organizing tree map (SOTM) and the more advanced self-organizing hierarchical variance map (SOHVM). It covers a series of pertinent, real-world applications with regard to the processing of multimedia data, from its role in generic image processing techniques, such as th
Table of ContentsAcknowledgments.
1 Introduction. 1.1 Part I: The Self-Organizing Method. 1.2 Part II: Dynamic Self-Organization for Image Filtering and Multimedia Retrieval. 1.3 Part III: Dynamic Self-Organization for Image Segmentation and Visualization. 1.4 Future Directions.
2 Unsupervised Learning. 2.1 Introduction. 2.2 Unsupervised Clustering. 2.3 Distance Metrics for Unsupervised Clustering. 2.4 Unsupervised Learning Approaches. 2.4.1 Partitioning and Cluster Membership. 2.4.2 Iterative Mean-Squared Error Approaches. 2.4.3 Mixture Decomposition Approaches. 2.4.4 Agglomerative Hierarchical Approaches. 2.4.5 Graph-Theoretic Approaches. 2.4.6 Evolutionary Approaches. 2.4.7 Neural Network Approaches. 2.5 Assessing Cluster Quality and Validity. 2.5.1 Cost Function–Based Cluster Validity Indices. 2.5.2 Density-Based Cluster Validity Indices. 2.5.3 Geometric-Based Cluster Validity Indices.
3 Self-Organization. 3.1 Introduction. 3.2 Principles of Self-Organization. 3.2.1 Synaptic Self-Amplification and Competition. 3.2.2 Cooperation. 3.2.3 Knowledge Through Redundancy. 3.3 Fundamental Architectures. 3.3.1 Adaptive Resonance Theory. 3.3.2 Self-Organizing Map. 3.4 Other Fixed Architectures for Self-Organization. 3.4.1 Neural Gas. 3.4.2 Hierarchical Feature Map. 3.5 Emerging Architectures for Self-Organization. 3.5.1 Dynamic Hierarchical Architectures. 3.5.2 Nonstationary Architectures. 3.5.3 Hybrid Architectures. 3.6 Conclusion.
4 Self-Organizing Tree Map. 4.1 Introduction. 4.2 Architecture. 4.3 Competitive Learning. 4.4 Algorithm. 4.5 Evolution. 4.5.1 Dynamic Topology. 4.5.2 Classification Capability. 4.6 Practical Considerations, Extensions, and Refinements. 4.6.1 The Hierarchical Control Function. 4.6.2 Learning, Timing, and Convergence. 4.6.3 Feature Normalization. 4.6.4 Stop Criteria. 4.7 Conclusions.
5 Self-Organization in Impulse Noise Removal. 5.1 Introduction. 5.2 Review of Traditional Median-Type Filters. 5.3 The Noise-Exclusive Adaptive Filtering. 5.3.1 Feature Selection and Impulse Detection. 5.3.2 Noise Removal Filters. 5.4 Experimental Results. 5.5 Detection-Guided Restoration and Real-Time Processing. 5.5.1 Introduction. 5.5.2 Iterative Filtering. 5.5.3 Recursive Filtering. 5.5.4 Real-Time Processing of Impulse Corrupted TV Pictures. 5.5.5 Analysis of the Processing Time. 5.6 Conclusions.
6 Self-Organization in Image Retrieval. 6.1 Retrieval of Visual Information. 6.2 Visual Feature Descriptor. 6.2.1 Color Histogram and Color Moment Descriptors. 6.2.2 Wavelet Moment and Gabor Texture Descriptors. 6.2.3 Fourier and Moment-based Shape Descriptors. 6.2.4 Feature Normalization and Selection. 6.3 User-Assisted Retrieval. 6.3.1 Radial Basis Function Method. 6.4 Self-Organization for Pseudo Relevance Feedback. 6.5 Directed Self-Organization. 6.5.1 Algorithm. 6.6 Optimizing Self-Organization for Retrieval. 6.6.1 Genetic Principles. 6.6.2 System Architecture. 6.6.3 Genetic Algorithm for Feature Weight Detection. 6.7 Retrieval Performance. 6.7.1 Directed Self-Organization. 6.7.2 Genetic Algorithm Weight Detection. 6.8 Summary.
7 The Self-Organizing Hierarchical Variance Map. 7.1 An Intuitive Basis. 7.2 Model Formulation and Breakdown. 7.2.1 Topology Extraction via Competitive Hebbian Learning. 7.2.2 Local Variance via Hebbian Maximal Eigenfilters. 7.2.3 Global and Local Variance Interplay for Map Growth and Termination. 7.3 Algorithm. 7.3.1 Initialization, Continuation, and Presentation. 7.3.2 Updating Network Parameters. 7.3.3 Vigilance Evaluation and Map Growth. 7.3.4 Topology Adaptation. 7.3.5 Node Adaptation. 7.3.6 Optional Tuning Stage. 7.4 Simulations and Evaluation. 7.4.1 Observations of Evolution and Partitioning. 7.4.2 Visual Comparisons with Popular Mean-Squared Error Architectures. 7.4.3 Visual Comparison Against Growing Neural Gas. 7.4.4 Comparing Hierarchical with Tree-Based Methods. 7.5 Tests on Self-Determination and the Optional Tuning Stage. 7.6 Cluster Validity Analysis on Synthetic and UCI Data. 7.6.1 Performance vs. Popular Clustering Methods. 7.6.2 IRIS Dataset. 7.6.3 WINE Dataset. 7.7 Summary.
8 Microbiological Image Analysis Using Self-Organization. 8.1 Image Analysis in the Biosciences. 8.1.1 Segmentation: The Common Denominator. 8.1.2 Semi-supervised versus Unsupervised Analysis. 8.1.3 Confocal Microscopy and Its Modalities. 8.2 Image Analysis Tasks Considered. 8.2.1 Visualising Chromosomes During Mitosis. 8.2.2 Segmenting Heterogeneous Biofilms. 8.3 Microbiological Image Segmentation. 8.3.1 Effects of Feature Space Definition. 8.3.2 Fixed Weighting of Feature Space. 8.3.3 Dynamic Feature Fusion During Learning. 8.4 Image Segmentation Using Hierarchical Self-Organization. 8.4.1 Gray-Level Segmentation of Chromosomes. 8.4.2 Automated Multilevel Thresholding of Biofilm. 8.4.3 Multidimensional Feature Segmentation. 8.5 Harvesting Topologies to Facilitate Visualization. 8.5.1 Topology Aware Opacity and Gray-Level Assignment. 8.5.2 Visualization of Chromosomes During Mitosis. 8.6 Summary.
9 Closing Remarks and Future Directions. 9.1 Summary of Main Findings. 9.1.1 Dynamic Self-Organization: Effective Models for Efficient Feature Space Parsing. 9.1.2 Improved Stability, Integrity, and Efficiency. 9.1.3 Adaptive Topologies Promote Consistency and Uncover Relationships. 9.1.4 Online Selection of Class Number. 9.1.5 Topologies Represent a Useful Backbone for Visualization or Analysis. 9.2 Future Directions. 9.2.1 Dynamic Navigation for Information Repositories. 9.2.2 Interactive Knowledge-Assisted Visualization. 9.2.3 Temporal Data Analysis Using Trajectories.
Appendix A. A.1 Global and Local Consistency Error. References. Index.
£100.76
John Wiley & Sons Inc Making Sense of Data III
Book SynopsisAs third in the series, this book focuses on a style of data analysis that makes graphics central to exploration. Making Sense of Data III explains how to implement decision support systems and provides an interactive approach to data analysis that allows users to see, manipulate, explore, mine data, and share results with colleagues.Trade Review“It is an essential book for understanding the principal role that graphics play in data visualization.” (Zentralblatt MATH, 1 April 2015) Table of ContentsPreface. 1. Introduction. 1.1 Overview. 1.2 Visual Perception. 1.3 Visualization. 1.4 Designing for High-throughput Data Exploration. 1.5 Summary. 1.6 Further reading. 2. The Cognitive and Visual Systems. 2.1 External Representation. 2.2 The Cognitive System. 2.3 Visual Perception. 2.4 Influencing Visual Perception. 2.5 Summary. 2.6 Further reading. 3. Graphic Representations. 3.1 Jacques Bertin: Semiology of Graphics. 3.2 Wilkinson: Grammar of Graphics. 3.3 Wickham: ggplot2. 3.4 Bostock and Heer: Protovis. 3.5 Summary. 3.6 Further reading. 4. Designing Visual Interactions. 4.1 Designing for Complexity. 4.2 The Process of Design. 4.3 Visual Interaction Design. 5. Hands-on: Creating Interactive Visualizations with Protovis. 5.1 Using Protovis. 5.2 Creating Code using the Protovis Graphical Framework. 5.3 Basic Protovis Marks. 5.4 Creating Customized Plots. 5.5 Creating Basic Plots. 5.6 Data Analysis Graphs. 5.7 Composite Plots. 5.8 Interactive Plots. 5.9 Protovis Summary. 5.10 Further Reading. Appendix. A Exercise Code Examples. Bibliography. Index.
£81.86
John Wiley & Sons Inc Data Mining Techniques
Book SynopsisThe leading introductory book on data mining, fully updated and revised! When Berry and Linoff wrote the first edition of Data Mining Techniques in the late 1990s, data mining was just starting to move out of the lab and into the office and has since grown to become an indispensable tool of modern business.
Table of ContentsIntroduction.
Chapter 1 What Is Data Mining and Why Do It? What Is Data Mining? Data Mining Is a Business Process. Large Amounts of Data. Meaningful Patterns and Rules. Data Mining and Customer Relationship Management. Why Now? Data Is Being Produced. Data Is Being Warehoused. Computing Power Is Affordable. Interest in Customer Relationship Management Is Strong. Commercial Data Mining Software Products Have Become Available. Skills for the Data Miner. The Virtuous Cycle of Data Mining. A Case Study in Business Data Mining. Identifying BofA’s Business Challenge. Applying Data Mining. Acting on the Results. Measuring the Effects of Data Mining. Steps of the Virtuous Cycle. Identify Business Opportunities. Transform Data into Information. Act on the Information. Measure the Results. Data Mining in the Context of the Virtuous Cycle. Lessons Learned.
Chapter 2 Data Mining Applications in Marketing and Customer Relationship Management. Two Customer Lifecycles. The Customer’s Lifecycle. The Customer Lifecycle. Subscription Relationships versus Event-Based Relationships. Organize Business Processes Around the Customer Lifecycle. Customer Acquisition. Customer Activation. Customer Relationship Management. Winback. Data Mining Applications for Customer Acquisition. Identifying Good Prospects. Choosing a Communication Channel. Picking Appropriate Messages. A Data Mining Example: Choosing the Right Place to Advertise. Who Fits the Profile? Measuring Fitness for Groups of Readers. Data Mining to Improve Direct Marketing Campaigns. Response Modeling. Optimizing Response for a Fixed Budget. Optimizing Campaign Profitability. Reaching the People Most Influenced by the Message. Using Current Customers to Learn About Prospects. Start Tracking Customers Before They Become “Customers”. Gather Information from New Customers. Acquisition-Time Variables Can Predict Future Outcomes. Data Mining Applications for Customer Relationship Management. Matching Campaigns to Customers. Reducing Exposure to Credit Risk. Determining Customer Value. Cross-selling, Up-selling, and Making Recommendations. Retention. Recognizing Attrition. Why Attrition Matters. Different Kinds of Attrition. Different Kinds of Attrition Model. Beyond the Customer Lifecycle. Lessons Learned.
Chapter 3 The Data Mining Process. What Can Go Wrong? Learning Things That Aren’t True. Learning Things That Are True, but Not Useful. Data Mining Styles. Hypothesis Testing. Directed Data Mining. Undirected Data Mining. Goals, Tasks, and Techniques. Data Mining Business Goals. Data Mining Tasks. Data Mining Techniques. Formulating Data Mining Problems: From Goals to Tasks to Techniques. What Techniques for Which Tasks? Is There a Target or Targets? What Is the Target Data Like? What Is the Input Data Like? How Important Is Ease of Use? How Important Is Model Explicability? Lessons Learned.
Chapter 4 Statistics 101: What You Should Know About Data. Occam’s Razor. Skepticism and Simpson’s Paradox. The Null Hypothesis. P-Values. Looking At and Measuring Data. Categorical Values. Numeric Variables. A Couple More Statistical Ideas. Measuring Response. Standard Error of a Proportion. Comparing Results Using Confidence Bounds. Comparing Results Using Difference of Proportions. Size of Sample. What the Confidence Interval Really Means. Size of Test and Control for an Experiment. Multiple Comparisons. The Confidence Level with Multiple Comparisons. Bonferroni’s Correction. Chi-Square Test. Expected Values. Chi-Square Value. Comparison of Chi-Square to Difference of Proportions. An Example: Chi-Square for Regions and Starts. Case Study: Comparing Two Recommendation Systems with an A/B Test. First Metric: Participating Sessions. Data Mining and Statistics. Lessons Learned.
Chapter 5 Descriptions and Prediction: Profiling and Predictive Modeling. Directed Data Mining Models. Defining the Model Structure and Target. Incremental Response Modeling. Model Stability. Time-Frames in the Model Set. Directed Data Mining Methodology. Step 1: Translate the Business Problem into a Data Mining Problem. How Will Results Be Used? How Will Results Be Delivered? The Role of Domain Experts and Information Technology. Step 2: Select Appropriate Data. What Data Is Available? How Much Data Is Enough? How Much History Is Required? How Many Variables? What Must the Data Contain? Step 3: Get to Know the Data. Examine Distributions. Compare Values with Descriptions. Validate Assumptions. Ask Lots of Questions. Step 4: Create a Model Set. Assembling Customer Signatures. Creating a Balanced Sample. Including Multiple Timeframes. Creating a Model Set for Prediction. Creating a Model Set for Profiling. Partitioning the Model Set. Step 5: Fix Problems with the Data. Categorical Variables with Too Many Values. Numeric Variables with Skewed Distributions and Outliers. Missing Values. Values with Meanings That Change over Time. Inconsistent Data Encoding. Step 6: Transform Data to Bring Information to the Surface. Step 7: Build Models. Step 8: Assess Models. Assessing Binary Response Models and Classifiers. Assessing Binary Response Models Using Lift. Assessing Binary Response Model Scores Using Lift Charts. Assessing Binary Response Model Scores Using Profitability Models. Assessing Binary Response Models Using ROC Charts. Assessing Estimators. Assessing Estimators Using Score Rankings. Step 9: Deploy Models. Practical Issues in Deploying Models. Optimizing Models for Deployment. Step 10: Assess Results. Step 11: Begin Again. Lessons Learned.
Chapter 6 Data Mining Using Classic Statistical Techniques. Similarity Models. Similarity and Distance. Example: A Similarity Model for Product Penetration. Table Lookup Models. Choosing Dimensions. Partitioning the Dimensions. From Training Data to Scores. Handling Sparse and Missing Data by Removing Dimensions. RFM: A Widely Used Lookup Model. RFM Cell Migration. RFM and the Test-and-Measure Methodology. RFM and Incremental Response Modeling. Naïve Bayesian Models. Some Ideas from Probability. The Naïve Bayesian Calculation. Comparison with Table Lookup Models. Linear Regression. The Best-fit Line. Goodness of Fit. Multiple Regression. The Equation. The Range of the Target Variable. Interpreting Coefficients of Linear Regression Equations. Capturing Local Effects with Linear Regression. Additional Considerations with Multiple Regression. Variable Selection for Multiple Regression. Logistic Regression. Modeling Binary Outcomes. The Logistic Function. Fixed Effects and Hierarchical Effects. Hierarchical Effects. Within and Between Effects. Fixed Effects. Lessons Learned.
Chapter 7 Decision Trees. What Is a Decision Tree and How Is It Used? A Typical Decision Tree. Using the Tree to Learn About Churn. Using the Tree to Learn About Data and Select Variables. Using the Tree to Produce Rankings. Using the Tree to Estimate Class Probabilities. Using the Tree to Classify Records. Using the Tree to Estimate Numeric Values. Decision Trees Are Local Models. Growing Decision Trees. Finding the Initial Split. Growing the Full Tree. Finding the Best Split. Gini (Population Diversity) as a Splitting Criterion. Entropy Reduction or Information Gain as a Splitting Criterion. Information Gain Ratio. Chi-Square Test as a Splitting Criterion. Incremental Response as a Splitting Criterion. Reduction in Variance as a Splitting Criterion for Numeric Targets. F Test. Pruning. The CART Pruning Algorithm. Pessimistic Pruning: The C5.0 Pruning Algorithm. Stability-Based Pruning. Extracting Rules from Trees. Decision Tree Variations. Multiway Splits. Splitting on More Than One Field at a Time. Creating Nonrectangular Boxes. Assessing the Quality of a Decision Tree. When Are Decision Trees Appropriate? Case Study: Process Control in a Coffee Roasting Plant. Goals for the Simulator. Building a Roaster Simulation. Evaluation of the Roaster Simulation. Lessons Learned.
Chapter 8 Artificial Neural Networks. A Bit of History. The Biological Model. The Biological Neuron. The Biological Input Layer. The Biological Output Layer. Neural Networks and Artificial Intelligence. Artificial Neural Networks. The Artificial Neuron. The Multi-Layer Perceptron. A Network Example. Network Topologies. A Sample Application: Real Estate Appraisal. Training Neural Networks. How Does a Neural Network Learn Using Back Propagation? Pruning a Neural Network. Radial Basis Function Networks. Overview of RBF Networks. Choosing the Locations of the Radial Basis Functions. Universal Approximators. Neural Networks in Practice. Choosing the Training Set. Coverage of Values for All Features. Number of Features. Size of Training Set. Number and Range of Outputs. Rules of Thumb for Using MLPs. Preparing the Data. Interpreting the Output from a Neural Network. Neural Networks for Time Series. Time Series Modeling. A Neural Network Time Series Example. Can Neural Network Models Be Explained? Sensitivity Analysis. Using Rules to Describe the Scores. Lessons Learned.
Chapter 9 Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering. Memory-Based Reasoning. Look-Alike Models. Example: Using MBR to Estimate Rents in Tuxedo, New York. Challenges of MBR. Choosing a Balanced Set of Historical Records. Representing the Training Data. Determining the Distance Function, Combination Function, and Number of Neighbors. Case Study: Using MBR for Classifying Anomalies in Mammograms. The Business Problem: Identifying Abnormal Mammograms. Applying MBR to the Problem. The Total Solution. Measuring Distance and Similarity. What Is a Distance Function? Building a Distance Function One Field at a Time. Distance Functions for Other Data Types. When a Distance Metric Already Exists. The Combination Function: Asking the Neighbors for Advice. The Simplest Approach: One Neighbor. The Basic Approach for Categorical Targets: Democracy. Weighted Voting for Categorical Targets. Numeric Targets. Case Study: Shazam - Finding Nearest Neighbors for Audio Files. Why This Feat Is Challenging. The Audio Signature. Measuring Similarity. Collaborative Filtering: A Nearest-Neighbor Approach to Making Recommendations. Building Profiles. Comparing Profiles. Making Predictions. Lessons Learned.
Chapter 10 Knowing When to Worry: Using Survival Analysis to Understand Customers. Customer Survival. What Survival Curves Reveal. Finding the Average Tenure from a Survival Curve. Customer Retention Using Survival. Looking at Survival as Decay. Hazard Probabilities. The Basic Idea. Examples of Hazard Functions. Censoring. The Hazard Calculation. Other Types of Censoring. From Hazards to Survival. Retention. Survival. Comparison of Retention and Survival. Proportional Hazards. Examples of Proportional Hazards. Stratification: Measuring Initial Effects on Survival. Cox Proportional Hazards. Survival Analysis in Practice. Handling Different Types of Attrition. When Will a Customer Come Back? Understanding Customer Value. Forecasting. Hazards Changing over Time. Lessons Learned.
Chapter 11 Genetic Algorithms and Swarm Intelligence. Optimization. What Is an Optimization Problem? An Optimization Problem in Ant World. E Pluribus Unum. A Smarter Ant. Genetic Algorithms. A Bit of History. Genetics on Computers. Representing the Genome. Schemata: The Building Blocks of Genetic Algorithms. Beyond the Simple Algorithm. The Traveling Salesman Problem. Exhaustive Search. A Simple Greedy Algorithm. The Genetic Algorithms Approach. The Swarm Intelligence Approach. Case Study: Using Genetic Algorithms for Resource Optimization. Case Study: Evolving a Solution for Classifying Complaints. Business Context. Data. The Comment Signature. The Genomes. The Fitness Function. The Results. Lessons Learned.
Chapter 12 Tell Me Something New: Pattern Discovery and Data Mining. Undirected Techniques, Undirected Data Mining. Undirected versus Directed Techniques. Undirected versus Directed Data Mining. Case Study: Undirected Data Mining Using Directed Techniques. What is Undirected Data Mining?
435 Data Exploration 435 Segmentation and Clustering 436 Target Variable Definition, When the Target Is Not Explicit 438 Simulation, Forecasting, and Agent-Based Modeling 443 Methodology for Undirected Data Mining 455 There Is No Methodology 456 Things to Keep in Mind 456 Lessons Learned 457 Chapter 13 Finding Islands of Similarity: Automatic Cluster Detection 459 Searching for Islands of Simplicity 461 Customer Segmentation and Clustering 461 Similarity Clusters 463 Tracking Campaigns by Cluster-Based Segments 464 Clustering Reveals an Overlooked Market Segment 466 Fitting the Troops 467 The K-Means Clustering Algorithm 468 Two Steps of the K-Means Algorithm 468 Voronoi Diagrams and K-Means Clusters 471 Choosing the Cluster Seeds 473 Choosing K 473 Using K-Means to Detect Outliers 474 Semi-Directed Clustering 475 Interpreting Clusters 475 Characterizing Clusters by Their Centroids 476 Characterizing Clusters by What Differentiates Them 477 Using Decision Trees to Describe Clusters 478 Evaluating Clusters 479 Cluster Measurements and Terminology 480 Cluster Silhouettes 480 Limiting Cluster Diameter for Scoring 483 Case Study: Clustering Towns 484 Creating Town Signatures 484 Creating Clusters 486 Determining the Right Number of Clusters 486 Evaluating the Clusters 487 Using Demographic Clusters to Adjust Zone Boundaries 488 Business Success 490 Variations on K-Means 490 K-Medians, K-Medoids, and K-Modes 490 The Soft Side of K-Means 494 Data Preparation for Clustering 495 Scaling for Consistency 496 Use Weights to Encode Outside Information 496 Selecting Variables for Clustering 497 Lessons Learned 497 Chapter 14 Alternative Approaches to Cluster Detection 499 Shortcomings of K-Means 500 Reasonableness 500 An Intuitive Example 501 Fixing the Problem by Changing the Scales 503 What This Means in Practice 504 Gaussian Mixture Models 505 Adding “Gaussians” to K-Means 505 Back to Gaussian Mixture Models 508 Scoring GMMs 510 Applying GMMs 511 Divisive Clustering 513 A
Decision Tree–Like Method for Clustering 513 Scoring Divisive Clusters 515 Clusters and Trees 515 Agglomerative (Hierarchical) Clustering 516 Overview of Agglomerative Clustering Methods 516 Clustering People by Age: An Example of An Agglomerative Clustering Algorithm 520 Scoring Agglomerative Clusters 522 Limitations of Agglomerative Clustering 523 Agglomerative Clustering in Practice 525 Combining Agglomerative Clustering and K-Means 526 Self-Organizing Maps 527 What Is a Self-Organizing Map? 527 Training an SOM 530 Scoring an SOM 531 The Search Continues for Islands of Simplicity 532 Lessons Learned 533 Chapter 15 Market Basket Analysis and Association Rules 535 Defining Market Basket Analysis 536 Four Levels of Market Basket Data 537 The Foundation of Market Basket Analysis: Basic Measures 539 Order Characteristics 540 Item (Product) Popularity 541 Tracking Marketing Interventions 542 Case Study: Spanish or English 543 The Business Problem 543 The Data 544 Defining “Hispanicity” Preference 545 The Solution 546 Association Analysis 547 Rules Are Not Always Useful 548 Item Sets to Association Rules 551 How Good Is an Association Rule? 553 Building Association Rules 555 Choosing the Right Set of Items 556 Anonymous Versus Identified 561 Generating Rules from All This Data 561 Overcoming Practical Limits 565 The Problem of Big Data 567 Extending the Ideas 569 Different Items on the Right- and Left-Hand Sides 569 Using Association Rules to Compare Stores 570 Association Rules and Cross-Selling 572 A Typical Cross-Sell Model 572 A More Confident Approach to Product Propensities 573 Results from Using Confidence 574 Sequential Pattern Analysis 574 Finding the Sequences 575 Sequential Association Rules 578 Sequential Analysis Using Other Data Mining Techniques 579 Lessons Learned 579 Chapter 16 Link Analysis 581 Basic Graph Theory 582 What Is a Graph? 
582 Directed Graphs 584 Weighted Graphs 585 Seven Bridges of Königsberg 585 Detecting Cycles in a Graph 588 The Traveling Salesman Problem Revisited 589 Social Network Analysis 593 Six Degrees of Separation 593 What Your Friends Say About You 595 Finding Childcare Benefits Fraud 596 Who Responds to Whom on Dating Sites 597 Social Marketing 598 Mining Call Graphs 598 Case Study: Tracking Down the Leader of the Pack 601 The Business Goal 601 The Data Processing Challenge 601 Finding Social Networks in Call Data 602 How the Results Are Used for Marketing 602 Estimating Customer Age 603 Case Study: Who Is Using Fax Machines from Home? 604 Why Finding Fax Machines Is Useful 604 How Do Fax Machines Behave? 604 A Graph Coloring Algorithm 605 “Coloring” the Graph to Identify Fax Machines 606 How Google Came to Rule the World 607 Hubs and Authorities 608 The Details 609 Hubs and Authorities in Practice 611 Lessons Learned 612 Chapter 17 Data Warehousing, OLAP, Analytic Sandboxes, and Data Mining 613 The Architecture of Data 615 Transaction Data, the Base Level 616 Operational Summary Data 617 Decision-Support Summary Data 617 Database Schema/Data Models 618 Metadata 623 Business Rules 623 A General Architecture for Data Warehousing 624 Source Systems 624 Extraction, Transformation, and Load 626 Central Repository 627 Metadata Repository 630 Data Marts 630 Operational Feedback 631 Users and Desktop Tools 631 Analytic Sandboxes 633 Why Are Analytic Sandboxes Needed? 634 Technology to Support Analytic Sandboxes 636 Where Does OLAP Fit In? 639 What’s in a Cube? 641 Star Schema 646 OLAP and Data Mining 648 Where Data Mining Fits in with Data Warehousing 650 Lots of Data 651 Consistent, Clean Data 651 Hypothesis Testing and Measurement 652 Scalable Hardware and RDBMS Support 653 Lessons Learned 653 Chapter 18 Building Customer Signatures 655 Finding Customers in Data 656 What Is a Customer? 657 Accounts? Customers? Households? 
658 Anonymous Transactions 658 Transactions Linked to a Card 659 Transactions Linked to a Cookie 659 Transactions Linked to an Account 660 Transactions Linked to a Customer 661 Designing Signatures 661 Is a Customer Signature Necessary? 666 What Does a Row Represent? 666 Will the Signature Be Used for Predictive Modeling? 671 Has a Target Been Defined? 672 Are There Constraints Imposed by the Particular Data Mining Techniques to be Employed? 672 Which Customers Will Be Included? 673 What Might Be Interesting to Know About Customers? 673 What a Signature Looks Like 674 Process for Creating Signatures 677 Some Data Is Already at the Right Level of Granularity 678 Pivoting a Regular Time Series 679 Aggregating Time-Stamped Transactions 680 Dealing with Missing Values 685 Missing Values in Source Data 685 Unknown or Non-Existent? 687 What Not to Do 687 Things to Consider 689 Lessons Learned 691 Chapter 19 Derived Variables: Making the Data Mean More 693 Handset Churn Rate as a Predictor of Churn 694 Single-Variable Transformations 696 Standardizing Numeric Variables 696 Turning Numeric Values into Percentiles 697 Turning Counts into Rates 698 Relative Measures 699 Replacing Categorical Variables with Numeric Ones 700 Combining Variables 707 Classic Combinations 707 Combining Highly Correlated Variables 710 Rent to Home Value 712 Extracting Features from Time Series 718 Trend 719 Seasonality 721 Extracting Features from Geography 722 Geocoding 722 Mapping 723 Using Geography to Create Relative Measures 724 Using Past Values of the Target Variable 725 Using Model Scores as Inputs 725 Handling Sparse Data 726 Account Set Patterns 726 Binning Sparse Values 727 Capturing Customer Behavior from Transactions 727 Widening Narrow Data 728 Sphere of Influence as a Predictor of Good Customers 728 An Example: Ratings to Rater Profile 730 Sample Fields from the Rater Signature 730 The Rating Signature and Derived Variables 732 Lessons Learned 733 Chapter 20 Too Much of a Good 
Thing? Techniques for Reducing the Number of Variables 735 Problems with Too Many Variables 736 Risk of Correlation Among Input Variables 736 Risk of Overfitting 738 The Sparse Data Problem 738 Visualizing Sparseness 739 Independence 740 Exhaustive Feature Selection 743 Flavors of Variable Reduction Techniques 744 Using the Target 744 Original versus New Variables 744 Sequential Selection of Features 745 The Traditional Forward Selection Methodology 745 Forward Selection Using a Validation Set 747 Stepwise Selection 748 Forward Selection Using Non-Regression Techniques 748 Backward Selection 748 Undirected Forward Selection 749 Other Directed Variable Selection Methods 749 Using Decision Trees to Select Variables 750 Variable Reduction Using Neural Networks 752 Principal Components 753 What Are Principal Components? 753 Principal Components Example 758 Principal Component Analysis 763 Factor Analysis 767 Variable Clustering 768 Example of Variable Clusters 768 Using Variable Clusters 770 Hierarchical Variable Clustering 770 Divisive Variable Clustering 773 Lessons Learned 774 Chapter 21 Listen Carefully to What Your Customers Say: Text Mining 775 What Is Text Mining? 776 Text Mining for Derived Columns 776 Beyond Derived Features 777 Text Analysis Applications 778 Working with Text Data 781 Sources of Text 781 Language Effects 782 Basic Approaches to Representing Documents 783 Representing Documents in Practice 784 Documents and the Corpus 786 Case Study: Ad Hoc Text Mining 786 The Boycott 787 Business as Usual 787 Combining Text Mining and Hypothesis Testing 787 The Results 788 Classifying News Stories Using MBR 789 What Are the Codes? 
789 Applying MBR 790 The Results 793 From Text to Numbers 794 Starting with a “Bag of Words” 794 Term-Document Matrix 796 Corpus Effects 797 Singular Value Decomposition (SVD) 798 Text Mining and Naïve Bayesian Models 800 Naïve Bayesian in the Text World 801 Identifying Spam Using Naïve Bayesian 801 Sentiment Analysis 806 DIRECTV: A Case Study in Customer Service 809 Background 809 Applying Text Mining 811 Taking the Technical Approach 814 Not an Iterative Process 818 Continuing to Benefit 818 Lessons Learned 819 Index 821
£37.05
John Wiley & Sons Inc Graphical Models
Book Synopsis: Graphical models are of increasing importance in applied statistics, and in particular in data mining. Providing a self-contained introduction to and overview of learning relational, probabilistic, and possibilistic networks from data, this second edition of Graphical Models is thoroughly updated to include the latest research in this burgeoning field, including a new chapter on visualization. The text provides graduate students and researchers with all the necessary background material, including modelling under uncertainty, decomposition of distributions, graphical representation of distributions, and applications relating to graphical models and problems for further research.
Trade Review: “The text provides graduate students, and researchers with all the necessary background material, including modelling under uncertainty, decomposition of distributions, graphical representation of distributions, and applications relating to graphical models and problems for further research.” (Zentralblatt Math, 1 August 2013) "All of the necessary background is provided, with material on modeling under uncertainty and imprecision modeling, decomposition of distributions, graphical representation of distributions, applications relating to graphical models, and problems for further research." (Book News, December 2009)
Table of Contents: Preface. 1 Introduction. 1.1 Data and Knowledge. 1.2 Knowledge Discovery and Data Mining. 1.3 Graphical Models. 1.4 Outline of this Book. 2 Imprecision and Uncertainty. 2.1 Modeling Inferences. 2.2 Imprecision and Relational Algebra. 2.3 Uncertainty and Probability Theory. 2.4 Possibility Theory and the Context Model. 3 Decomposition. 3.1 Decomposition and Reasoning. 3.2 Relational Decomposition. 3.3 Probabilistic Decomposition. 3.4 Possibilistic Decomposition. 3.5 Possibility versus Probability. 4 Graphical Representation. 4.1 Conditional Independence Graphs. 4.2 Evidence Propagation in Graphs. 5 Computing Projections.
5.1 Databases of Sample Cases. 5.2 Relational and Sum Projections. 5.3 Expectation Maximization. 5.4 Maximum Projections. 6 Naive Classifiers. 6.1 Naive Bayes Classifiers. 6.2 A Naive Possibilistic Classifier. 6.3 Classifier Simplification. 6.4 Experimental Evaluation. 7 Learning Global Structure. 7.1 Principles of Learning Global Structure. 7.2 Evaluation Measures. 7.3 Search Methods. 7.4 Experimental Evaluation. 8 Learning Local Structure. 8.1 Local Network Structure. 8.2 Learning Local Structure. 8.3 Experimental Evaluation. 9 Inductive Causation. 9.1 Correlation and Causation. 9.2 Causal and Probabilistic Structure. 9.3 Faithfulness and Latent Variables. 9.4 The Inductive Causation Algorithm. 9.5 Critique of the Underlying Assumptions. 9.6 Evaluation. 10 Visualization. 10.1 Potentials. 10.2 Association Rules. 11 Applications. 11.1 Diagnosis of Electrical Circuits. 11.2 Application in Telecommunications. 11.3 Application at Volkswagen. 11.4 Application at DaimlerChrysler. A Proofs of Theorems. A.1 Proof of Theorem 4.1.2. A.2 Proof of Theorem 4.1.18. A.3 Proof of Theorem 4.1.20. A.4 Proof of Theorem 4.1.26. A.5 Proof of Theorem 4.1.28. A.6 Proof of Theorem 4.1.30. A.7 Proof of Theorem 4.1.31. A.8 Proof of Theorem 5.4.8. A.9 Proof of Lemma .2.2. A.10 Proof of Lemma .2.4. A.11 Proof of Lemma .2.6. A.12 Proof of Theorem 7.3.1. A.13 Proof of Theorem 7.3.2. A.14 Proof of Theorem 7.3.3. A.15 Proof of Theorem 7.3.5. A.16 Proof of Theorem 7.3.7. B Software Tools. Bibliography. Index.
£97.95
John Wiley & Sons Inc Data Mining Techniques in CRM Inside Customer
Book Synopsis: This applied handbook covers the use of data mining techniques in the CRM framework. It combines technical and business perspectives to meet the needs of business users who are looking for a practical guide to data mining.
Trade Review: "The book is written in a language that is easily accessible to business users who are not fluent in statistical methods and who have no prior exposure to the data mining or customer segmentation domain . . . This book is poised to become a standard reference, and I unconditionally recommend it to anyone working in this field." (Computing Reviews, 23 June 2011) "This is an excellent book for any data miner or anybody involved in CRM. The text is clear and pictures are well done and funny which is rare enough to be mentioned. From basic to advanced topics, the book is a very pleasant journey inside data mining with a clear focus on customer segmentation. Really advised if you're not a fan of formulas." (Data Mining Research, 18 March 2011)
Table of Contents: Acknowledgements. 1. Data Mining in CRM. The CRM Strategy. What Can Data Mining Do? The Data Mining Methodology. Data Mining and Business Domain Expertise. Summary. 2. An Overview of Data Mining Techniques. Supervised Modeling. Unsupervised Modeling Techniques. Machine Learning/Artificial Intelligence vs. Statistical Techniques. Summary. 3. Data Mining Techniques for Segmentation. Segmenting Customers with Data Mining Techniques. Principal Components Analysis. Clustering Techniques. Examining and Evaluating the Cluster Solution. Understanding the Clusters through Profiling. Selecting the Optimal Cluster Solution. Cluster Profiling and Scoring with Supervised Models. An Introduction to Decision Tree Models. Summary. 4. The Mining Data Mart. Designing the Mining Data Mart. The Time Frame Covered by the Mining Data Mart. The Mining Data Mart for Retail Banking. The Mining Data Mart for Mobile Telephony Consumer (Residential) Customers.
The Mining Data Mart for Retailers. Summary. 5. Customer Segmentation. An Introduction to Customer Segmentation. Segmentation Types in Consumer Markets. Segmentation in Business Markets. A Guide for Behavioral Segmentation. Segmentation Management Strategy. A Guide for Value-Based Segmentation. Designing Differentiated Strategies for the Value Segments. Summary. 6. Segmentation Applications in Banking. Segmentation for Credit Card Holders. Segmentation in Retail Banking. The Marketing Process. Segmentation in Retail Banking: A Summary. 7. Segmentation Applications in Telecommunications. Mobile Telephony. The Fixed Telephony Case. Summary. 8. Segmentation for Retailers. Segmentation in the Retail Industry. The RFM Analysis. Grouping Customers According to the Products They Buy. Summary. Further Reading. Index.
£64.55