Data mining Books

317 products


  • Springer-Verlag GmbH Neural Information Processing

    1 in stock

    £71.99

  • Springer-Verlag GmbH Neural Information Processing

    1 in stock

    £71.99

  • Springer-Verlag GmbH Neural Information Processing

    1 in stock

    £71.99

  • Springer-Verlag GmbH Neural Information Processing

    1 in stock

    £71.99

  • Springer Neural Information Processing

    1 in stock

    £61.74

  • Springer Nature Switzerland AG Neural Information Processing

    1 in stock

    £71.99

  • Springer Nature Switzerland AG Neural Information Processing

    1 in stock

    £71.99

  • Data Science: 9th International Conference of

    Springer Verlag, Singapore Data Science: 9th International Conference of

    1 in stock

    Book Synopsis

    This two-volume set (CCIS 1879 and 1880) constitutes the refereed proceedings of the 9th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2023, held in Harbin, China, during September 22–24, 2023. The 52 full papers and 14 short papers presented in these two volumes were carefully reviewed and selected from 244 submissions. The papers are organized in the following topical sections:

    Part I: Applications of Data Science, Big Data Management and Applications, Big Data Mining and Knowledge Management, Data Visualization, Data-driven Security, Infrastructure for Data Science, Machine Learning for Data Science and Multimedia Data Management and Analysis.

    Part II: Data-driven Healthcare, Data-driven Smart City/Planet, Social Media and Recommendation Systems and Education using big data, intelligent computing or data mining, etc.

    Table of Contents

    Applications of Data Science.- Construction of Software Design and Programming Practice Course in Information and Communication Engineering.- A Self-Attention-Based Stock Prediction Method Using Long Short-Term Memory Network Architecture.- CAD-based Research on the Design of a Standard Unit Cabinet for Custom Furniture of the Cabinet Type.- An Improved War Strategy Optimization Algorithm for Big Data Analytics.- Research on Path Planning of Mobile Robots Based on Dyna-RQ.- Small Target Helmet Wearing Detection Algorithm based on Improved YOLO V5.- Research on Dance Evaluation Technology based on Human Posture Recognition.- Multiple-channel Weight-based CNN Fault Diagnosis Method.- Big Data Management and Applications.- Design and Implementation of Key-Value Database for Ship Virtual Test Platform Based on Distributed System.- Big Data Mining and Knowledge Management.- Research on Multi-modal Time Series Data Prediction Method Based on Dualstage Attention Mechanism.- Prediction of Time Series Data with Low Latitude Features.- Lightweight and Efficient Attention-based Superresolution Generative Adversarial Networks.- The Multisource Time Series Data Granularity Conversion Method.- Outlier Detection Model Based on Autoencoder and Data Augmentation for High-Dimensional Sparse Data.- Dimension Reduction Based on Sampling.- Complex Time Series Analysis Based on Conditional Random Fields.- Feature Extraction of Time Series Data Based on CNNCBAM.- Optimization of a Network Topology Generation Algorithm based on Spatial Information Network.- Data Visualization.- MBTIviz: A Visualization System for Research on Psycho-demographics and Personality.- Data-driven Security.- Distributed Implementation of SM4 Block Cipher Algorithm based on SPDZ Secure Multi-party Computation Protocol.- DP-ASSGD: Differential Privacy Protection Based on Stochastic Gradient Descent Optimization.- Study on Tourism Workers' Intercultural Communication Competence.- A Novel Federated Learning with Bidirectional Adaptive Differential Privacy.- Chaos-Based Construction of LWEs in Lattice-Based Cryptosystems.- Security Compressed Sensing Image Encryption Algorithm Based on Elliptic Curve.- Infrastructure for Data Science.- Two-dimensional Code Transmission System Based on Side Channel Feedback.- An Updatable and Revocable Decentralized Identity Management Scheme based on Blockchain.- Cloud-Edge Intelligent Collaborative Computing Model based on Transfer Learning in IoT.- Design and Validation of a Hardware-in-the-loop based Automated Driving Simulation Test Platform.- Machine Learning for Data Science.- Improving Transferability Reversible Adversarial Examples based on Flipping Transformation.- Rolling Iterative Prediction for Correlated Multivariate Time Series.- Multimedia Data Management and Analysis.- Video Popularity Prediction Based on Knowledge Graph and LSTM Network.- Design and Implementation of Speech Generation and Demonstration Research Based on Deep Learning.- Testing and Improvement of OCR Recognition Technology in Export-Oriented Chinese Dictionary APP.

    £71.24

  • Neural Information Processing: 30th International

    Springer Verlag, Singapore Neural Information Processing: 30th International

    3 in stock

    Book Synopsis

    The six-volume set LNCS 14447 to 14452 constitutes the refereed proceedings of the 30th International Conference on Neural Information Processing, ICONIP 2023, held in Changsha, China, in November 2023. The 652 papers presented in the proceedings set were carefully reviewed and selected from 1274 submissions. They focus on theory and algorithms; cognitive neurosciences; human centred computing; and applications in neuroscience, neural networks, deep learning, and related fields.

    Table of Contents

    Text to Image Generation with Conformer-GAN.- MGFNet: A Multi-Granularity Feature Fusion and Mining Network for Visible-Infrared Person Re-Identification.- Isomorphic Dual-Branch Network for Non-homogeneous Image Dehazing and Super-Resolution.- Hi-Stega: A Hierarchical Linguistic Steganography Framework Combining Retrieval and Generation.- Effi-Seg: Rethinking EfficientNet Architecture for Real-time Semantic Segmentation.- Quantum Autoencoder Frameworks for Network Anomaly Detection.- Spatially-Aware Human-Object Interaction Detection with Cross-Modal Enhancement.- Intelligent trajectory tracking control of unmanned parafoil system based on SAC optimized LADRC.- CATS: Connection-aware and Interaction-based Text Steganalysis in Social Networks.- Syntax Tree Constrained Graph Network for Visual Question Answering.- CKR-Calibrator: Convolution Kernel Robustness Evaluation and Calibration.- SGLP-Net: Sparse Graph Label Propagation Network for Weakly-Supervised Temporal Action Localization.- VFIQ: A Novel Model of ViT-FSIMc Hybrid Siamese Network for Image Quality Assessment.- Spiking Reinforcement Learning for Weakly-supervised Anomaly Detection.- Resource-aware DNN Partitioning for Privacy-sensitive Edge-Cloud Systems.- A frequency reconfigurable multi-mode printed antenna.- Multi-view Contrastive learning for Knowledge-aware Recommendation.- PYGC: a PinYin Language Model Guided Correction Model for Chinese Spell Checking.- Empirical Analysis of Multi-label Classification on GitterCom using BERT.- A lightweight safety helmet detection network based on bidirectional connection module and Polarized Self-Attention.- Direct Inter-Intra View Association for Light Field Super-Resolution.- Responsive CPG-Based Locomotion Control for Quadruped Robots.- Vessel Behavior Anomaly Detection using Graph Attention Network.- TASFormer: Task-aware Image Segmentation Transformer.- Unsupervised Joint-Semantics Autoencoder Hashing for Multimedia Retrieval.- TKGR-RHETNE: A New Temporal Knowledge Graph Reasoning Model via Jointly Modeling Relevant Historical Event and Temporal Neighborhood Event Context.- High-Resolution Self-Attention with Fair Loss for Point Cloud Segmentation.- Transformer-based Video Deinterlacing Method.- SCME: A Self-Contrastive Method for Data-free and Query-Limited Model Extraction Attack.- CSEC: A Chinese Semantic Error Correction Dataset for Written Correction.- Contrastive Kernel Subspace Clustering.- UATR: An Uncertainty Aware Two-stage Refinement Model for Targeted Sentiment Analysis.- AttIN: Paying More Attention to Neighborhood Information for Entity Typing in Knowledge Graphs.- Text-based Person Re-ID by Saliency Mask and Dynamic Label Smoothing.- Robust Multi-view Spectral Clustering with Auto-encoder for Preserving Information.- Learnable Color Image Zero-Watermarking Based on Feature Comparison.- P-IoU: Accurate Motion Prediction based Data Association for Multi-Object Tracking.- WCA-VFnet: a dedicated complex forest smoke fire detector.- Label Selection Algorithm Based on Ant Colony Optimization and Reinforcement Learning for Multi-label Classification.- Reversible Data Hiding Based on Adaptive Embedding with Local Complexity.- Generalized Category Discovery with Clustering Assignment Consistency.- CInvISP: Conditional Invertible Image Signal Processing Pipeline.- Ignored Details in Eyes: Exposing GAN-generated Faces by Sclera.- A Developer Recommendation Method Based on Disentangled Graph Convolutional Network.- Novel Method for Radar Echo Target Detection.

    £66.49

  • Taylor & Francis Ltd Big Data Mining and Analytics Components of Strategic Decision Making

    15 in stock

    £54.14

  • Taylor & Francis Ltd Data Mining Mobile Devices

    15 in stock

    £54.14

  • Taylor & Francis Ltd InternetScale Pattern Recognition New Techniques for Voluminous Data Sets and Data Clouds

    15 in stock

    £56.99

  • Taylor & Francis Ltd Data Mining for Bioinformatics

    15 in stock

    £56.99

  • Taylor & Francis Ltd Clustering A Data Recovery Approach Second Edition

    15 in stock

    £58.89

  • Taylor & Francis Ltd A Practitioners Guide to Resampling for Data Analysis Data Mining and Modeling

    15 in stock

    £47.99

  • Taylor & Francis Ltd Data Mining in Biomedical Imaging Signaling and Systems

    15 in stock

    £54.14

  • Taylor & Francis Ltd Quality Aspects in Spatial Data Mining

    15 in stock

    £58.89

  • Taylor & Francis Ltd Data Analytics Applications in Gaming and

    15 in stock

    Book Synopsis

    The last decade has witnessed the rise of big data in game development as the increasing proliferation of Internet-enabled gaming devices has made it easier than ever before to collect large amounts of player-related data. At the same time, the emergence of new business models and the diversification of the player base have exposed a broader potential audience, which attaches great importance to being able to tailor game experiences to a wide range of preferences and skill levels. This, in turn, has led to a growing interest in data mining techniques, as they offer new opportunities for deriving actionable insights to inform game design, to ensure customer satisfaction, to maximize revenues, and to drive technical innovation. By now, data mining and analytics have become vital components of game development. The amount of work being done in this area nowadays makes this an ideal time to put together a book on this subject.

    Table of Contents

    Part 1 – Introduction to game data mining. Part 2 – Data mining for games user research. Part 3 – Data mining for game technology. Part 4 – Visualization of large-scale game data.

    £42.74

  • Cambridge University Press The Text Mining Handbook

    15 in stock

    Book Synopsis

    Presents a comprehensive discussion of the state-of-the-art in text mining and link detection. In addition to providing an in-depth examination of core text mining and link detection algorithms and operations, the book examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches, ending with real-world, mission-critical applications.

    Trade Review

    '… buy the book. This book is definitely worth having in your book shelf as a handy reference.' IAPR Newsletter

    Table of Contents

    1. Introduction to text mining; 2. Core text mining operations; 3. Text mining preprocessing techniques; 4. Categorization; 5. Clustering; 6. Information extraction; 7. Probabilistic models for information extraction; 8. Preprocessing applications using probabilistic and hybrid approaches; 9. Presentation-layer considerations for browsing and query refinement; 10. Visualization approaches; 11. Link analysis; 12. Text mining applications; Appendix; Bibliography.

    £71.24

  • Cambridge University Press Genomes Browsers and Databases DataMining Tools for Integrated Genomic Databases

    15 in stock

    £78.85

  • Cambridge University Press Darkweb Cyber Threat Intelligence Mining

    15 in stock

    Book Synopsis

    This book examines cyber threat intelligence obtained from the center of the malicious hacking underworld: the dark web. It studies these communities both qualitatively and quantitatively, leveraging techniques from data mining, machine learning and AI, and offering insights to both cybersecurity practitioners and researchers.

    Trade Review

    'Darkweb Cyber Threat Intelligence Mining represents a tipping point in cyber security. It is a must-read for anyone involved in the modern cyber struggle.' George Cybenko, Dartmouth College, New Hampshire, from the Foreword

    'The book is well written and well structured. The authors provide interesting facts on the darknet economy, its community, and its underling rules, such as trust-based platforms and the related problems of its participants.' Steffen Wendzel, Computing Reviews

    Table of Contents

    1. Introduction; 2. Moving to proactive cyber threat intelligence; 3. Understanding darkweb malicious hacker forums; 4. Automatic mining of cyber intelligence from the dark web; 5. Analyzing products and vendors in malicious hacking markets; 6. Using game theory for threat intelligence; 7. Application – protecting industrial control systems; 8. Conclusion – the future of darkweb cyber threat intelligence.

    £56.99

  • Everybody Lies

    HarperCollins Publishers Inc Everybody Lies

    10 in stock

    £24.79

  • Targeted

    HarperCollins Publishers Inc Targeted

    10 in stock

    £23.19

  • Targeted  La Dictadura de Los Datos Spanish

    HarperCollins Publishers Inc Targeted La Dictadura de Los Datos Spanish

    10 in stock

    Book Synopsis

    The gripping story of Cambridge Analytica and Big Data. Is our democracy really safe after Trump's victory? La dictadura de los datos reveals how our data has been used and warns us how it could be used again. They know what you buy.

    Brittany Kaiser, a novice political consultant specializing in human rights and international relations, believed that the data collected and analyzed by smartphones and social networks was in good hands until she met Alexander Nix, the charismatic head of a new political communications firm called Cambridge Analytica. What began as just a job soon becomes an infamous operation aimed at helping to elect Trump or interfering in the referendum that led to Brexit.

    £15.29

  • Handbook of Statistical Analysis and Data Mining

    Elsevier Science Publishing Co Inc Handbook of Statistical Analysis and Data Mining

    Trade Review

    "Data mining practitioners, here is your bible, the complete "driver's manual" for data mining. From starting the engine to handling the curves, this book covers the gamut of data mining techniques - including predictive analytics and text mining - illustrating how to achieve maximal value across business, scientific, engineering, and medical applications. What are the best practices through each phase of a data mining project? How can you avoid the most treacherous pitfalls? The answers are in here.

    "Going beyond its responsibility as a reference book, the heavily-updated second edition also provides all-new, detailed tutorials with step-by-step instructions to drive established data mining software tools across real world applications. This way, newcomers start their engines immediately and experience hands-on success.

    "What's more, this edition drills down on hot topics across seven new chapters, including deep learning and how to avert "b---s---" results. If you want to roll-up your sleeves and execute on predictive analytics, this is your definite, go-to resource. To put it lightly, if this book isn't on your shelf, you're not a data miner." --Eric Siegel, Ph.D., founder of Predictive Analytics World and author of "Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die"

    "Great introduction to the real-world process of data mining. The overviews, practical advice, tutorials, and extra CD material make this book an invaluable resource for both new and experienced data miners." --Karl Rexer, PhD (President and Founder of Rexer Analytics, Boston, Massachusetts)

    Table of Contents

    Part 1: History Of Phases Of Data Analysis, Basic Theory, And The Data Mining Process
    1. The Background for Data Mining Practice; 2. Theoretical Considerations for Data Mining; 3. The Data Mining and Predictive Analytic Process; 4. Data Understanding and Preparation; 5. Feature Selection; 6. Accessory Tools for Doing Data Mining

    Part 2: The Algorithms And Methods In Data Mining And Predictive Analytics And Some Domain Areas
    7. Basic Algorithms for Data Mining: A Brief Overview; 8. Advanced Algorithms for Data Mining; 9. Classification; 10. Numerical Prediction; 11. Model Evaluation and Enhancement; 12. Predictive Analytics for Population Health and Care; 13. Big Data in Education: New Efficiencies for Recruitment, Learning, and Retention of Students and Donors; 14. Customer Response Modeling; 15. Fraud Detection

    Part 3: Tutorials And Case Studies
    Tutorial A Example of Data Mining Recipes Using Windows 10 and Statistica 13; Tutorial B Using the Statistica Data Mining Workspace Method for Analysis of Hurricane Data (Hurrdata.sta); Tutorial C Case Study—Using SPSS Modeler and STATISTICA to Predict Student Success at High-Stakes Nursing Examinations (NCLEX); Tutorial D Constructing a Histogram in KNIME Using MidWest Company Personality Data; Tutorial E Feature Selection in KNIME; Tutorial F Medical/Business Tutorial; Tutorial G A KNIME Exercise, Using Alzheimer’s Training Data of Tutorial F; Tutorial H Data Prep 1-1: Merging Data Sources; Tutorial I Data Prep 1-2: Data Description; Tutorial J Data Prep 2-1: Data Cleaning and Recoding; Tutorial K Data Prep 2-2: Dummy Coding Category Variables; Tutorial L Data Prep 2-3: Outlier Handling; Tutorial M Data Prep 3-1: Filling Missing Values With Constants; Tutorial N Data Prep 3-2: Filling Missing Values With Formulas; Tutorial O Data Prep 3-3: Filling Missing Values With a Model; Tutorial P City of Chicago Crime Map: A Case Study Predicting Certain Kinds of Crime Using Statistica Data Miner and Text Miner; Tutorial Q Using Customer Churn Data to Develop and Select a Best Predictive Model for Client Defection Using STATISTICA Data Miner 13 64-bit for Windows 10; Tutorial R Example With C&RT to Predict and Display Possible Structural Relationships; Tutorial S Clinical Psychology: Making Decisions About Best Therapy for a Client

    Part 4: Model Ensembles, Model Complexity; Using the Right Model for the Right Use, Significance, Ethics, and the Future, and Advanced Processes
    16. The Apparent Paradox of Complexity in Ensemble Modeling; 17. The "Right Model" for the "Right Purpose": When Less Is Good Enough; 18. A Data Preparation Cookbook; 19. Deep Learning; 20. Significance versus Luck in the Age of Mining: The Issues of P-Value "Significance" and "Ways to Test Significance of Our Predictive Analytic Models"; 21. Ethics and Data Analytics; 22. IBM Watson

    £75.04

  • Foundations of Data Intensive Applications

    John Wiley & Sons Inc Foundations of Data Intensive Applications

    10 in stock

    Book Synopsis

    PEEK UNDER THE HOOD OF BIG DATA ANALYTICS

    The world of big data analytics grows ever more complex. And while many people can work superficially with specific frameworks, far fewer understand the fundamental principles of large-scale, distributed data processing systems and how they operate. In Foundations of Data Intensive Applications: Large Scale Data Analytics under the Hood, renowned big-data experts and computer scientists Drs. Supun Kamburugamuve and Saliya Ekanayake deliver a practical guide to applying the principles of big data to software development for optimal performance. The authors discuss foundational components of large-scale data systems and walk readers through the major software design decisions that define performance, application type, and usability. You'll learn how to recognize problems in your applications resulting in performance and distributed operation issues, diagnose them, and effectively eliminate them by relying on the bedrock big data principles explained within. Moving beyond individual frameworks and APIs for data processing, this book unlocks the theoretical ideas that operate under the hood of every big data processing system.

    Ideal for data scientists, data architects, dev-ops engineers, and developers, Foundations of Data Intensive Applications: Large Scale Data Analytics under the Hood shows readers how to: identify the foundations of large-scale, distributed data processing systems; make major software design decisions that optimize performance; diagnose performance problems and distributed operation issues; understand state-of-the-art research in big data; explain and use the major big data frameworks and understand what underpins them; and use big data analytics in the real world to solve practical problems.

    Table of Contents

    Introduction

    Chapter 1 Data Intensive Applications: Anatomy of a Data-Intensive Application; A Histogram Example; Program; Process Management; Communication; Execution; Data Structures; Putting It Together; Application; Resource Management; Messaging; Data Structures; Tasks and Execution; Fault Tolerance; Remote Execution; Parallel Applications; Serial Applications; Lloyd’s K-Means Algorithm; Parallelizing Algorithms; Decomposition; Task Assignment; Orchestration; Mapping; K-Means Algorithm; Parallel and Distributed Computing; Memory Abstractions; Shared Memory; Distributed Memory; Hybrid (Shared + Distributed) Memory; Partitioned Global Address Space Memory; Application Classes and Frameworks; Parallel Interaction Patterns; Pleasingly Parallel; Dataflow; Iterative; Irregular; Data Abstractions; Data-Intensive Frameworks; Components; Workflows; An Example; What Makes It Difficult?; Developing Applications; Concurrency; Data Partitioning; Debugging; Diverse Environments; Computer Networks; Synchronization; Thread Synchronization; Data Synchronization; Ordering of Events; Faults; Consensus; Summary; References

    Chapter 2 Data and Storage: Storage Systems; Storage for Distributed Systems; Direct-Attached Storage; Storage Area Network; Network-Attached Storage; DAS or SAN or NAS?; Storage Abstractions; Block Storage; File Systems; Object Storage; Data Formats; XML; JSON; CSV; Apache Parquet; Apache Avro; Avro Data Definitions (Schema); Code Generation; Without Code Generation; Avro File; Schema Evolution; Protocol Buffers, Flat Buffers, and Thrift; Data Replication; Synchronous and Asynchronous Replication; Single-Leader and Multileader Replication; Data Locality; Disadvantages of Replication; Data Partitioning; Vertical Partitioning; Horizontal Partitioning (Sharding); Hybrid Partitioning; Considerations for Partitioning; NoSQL Databases; Data Models; Key-Value Databases; Document Databases; Wide Column Databases; Graph Databases; CAP Theorem; Message Queuing; Message Processing Guarantees; Durability of Messages; Acknowledgments; Storage First Brokers and Transient Brokers; Summary; References

    Chapter 3 Computing Resources: A Demonstration; Computer Clusters; Anatomy of a Computer Cluster; Data Analytics in Clusters; Dedicated Clusters; Classic Parallel Systems; Big Data Systems; Shared Clusters; OpenMPI on a Slurm Cluster; Spark on a Yarn Cluster; Distributed Application Life Cycle; Life Cycle Steps; Step 1: Preparation of the Job Package; Step 2: Resource Acquisition; Step 3: Distributing the Application (Job) Artifacts; Step 4: Bootstrapping the Distributed Environment; Step 5: Monitoring; Step 6: Termination; Computing Resources; Data Centers; Physical Machines; Network; Virtual Machines; Containers; Processor, Random Access Memory, and Cache; Cache; Multiple Processors in a Computer; Nonuniform Memory Access; Uniform Memory Access; Hard Disk; GPUs; Mapping Resources to Applications; Cluster Resource Managers; Kubernetes; Kubernetes Architecture; Kubernetes Application Concepts; Data-Intensive Applications on Kubernetes; Slurm; Yarn; Job Scheduling; Scheduling Policy; Objective Functions; Throughput and Latency; Priorities; Lowering Distance Among the Processes; Data Locality; Completion Deadline; Algorithms; First in First Out; Gang Scheduling; List Scheduling; Backfill Scheduling; Summary; References

    Chapter 4 Data Structures: Virtual Memory; Paging and TLB; Cache; The Need for Data Structures; Cache and Memory Layout; Memory Fragmentation; Data Transfer; Data Transfer Between Frameworks; Cross-Language Data Transfer; Object and Text Data; Serialization; Vectors and Matrices; 1D Vectors; Matrices; Row-Major and Column-Major Formats; N-Dimensional Arrays/Tensors; NumPy; Memory Representation; K-means with NumPy; Sparse Matrices; Table; Table Formats; Column Data Format; Row Data Format; Apache Arrow; Arrow Data Format; Primitive Types; Variable-Length Data; Arrow Serialization; Arrow Example; Pandas DataFrame; Column vs. Row Tables; Summary; References

    Chapter 5 Programming Models: Introduction; Parallel Programming Models; Parallel Process Interaction; Problem Decomposition; Data Structures; Data Structures and Operations; Data Types; Local Operations; Distributed Operations; Array; Tensor; Indexing; Slicing; Broadcasting; Table; Graph Data; Message Passing Model; Model; Message Passing Frameworks; Message Passing Interface; Bulk Synchronous Parallel; K-Means; Distributed Data Model; Eager Model; Dataflow Model; Data Frames, Datasets, and Tables; Input and Output; Task Graphs (Dataflow Graphs); Model; User Program to Task Graph; Tasks and Functions; Source Task; Compute Task; Implicit vs. Explicit Parallel Models; Remote Execution; Components; Batch Dataflow; Data Abstractions; Table Abstraction; Matrix/Tensors; Functions; Source; Compute; Sink; An Example; Caching State; Evaluation Strategy; Lazy Evaluation; Eager Evaluation; Iterative Computations; DOALL Parallel; DOACROSS Parallel; Pipeline Parallel; Task Graph Models for Iterative Computations; K-Means Algorithm; Streaming Dataflow; Data Abstractions; Streams; Distributed Operations; Streaming Functions; Sources; Compute; Sink; An Example; Windowing; Windowing Strategies; Operations on Windows; Handling Late Events; SQL; Queries; Summary; References

    Chapter 6 Messaging: Network Services; TCP/IP; RDMA; Messaging for Data Analytics; Anatomy of a Message; Data Packing; Protocol; Message Types; Control Messages; External Data Sources; Data Transfer Messages; Distributed Operations; How Are They Used?; Task Graph; Parallel Processes; Anatomy of a Distributed Operation; Data Abstractions; Distributed Operation API; Streaming and Batch Operations; Streaming Operations; Batch Operations; Distributed Operations on Arrays; Broadcast; Reduce and AllReduce; Gather and AllGather; Scatter; AllToAll; Optimized Operations; Broadcast; Reduce; AllReduce; Gather and AllGather Collective Algorithms; Scatter and AllToAll Collective Algorithms; Distributed Operations on Tables; Shuffle; Partitioning Data; Handling Large Data; Fetch-Based Algorithm (Asynchronous Algorithm); Distributed Synchronization Algorithm; GroupBy; Aggregate; Join; Join Algorithms; Distributed Joins; Performance of Joins; More Operations; Advanced Topics; Data Packing; Memory Considerations; Message Coalescing; Compression; Stragglers; Nonblocking vs. Blocking Operations; Blocking Operations; Nonblocking Operations; Summary; References

    Chapter 7 Parallel Tasks: CPUs; Cache; False Sharing; Vectorization; Threads and Processes; Concurrency and Parallelism; Context Switches and Scheduling; Mutual Exclusion; User-Level Threads; Process Affinity; NUMA-Aware Programming; Accelerators; Task Execution; Scheduling; Static Scheduling; Dynamic Scheduling; Loosely Synchronous and Asynchronous Execution; Loosely Synchronous Parallel System; Asynchronous Parallel System (Fully Distributed); Actor Model; Actor; Asynchronous Messages; Actor Frameworks; Execution Models; Process Model; Thread Model; Remote Execution; Tasks for Data Analytics; SPMD and MPMD Execution; Batch Tasks; Data Partitions; Operations; Task Graph Scheduling; Threads, CPU Cores, and Partitions; Data Locality; Execution; Streaming Execution; State; Immutable Data; State in Driver; Distributed State; Streaming Tasks; Streams and Data Partitioning; Partitions; Operations; Scheduling; Uniform Resources; Resource-Aware Scheduling; Execution; Dynamic Scaling; Back Pressure (Flow Control); Rate-Based Flow Control; Credit-Based Flow Control; State; Summary; References

    Chapter 8 Case Studies: Apache Hadoop; Programming Model; Architecture; Cluster Resource Management; Apache Spark; Programming Model; RDD API; SQL, DataFrames, and DataSets; Architecture; Resource Managers; Task Schedulers; Executors; Communication Operations; Apache Spark Streaming; Apache Storm; Programming Model; Architecture; Cluster Resource Managers; Communication Operations; Kafka Streams; Programming Model; Architecture; PyTorch; Programming Model; Execution; Cylon; Programming Model; Architecture; Execution; Communication Operations; Rapids cuDF; Programming Model; Architecture; Summary; References

    Chapter 9 Fault Tolerance: Dependable Systems and Failures; Fault Tolerance is Not Free; Dependable Systems; Failures; Process Failures; Network Failures; Node Failures; Byzantine Faults; Failure Models; Failure Detection; Recovering from Faults; Recovery Methods; Stateless Programs; Batch Systems; Streaming Systems; Processing Guarantees; Role of Cluster Resource Managers; Checkpointing; State; Consistent Global State; Uncoordinated Checkpointing; Coordinated Checkpointing; Chandy-Lamport Algorithm; Batch Systems; When to Checkpoint?; Snapshot Data; Streaming Systems; Case Study: Apache Storm; Message Tracking; Failure Recovery; Case Study: Apache Flink; Checkpointing; Failure Recovery; Batch Systems; Iterative Programs; Case Study: Apache Spark; RDD Recomputing; Checkpointing; Recovery from Failures; Summary; References

    Chapter 10 Performance and Productivity: Performance Metrics; System Performance Metrics; Parallel Performance Metrics; Speedup; Strong Scaling; Weak Scaling; Parallel Efficiency; Amdahl’s Law; Gustafson’s Law; Throughput; Latency; Benchmarks; LINPACK Benchmark; NAS Parallel Benchmark; BigDataBench; TPC Benchmarks; HiBench; Performance Factors; Memory; Execution; Distributed Operators; Disk I/O; Garbage Collection; Finding Issues; Serial Programs; Profiling; Scaling; Strong Scaling; Weak Scaling; Debugging Distributed Applications; Programming Languages; C/C++; Java; Memory Management; Data Structures; Interfacing with Python; Python; C/C++ Code integration; Productivity; Choice of Frameworks; Operating Environment; CPUs and GPUs; Public Clouds; Future of Data-Intensive Applications; Summary; References

    Index

    10 in stock

    £38.25

  • Data Analytics in the AWS Cloud

    John Wiley & Sons Inc Data Analytics in the AWS Cloud

    10 in stock

    Book Synopsis: A comprehensive and accessible roadmap to performing data analytics in the AWS cloud. In Data Analytics in the AWS Cloud: Building a Data Platform for BI and Predictive Analytics on AWS, accomplished software engineer and data architect Joe Minichino delivers an expert blueprint for storing, processing, and analyzing data on the Amazon Web Services cloud platform. In the book, you'll explore every relevant aspect of data analytics, from data engineering to analysis, business intelligence, DevOps, and MLOps, as you discover how to integrate machine learning predictions with analytics engines and visualization tools. You'll also find: Real-world use cases of AWS architectures that demystify the applications of data analytics; Accessible introductions to data acquisition, importation, storage, visualization, and reporting; Expert insights into serverless data engineering and how to use it to reduce overhead and costs, improve stability, and simplify m Table of Contents: Introduction xxiii Chapter 1 AWS Data Lakes and Analytics Technology Overview 1 Why AWS? 1 What Does a Data Lake Look Like in AWS? 
2 Analytics on AWS 3 Skills Required to Build and Maintain an AWS Analytics Pipeline 3 Chapter 2 The Path to Analytics: Setting Up a Data and Analytics Team 5 The Data Vision 6 Support 6 DA Team Roles 7 Early Stage Roles 7 Team Lead 8 Data Architect 8 Data Engineer 8 Data Analyst 9 Maturity Stage Roles 9 Data Scientist 9 Cloud Engineer 10 Business Intelligence (BI) Developer 10 Machine Learning Engineer 10 Business Analyst 11 Niche Roles 11 Analytics Flow at a Process Level 12 Workflow Methodology 12 The DA Team Mantra: “Automate Everything” 14 Analytics Models in the Wild: Centralized, Distributed, Center of Excellence 15 Centralized 15 Distributed 16 Center of Excellence 16 Summary 17 Chapter 3 Working on AWS 19 Accessing AWS 20 Everything Is a Resource 21 S3: An Important Exception 21 IAM: Policies, Roles, and Users 22 Policies 22 Identity-Based Policies 24 Resource-Based Policies 25 Roles 25 Users and User Groups 25 Summarizing IAM 26 Working with the Web Console 26 The AWS Command-Line Interface 29 Installing AWS cli 29 Linux Installation 30 macOS Installation 30 Windows 31 Configuring AWS cli 31 A Note on Region 33 Setting Individual Parameters 33 Using Profiles and Configuration Files 33 Final Notes on Configuration 36 Using the AWS cli 36 Using Skeletons and File Inputs 39 Cleaning Up! 43 Infrastructure-as-Code: CloudFormation and Terraform 44 CloudFormation 44 CloudFormation Stacks 46 CloudFormation Template Anatomy 47 CloudFormation Changesets 52 Getting Stack Information 55 Cleaning Up Again 57 CloudFormation Conclusions 58 Terraform 58 Coding Style 58 Modularity 59 Limitations 59 Terraform vs. CloudFormation 60 Infrastructure-as-Code: CDK, Pulumi, Cloudcraft, and Other Solutions 60 AWS CDK 60 Pulumi 62 Cloudcraft 62 Infrastructure Management Conclusions 63 Chapter 4 Serverless Computing and Data Engineering 65 Serverless vs. 
Fully Managed 65 AWS Serverless Technologies 66 AWS Lambda 67 Pricing Model 67 Laser Focus on Code 68 The Lambda Paradigm Shift 69 Virtually Infinite Scalability 70 Geographical Distribution 70 A Lambda Hello World 71 Lambda Configuration 74 Runtime 74 Container-Based Lambdas 75 Architectures 75 Memory 75 Networking 76 Execution Role 76 Environment Variables 76 AWS EventBridge 77 AWS Fargate 77 AWS DynamoDB 77 AWS SNS 77 Amazon SQS 78 AWS CloudWatch 78 Amazon QuickSight 78 AWS Step Functions 78 Amazon API Gateway 79 Amazon Cognito 79 AWS Serverless Application Model (SAM) 79 Ephemeral Infrastructure 80 AWS SAM Installation 80 Configuration 80 Creating Your First AWS SAM Project 81 Application Structure 83 SAM Resource Types 85 SAM Lambda Template 86 !! Recursive Lambda Invocation !! 88 Function Metadata 88 Outputs 89 Implicitly Generated Resources 89 Other Template Sections 90 Lambda Code 90 Building Your First SAM Application 93 Testing the AWS SAM Application Locally 96 Deployment 99 Cleaning Up 104 Summary 104 Chapter 5 Data Ingestion 105 AWS Data Lake Architecture 106 Serverless Data Lake Architecture Structure 106 Ingestion 106 Storage and Processing 108 Cataloging, Governance, and Search 108 Security and Monitoring 109 Consumption 109 Sample Processing Architecture: Cataloging Images into DynamoDB 109 Use Case Description 109 SAM Application Creation 110 S3-Triggered Lambda 111 Adding DynamoDB 119 Lambda Execution Context 121 Inserting into DynamoDB 121 Cleaning Up 123 Serverless Ingestion 124 AWS Fargate 124 AWS Lambda 124 Example Architecture: Fargate-Based Periodic Batch Import 125 The Basic Importer 125 ECS CLI 128 AWS Copilot cli 128 Clean Up 136 AWS Kinesis Ingestion 136 Example Architecture: Two-Pronged Delivery 137 Fully Managed Ingestion with AppFlow 146 Operational Data Ingestion with Database Migration Service 151 DMS Concepts 151 DMS Instance 151 DMS Endpoints 152 DMS Tasks 152 Summary of the Workflow 152 Common Use of DMS 153 Example 
Architecture: DMS to S3 154 DMS Instance 154 DMS Endpoints 156 DMS Task 162 Summary 167 Chapter 6 Processing Data 169 Phases of Data Preparation 170 What Is ETL? Why Should I Care? 170 ETL Job vs. Streaming Job 171 Overview of ETL in AWS 172 ETL with AWS Glue 172 ETL with Lambda Functions 172 ETL with Hadoop/EMR 173 Other Ways to Perform ETL 173 ETL Job Design Concepts 173 Source Identification 174 Destination Identification 174 Mappings 174 Validation 174 Filter 175 Join, Denormalization, Relationalization 175 AWS Glue for ETL 176 Really, It’s Just Spark 176 Visual 176 Spark Script Editor 177 Python Shell Script Editor 177 Jupyter Notebook 177 Connectors 177 Creating Connections 178 Creating Connections with the Web Console 178 Creating Connections with the AWS cli 179 Creating ETL Jobs with AWS Glue Visual Editor 184 ETL Example: Format Switch from Raw (JSON) to Cleaned (Parquet) 184 Job Bookmarks 187 Transformations 188 Apply Mapping 189 Filter 189 Other Available Transforms 190 Run the Edited Job 191 Visual Editor with Source and Target Conclusions 192 Creating ETL Jobs with AWS Glue Visual Editor (without Source and Target) 192 Creating ETL Jobs with the Spark Script Editor 192 Developing ETL Jobs with AWS Glue Notebooks 193 What Is a Notebook? 
194 Notebook Structure 194 Step 1: Load Code into a DynamicFrame 196 Step 2: Apply Field Mapping 197 Step 3: Apply the Filter 197 Step 4: Write to S3 in Parquet Format 198 Example: Joining and Denormalizing Data from Two S3 Locations 199 Conclusions for Manually Authored Jobs with Notebooks 203 Creating ETL Jobs with AWS Glue Interactive Sessions 204 It’s Magic 205 Development Workflow 206 Streaming Jobs 207 Differences with a Standard ETL Job 208 Streaming Sources 208 Example: Process Kinesis Streams with a Streaming Job 208 Streaming ETL Jobs Conclusions 217 Summary 217 Chapter 7 Cataloging, Governance, and Search 219 Cataloging with AWS Glue 219 AWS Glue and the AWS Glue Data Catalog 219 Glue Databases and Tables 220 Databases 220 The Idea of Schema-on-Read 221 Tables 222 Create Table Manually 223 Creating a Table from an Existing Schema 225 Creating a Table with a Crawler 225 Summary on Databases and Tables 226 Crawlers 226 Updating or Not Updating? 230 Running the Crawler 231 Creating a Crawler from the AWS CLI 231 Retrieving Table Information from the CLI 233 Classifiers 235 Classifier Example 236 Crawlers and Classifiers Summary 237 Search with Amazon Athena: The Heart of Analytics in AWS 238 A Bit of History 238 Interface Overview 238 Creating Tables Manually 239 Athena Data Types 240 Complex Types 241 Running a Query 242 Connecting with JDBC and ODBC 243 Query Stats 243 Recent Queries and Saved Queries 243 The Power of Partitions 244 Athena Pricing Model 244 Automatic Naming 245 Athena Query Output 246 Athena Peculiarities (SQL and Not) 246 Computed Fields Gotcha and WITH Statement Workaround 246 Lowercase! 
247 Query Explain 248 Deduplicating Records 249 Working with JSON, Flattening, and Unnesting 250 Athena Views 251 Create Table as Select (CTAS) 252 Saving Queries and Reusing Saved Queries 253 Running Parameterized Queries 254 Athena Federated Queries 254 Athena Lambda Connectors 255 Note on Connection Errors 256 Performing Federated Queries 257 Creating a View from a Federated Query 258 Governing: Athena Workgroups, Lake Formation, and More 258 Athena Workgroups 259 Fine-Grained Athena Access with IAM 262 Recap of Athena-Based Governance 264 AWS Lake Formation 265 Registering a Location in Lake Formation 266 Creating a Database in Lake Formation 268 Assigning Permissions in Lake Formation 269 LF-Tags and Permissions in Lake Formation 271 Data Filters 277 Governance Conclusions 279 Summary 280 Chapter 8 Data Consumption: BI, Visualization, and Reporting 283 QuickSight 283 Signing Up for QuickSight 284 Standard Plan 284 Enterprise Plan 284 Users and User Groups 285 Managing Users and Groups 285 Managing QuickSight 286 Users and Groups 287 Your Subscriptions 287 SPICE Capacity 287 Account Settings 287 Security and Permissions 287 VPC Connections 288 Mobile Settings 289 Domains and Embedding 289 Single Sign-On 289 Data Sources and Datasets 289 Creating an Athena Data Source 291 Creating Other Data Sources 292 Creating a Data Source from the AWS cli 292 Creating a Dataset from a Table 294 Creating a Dataset from a SQL Query 295 Duplicating Datasets 296 Note on Creating Datasets 297 QuickSight Favorites, Recent, and Folders 297 SPICE 298 Manage SPICE Capacity 298 Refresh Schedule 299 QuickSight Data Editor 299 QuickSight Data Types 302 Change Data Types 302 Calculated Fields 303 Joining Data 305 Excluding Fields 309 Filtering Data 309 Removing Data 310 Geospatial Hierarchies and Adding Fields to Hierarchies 310 Unsupported Format Dates 311 Visualizing Data: QuickSight Analysis 312 Adding a Title and a Description to Your Analysis 313 Renaming the Sheet 314 Your 
First Visual with AutoGraph 314 Field Wells 314 Visuals Types 315 Saving and Autosaving 316 A First Example: Pie Chart 316 Renaming a Visual 317 Filtering Data 318 Adding Drill-Downs 320 Parameters 321 Actions 324 Insights 328 ML-Powered Insights 330 Sharing an Analysis 335 Dashboards 335 Dashboard Layouts and Themes 335 Publishing a Dashboard 336 Embedding Visuals and Dashboards 337 Data Consumption: Not Only Dashboards 337 Summary 338 Chapter 9 Machine Learning at Scale 339 Machine Learning and Artificial Intelligence 339 What Are ML/AI Use Cases? 340 Types of ML Models 340 Overview of ML/AI AWS Solutions 341 Amazon SageMaker 341 SageMaker Domains 342 Adding a User to the Domain 344 SageMaker Studio 344 SageMaker Example Notebook 346 Step 1: Prerequisites and Preprocessing 346 Step 2: Data Ingestion 347 Step 3: Data Inspection 348 Step 4: Data Conversion 349 Step 5: Upload Training Data 349 Step 6: Train the Model 349 Step 7: Set Up Hosting and Deploy the Model 351 Step 8: Validate the Model 352 Step 9: Use the Model 353 Inference 353 Real Time 354 Asynchronous 354 Serverless 354 Batch Transform 354 Data Wrangler 356 SageMaker Canvas 357 Summary 358 Appendix Example Data Architectures in AWS 359 Modern Data Lake Architecture 360 ETL in a Lake House 361 Consuming Data in the Lake House 361 The Modern Data Lake Architecture 362 Batch Processing 362 Stream Processing 363 Architecture Design Recommendations 364 Automate Everything 365 Build on Events 365 Performance = Cost Savings 365 AWS Glue Catalog and Athena-Centric Workflow 365 Design Flexible 365 Pick Your Battles 365 Parquet 366 Summary 366 Index 367


    £40.38

  • Semi-Supervised and Unsupervised Machine

    ISTE Ltd and John Wiley & Sons Inc Semi-Supervised and Unsupervised Machine

    10 in stock

    Book Synopsis: This book provides a detailed and up-to-date overview of classification and data mining methods. The first part is focused on supervised classification algorithms and their applications, including recent research on the combination of classifiers. The second part deals with unsupervised data mining and knowledge discovery, with special attention to text mining. Discovering the underlying structure of a data set has been a key research topic associated with unsupervised techniques, with multiple applications and challenges, from web-content mining to the inference of cancer subtypes in genomic microarray data. Among those, the book focuses on a new application for dialog systems, which can thereby be made adaptable and portable to different domains. Clustering evaluation metrics and new approaches, such as the ensembles of clustering algorithms, are also described. Table of Contents: PART 1. STATE OF THE ART 1 Chapter 1. Introduction 3 1.1. Organization of the book 6 1.2. Utterance corpus 8 1.3. Datasets from the UCI repository 10 1.4. Microarray dataset 13 1.5. Simulated datasets 14 Chapter 2. State of the Art in Clustering and Semi-Supervised Techniques 15 2.1. Introduction 15 2.2. Unsupervised machine learning (clustering) 15 2.3. A brief history of cluster analysis 16 2.4. Cluster algorithms 19 2.5. Applications of cluster analysis 52 2.6. Evaluation methods 77 2.7. Internal cluster evaluation 77 2.8. External cluster validation 80 2.9. Semi-supervised learning 84 2.10. Summary 88 PART 2. APPROACHES TO SEMI-SUPERVISED CLASSIFICATION 91 Chapter 3. Semi-Supervised Classification Using Prior Word Clustering 93 3.1. Introduction 93 3.2. Dataset 94 3.3. Utterance classification scheme 94 3.4. Semi-supervised approach based on term clustering 98 3.5. Disambiguation 113 3.6. Summary 124 Chapter 4. Semi-Supervised Classification Using Pattern Clustering 127 4.1. Introduction 127 4.2. New semi-supervised algorithm using the cluster and label strategy 128 4.3. 
Optimum cluster labeling 132 4.4. Supervised classification block 154 4.5. Datasets 159 4.6. An analysis of the bounds for the cluster and label approaches 162 4.7. Extension through cluster pruning 164 4.8. Simulations and results 173 4.9. Summary 179 PART 3. CONTRIBUTIONS TO UNSUPERVISED CLASSIFICATION – ALGORITHMS TO DETECT THE OPTIMAL NUMBER OF CLUSTERS 183 Chapter 5. Detection of the Number of Clusters through Non-Parametric Clustering Algorithms 185 5.1. Introduction 185 5.2. New hierarchical pole-based clustering algorithm 186 5.3. Evaluation 190 5.4. Datasets 192 5.5. Summary 197 Chapter 6. Detecting the Number of Clusters through Cluster Validation 199 6.1. Introduction 199 6.2. Cluster validation methods 201 6.3. Combination approach based on quantiles 206 6.4. Datasets 212 6.5. Results 214 6.6. Application of speech utterances 223 6.7. Summary 224 Bibliography 227 Index 243


    £132.00

  • 10 in stock

    £85.49

© 2026 Book Curl
