Data mining Books

267 products


  • Trustworthy Online Controlled Experiments

    Cambridge University Press Trustworthy Online Controlled Experiments

    1 in stock

    Book Synopsis: Getting numbers is easy; getting trustworthy numbers is hard. From experimentation leaders at Amazon, Google, LinkedIn, and Microsoft, this guide to accelerating innovation using A/B tests includes practical examples, pitfalls, and advice for students and industry professionals, plus deeper dives into advanced topics for experienced practitioners.

    Trade Review:

    'At the core of the Lean Methodology is the scientific method: Creating hypotheses, running experiments, gathering data, extracting insight and validation or modification of the hypothesis. A/B testing is the gold standard of creating verifiable and repeatable experiments, and this book is its definitive text.' Steve Blank, Adjunct professor at Stanford University, father of modern entrepreneurship, author of The Startup Owner's Manual and The Four Steps to the Epiphany

    'This book is a great resource for executives, leaders, researchers or engineers looking to use online controlled experiments to optimize product features, project efficiency or revenue. I know firsthand the impact that Kohavi's work had on Bing and Microsoft, and I'm excited that these learnings can now reach a wider audience.' Harry Shum, EVP, Microsoft Artificial Intelligence and Research Group

    'A great book that is both rigorous and accessible. Readers will learn how to bring trustworthy controlled experiments, which have revolutionized internet product development, to their organizations.' Adam D'Angelo, Co-founder and CEO of Quora and former CTO of Facebook

    'This book is a great overview of how several companies use online experimentation and A/B testing to improve their products. Kohavi, Tang and Xu have a wealth of experience and excellent advice to convey, so the book has lots of practical real world examples and lessons learned over many years of the application of these techniques at scale.' Jeff Dean, Google Senior Fellow and SVP Google Research

    'Do you want your organization to make consistently better decisions? This is the new bible of how to get from data to decisions in the digital age. Reading this book is like sitting in meetings inside Amazon, Google, LinkedIn, Microsoft. The authors expose for the first time the way the world's most successful companies make decisions. Beyond the admonitions and anecdotes of normal business books, this book shows what to do and how to do it well. It's the how-to manual for decision-making in the digital world, with dedicated sections for business leaders, engineers, and data analysts.' Scott Cook, Intuit Co-founder & Chairman of the Executive Committee

    'Online controlled experiments are powerful tools. Understanding how they work, what their strengths are, and how they can be optimized can illuminate both specialists and a wider audience. This book is the rare combination of technically authoritative, enjoyable to read, and dealing with highly important matters.' John P. A. Ioannidis, Stanford University

    'Kohavi, Tang, and Xu are pioneers of online experimentation. The platforms they've built and the experiments they've enabled have transformed some of the largest internet brands. Their research and talks have inspired teams across the industry to adopt experimentation. This book is the authoritative yet practical text that the industry has been waiting for.' Adil Aijaz, Co-founder and CEO, Split Software

    'Which online option will be better? We frequently need to make such choices, and frequently err. To determine what will actually work better, we need rigorous controlled experiments, aka A/B testing. This excellent and lively book by experts from Microsoft, Google, and LinkedIn presents the theory and best practices of A/B testing. A must read for anyone who does anything online!' Gregory Piatetsky-Shapiro, Ph.D., president of KDnuggets, co-founder of SIGKDD, and LinkedIn Top Voice on Data Science & Analytics

    'Ron Kohavi, Diane Tang and Ya Xu are the world's top experts on online experiments. I've been using their work for years and I'm delighted they have now teamed up to write the definitive guide. I recommend this book to all my students and everyone involved in online products and services.' Erik Brynjolfsson, Massachusetts Institute of Technology, co-author of The Second Machine Age

    'A modern software-supported business cannot compete successfully without online controlled experimentation. Written by three of the most experienced leaders in the field, this book presents the fundamental principles, illustrates them with compelling examples, and digs deeper to present a wealth of practical advice. It's a 'must read'!' Foster Provost, New York University and co-author of the best-selling Data Science for Business

    'In the past two decades the technology industry has learned what scientists have known for centuries: that controlled experiments are among the best tools to understand complex phenomena and to solve very challenging problems. The ability to design controlled experiments, run them at scale, and interpret their results is the foundation of how modern high tech businesses operate. Between them the authors have designed and implemented several of the world's most powerful experimentation platforms. This book is a great opportunity to learn from their experiences about how to use these tools and techniques.' Kevin Scott, EVP and CTO of Microsoft

    'Online experiments have fueled the success of Amazon, Microsoft, LinkedIn and other leading digital companies. This practical book gives the reader rare access to decades of experimentation experience at these companies and should be on the bookshelf of every data scientist, software engineer and product manager.' Stefan Thomke, William Barclay Harding Professor, Harvard Business School, author of Experimentation Works: The Surprising Power of Business Experiments

    'The secret sauce for a successful online business is experimentation. But it is a secret no longer. Here three masters of the art describe the ABCs of A/B testing so that you too can continuously improve your online services.' Hal Varian, Chief Economist, Google, and author of Intermediate Microeconomics: A Modern Approach

    'Experiments are the best tool for online products and services. This book is full of practical knowledge derived from years of successful testing at Microsoft, Google and LinkedIn. Insights and best practices are explained with real examples and pitfalls, their markers and solutions identified. I strongly recommend this book!' Preston McAfee, former Chief Economist and VP of Microsoft

    'Experimentation is the future of digital strategy and 'Trustworthy Experiments' will be its Bible. Kohavi, Tang and Xu are three of the most noteworthy experts on experimentation working today and their book delivers a truly practical roadmap for digital experimentation that is useful right out of the box. The revealing case studies they conducted over many decades at Microsoft, Amazon, Google and LinkedIn are organized into easy to understand practical lessons with tremendous depth and clarity. It should be required reading for any manager of a digital business.' Sinan Aral, David Austin Professor of Management, Massachusetts Institute of Technology, and author of The Hype Machine

    Table of Contents: Preface – how to read this book; 1. Introduction and motivation; 2. Running and analyzing experiments: an end-to-end example; 3. Twyman's law and experimentation trustworthiness; 4. Experimentation platform and culture; Part II: 5. Speed matters: an end-to-end case study; 6. Organizational metrics; 7. Metrics for experimentation and the Overall Evaluation Criterion (OEC); 8. Institutional memory and meta-analysis; 9. Ethics in controlled experiments; Part III: 10. Complementary techniques; 11. Observational causal studies; Part IV: 12. Client-side experiments; 13. Instrumentation; 14. Choosing a randomization unit; 15. Ramping experiment exposure: trading off speed, quality, and risk; 16. Scaling experiment analyses; Part V: 17. The statistics behind online controlled experiments; 18. Variance estimation and improved sensitivity: pitfalls and solutions; 19. The A/A test; 20. Triggering for improved sensitivity; 21. Guardrail metrics; 22. Leakage and interference between variants; 23. Measuring long-term treatment effects.

    £30.99

  • Python for Data Analysis 3e

    O'Reilly Media Python for Data Analysis 3e

    5 in stock

    Book Synopsis: Updated for Python 3.10 and pandas 1.4, the third edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively.

    £51.19

  • The Science of Science

    Cambridge University Press The Science of Science

    15 in stock

    Book Synopsis: This is the first comprehensive overview of the exciting field of the 'science of science'. With anecdotes and detailed, easy-to-follow explanations of the research, this book is accessible to all scientists, policy makers, and administrators with an interest in the wider scientific enterprise.

    Trade Review:

    'Wang and Barabási book is a manifesto for the science of science domain. Graduate students (as well as their mentors) owe the authors a debt of gratitude for this impressive synthesis of what is a fast-evolving field of research.' Pierre Azoulay, Massachusetts Institute of Technology

    'Analyzing quantitative aspects of science with state-of-art tools, Wang and Barabási have written an insightful and comprehensive book that will become a must-read for all scholars interested in science.' Yu Xie, Princeton University

    'In their engaging book, Wang and Barabási take a fresh look at the science of science. They convincingly argue that in the age of big data and AI applying the scientific method to science itself not only helps understand how science works but may even enhance it. We are compelled to consider the determinants of individual careers and what this means in the age of large-scale scientific collaborations. These and other questions around the meaning of scientific impact, in academia and beyond, make the book highly relevant to scientists, academic administrators and funders alike. By the time the final, forward-looking chapter ends we are hooked on all the correlations and predictions, and so it is only fitting that we are invited to join in, to help shape the field which is likely to be driven by a human-machine collaboration.' Magdalena Skipper, Nature

    'Overall, I found this book very stimulating. It made me wonder whether in-depth metrics analyses of 'only' the subjective narratives of authors, such as the references list they select, actually creates a foundation on which to form judgement rather than opinion? Namely, what fraction of these publications analysed for their metrics were actually underpinned by their data? As well as provoking thought, this book offers a feast of references, 424 in all. There are such further enticing reads as reference 396, Life 3.0: Being Human in the Age of Artificial Intelligence. To conclude, I recommend this book for your library, and maybe even take it for your summer beach reading.' John R. Helliwell, Journal of Applied Crystallography

    '… a text that should appeal to practicing scientists curious about the structure of the whole scientific enterprise, academic administrators and policy makers interested in evidence-based decision-making, and researchers interested in contributing further to the "science of science." There is no better, handier, and more readable work to appeal to such audiences … Highly recommended.' M. Oromaner, Choice Connect

    Table of Contents: Introduction; Part I. The Science of Career: 1. Productivity of a scientist; 2. The H Index; 3. The Matthew Effect; 4. Age and Scientific Achievement; 5. Random Impact Rule; 6. The Q Factor; 7. Hot Streaks; Part II. The Science of Collaboration: 8. The increasing dominance of teams in science; 9. The Invisible College; 10. Coauthorship Networks; 11. Team Assembly; 12. Small and large teams; 13. Scientific Credit; 14. Credit Allocation; Part III. The Science of Impact: 15. Big Science; 16. Citation Disparity; 17. High Impact Papers; 18. Scientific Impact; 19. The Time Dimension of Science; 20. Ultimate Impact; Part IV. Outlook: 21. Can Science be Accelerated?; 22. Artificial Intelligence; 23. Bias and Causality in Science; Part V. Last thought; All the Science of Science: Appendix A1 Modeling team assembly; Appendix A2 Modeling Citations; References; Index.

    £24.99

  • Learning SQL

    O'Reilly Media Learning SQL

    2 in stock

    Book Synopsis: As data floods into your company, you need to put it to work right away, and SQL is the best tool for the job. With the latest edition of this introductory guide, author Alan Beaulieu helps developers get up to speed with SQL fundamentals for writing database applications, performing administrative tasks, and generating reports.

    £42.39

  • Fundamentals of Data Engineering

    O'Reilly Media Fundamentals of Data Engineering

    10 in stock

    Book Synopsis: With this practical book, you'll learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available through the framework of the data engineering lifecycle.

    £51.19

  • Everybody Lies

    HarperCollins Publishers Inc Everybody Lies

    1 in stock

    Book Synopsis: New York Times Bestseller. Foreword by Steven Pinker,...

    £21.74

  • Fusion Strategy

    Harvard Business Review Press Fusion Strategy

    1 in stock

    Book Synopsis: Two world-renowned experts on innovation and digital strategy explore how real-time data and AI will radically transform physical products, and the companies that make them.

    Tech giants like Facebook, Amazon, and Google can collect real-time data from billions of users. For companies that design and manufacture physical products, that type of fluid, data-rich information used to be a pipe dream. Now, with the rise of cheap and powerful sensors, supercomputing, and artificial intelligence, things are changing fast.

    In Fusion Strategy, world-renowned innovation guru Vijay Govindarajan and digital strategy expert Venkat Venkatraman offer a first-of-its-kind playbook that will help industrial companies combine what they do best (create physical products) with what digitals do best (use algorithms and AI to parse expansive, interconnected datasets) to make strategic connections that would otherwise be impossible.

    The laws of

    £23.75

  • The Elements of Statistical Learning Springer

    Springer-Verlag New York Inc. The Elements of Statistical Learning Springer

    15 in stock

    Book Synopsis: Overview of Supervised Learning; Linear Methods for Regression; Linear Methods for Classification; Basis Expansions and Regularization; Kernel Smoothing Methods; Model Assessment and Selection; Model Inference and Averaging; Additive Models, Trees, and Related Methods; Boosting and Additive Trees; Neural Networks; Support Vector Machines and Flexible Discriminants; Prototype Methods and Nearest-Neighbors; Unsupervised Learning; Random Forests; Ensemble Learning; Undirected Graphical Models; High-Dimensional Problems: p ≫ N.

    Trade Review:

    From the reviews: "Like the first edition, the current one is a welcome edition to researchers and academicians equally…. Almost all of the chapters are revised.… The material is nicely reorganized and repackaged, with the general layout being the same as that of the first edition.… If you bought the first edition, I suggest that you buy the second edition for maximum effect, and if you haven’t, then I still strongly recommend you have this book at your desk. It is a good investment, statistically speaking!" (Book Review Editor, Technometrics, August 2009, VOL. 51, NO. 3)

    From the reviews of the second edition: "This second edition pays tribute to the many developments in recent years in this field, and new material was added to several existing chapters as well as four new chapters … were included. … These additions make this book worthwhile to obtain … . In general this is a well written book which gives a good overview on statistical learning and can be recommended to everyone interested in this field. The book is so comprehensive that it offers material for several courses." (Klaus Nordhausen, International Statistical Review, Vol. 77 (3), 2009)

    “The second edition … features about 200 pages of substantial new additions in the form of four new chapters, as well as various complements to existing chapters. … the book may also be of interest to a theoretically inclined reader looking for an entry point to the area and wanting to get an initial understanding of which mathematical issues are relevant in relation to practice. … this is a welcome update to an already fine book, which will surely reinforce its status as a reference.” (Gilles Blanchard, Mathematical Reviews, Issue 2012 d)

    “The book would be ideal for statistics graduate students … . This book really is the standard in the field, referenced in most papers and books on the subject, and it is easy to see why. The book is very well written, with informative graphics on almost every other page. It looks great and inviting. You can flip the book open to any page, read a sentence or two and be hooked for the next hour or so.” (Peter Rabinovitch, The Mathematical Association of America, May, 2012)

    Table of Contents: Introduction; Overview of supervised learning; Linear methods for regression; Linear methods for classification; Basis expansions and regularization; Kernel smoothing methods; Model assessment and selection; Model inference and averaging; Additive models, trees, and related methods; Boosting and additive trees; Neural networks; Support vector machines and flexible discriminants; Prototype methods and nearest-neighbors; Unsupervised learning.

    £58.49

  • Principles of Database Management

    Cambridge University Press Principles of Database Management

    15 in stock

    Book Synopsis: This comprehensive textbook teaches the fundamentals of database design, modeling, systems, data storage, and the evolving world of data warehousing, governance and more. Written by experienced educators and experts in big data, analytics, data quality, and data integration, it provides an up-to-date approach to database management. This full-color, illustrated text has a balanced theory-practice focus, covering essential topics, from established database technologies to recent trends, like Big Data, NoSQL, and more. Fundamental concepts are supported by real-world examples, query and code walkthroughs, and figures, making it perfect for introductory courses for advanced undergraduates and graduate students in information systems or computer science. These examples are further supported by an online playground with multiple learning environments, including MySQL, MongoDB, Neo4j Cypher, and tree structure visualization. This combined learning approach connects key concepts throughout the text to the important, practical tools to get started in database management.

    Trade Review:

    'Although there have been a series of classical textbooks on database systems, the new dramatic advances call for an updated text covering the latest significant topics, such as big data analytics, No-SQL and much more. Fortunately, this is exactly what this book has to offer. It is highly desirable for training the next generation of data management professionals.' Jian Pei, Simon Fraser University, Canada

    'I haven't seen an as up-to-date and comprehensive textbook for Database Management as this one in many years. Principles of Database Management combines a number of classical and recent topics concerning Data Modeling, Relational Databases, Object-Oriented Databases, XML, Distributed Data Management, NoSQL and Big Data in an unprecedented manner. The authors did a great job in stitching these topics into one coherent and compelling story that will serve as an ideal basis for teaching both introductory and advanced courses.' Martin Theobald, University of Luxembourg

    'This is a very timely book with outstanding coverage of database topics and excellent treatment of database details. It not only gives very solid discussions of traditional topics like data modeling and relational databases but also contains refreshing contents on frontier topics such as XML databases, NoSQL databases, big data, and analytics. For those reasons, this will be a good book for database professionals who will keep using it for all stages of database studies and works.' J. Leon Zhao, City University of Hong Kong

    'This accessible, authoritative book introduces the reader the most important fundamental concepts of data management, while providing a practical view of recent advances. Both are essential for data professionals today.' Foster Provost, New York University, Stern School of Business

    'This guide to big and small data management addresses both fundamental principles and practical deployment. It reviews a range of databases and their relevance for analytics. The book is useful to practitioners because it contains many case studies, links to open-source software, and a very useful abstraction of analytics that will help them better choose solutions. It is important to academics because it promotes database principles which are key to successful and sustainable data science.' Sihem Amer-Yahia, Laboratoire d'Informatique de Grenoble and Editor-in-Chief of the International Journal on Very Large DataBases

    'This book covers everything you will need to teach in a database implementation and design class. With some chapters covering big data, analytic models/methods, and No-SQL, it can keep our students up-to-date with these new technologies in data management related topics.' Han-fen Hu, University of Nevada, Las Vegas

    'As we are entering a new technological era of intelligent machines powered by data-driven algorithms, understanding fundamental concepts of data management and their most current practical applications has become more important than ever. This book is a timely guide for anyone interested in getting up to speed with the state of the art in database systems, big data technologies, and data science. It is full of insightful examples and case studies with direct industrial relevance.' Nesime Tatbul, Intel Labs and Massachusetts Institute of Technology

    'It is a pleasure to study this new book on database systems. The book offers a fantastically fresh approach to database teaching. The mix of theoretical and practical contents is almost perfect, the content is up-to-date and covers the recent ones, the examples are nice, and the database testbed provides an excellent way of understanding the concepts. Coupled with the authors' expertise, this book is an important addition to the database field.' Arnab Bhattacharya, Indian Institute of Technology, Kanpur

    'Principles of Database Management is my favorite textbook for teaching a course on database management. Written in a well-illustrated style, this comprehensive book covers essential topics in established data management technologies and recent discoveries in data science. With a nice balance between theory and practice, it is not only an excellent teaching medium for students taking information management and/or data analytics courses, but also a quick and valuable reference for scientists and engineers working in this area.' Chuan Xiao, Graduate School of Informatics, Nagoya University

    'Data science success stories and big data applications are only possible because of advances in database technology. This book provides both a broad and deep introduction to databases. It covers the different types of database systems (from relational to noSQL) and manages to bridge the gap between data modeling and the underlying basic principles. The book is highly recommended for anyone that wants to understand how modern information systems deal with ever-growing volumes of data.' Wil van der Aalst, RWTH Aachen University

    'The database field has been evolving for several decades and the need for updated textbooks is continuous. Now, this need is covered by this fresh book by Lemahieu, van den Broucke and Baesens. It spans from traditional topics - such as the relational model and SQL - to more recent topics - such as distributed computing with Hadoop and Spark as well as data analytics. The book can be used as an introductory text and for graduate courses.' Yannis Manolopoulos, Data Science & Engineering Lab, Aristotle University of Thessaloniki

    'I like the way the book covers both traditional database topics and newer material such as big data, No-SQL databases, and data quality. The coverage is just right for my course and the level of the material is very appropriate for my students. The book also has clear explanations and good examples.' Barbara Klein, University of Michigan

    'This book provides a unique perspective on database management and how to store, manage, and analyze small and big data. The accompanying exercises and solutions, cases, slides, and YouTube lectures turn it into an indispensable resource for anyone teaching an undergraduate or postgraduate course on the topic.' Wolfgang Ketter, Erasmus University Rotterdam

    'This is a very modern textbook that fills the needs of current trends without sacrificing the need to cover the required database management systems fundamentals.' George Dimitoglou, Hood College, Maryland

    'This book is a much needed foundational piece on data management and data science. The authors successfully integrate the fields of database technology, operations research and big data analytics, which have often been covered independently in the past. A key asset is its didactical approach that builds on a rich set of industry examples and exercises. The book is a must-read for all scholars and practitioners interested in database management, big data analytics and its applications.' Jan Mendling, Institute for Information Business, Vienna

    Table of Contents: Preface; Part I. Databases and Database Design: 1. Fundamental concepts of database management; 2. Architecture and categorization of DBMSs; 3. Conceptual data modeling using the (E)ER model and UML class diagram; 4. Organizational aspects of data management; Part II. Types of Database Systems: 5. Legacy databases; 6. Relational databases: the relational model; 7. Relational databases: structured query language (SQL); 8. Object oriented databases and object persistence; 9. Extended relational databases; 10. XML databases; 11. NoSQL databases; Part III. Physical Data Storage, Transaction Management, and Database Access: 12. Physical file organization and indexing; 13. Physical database organization; 14. Basics of transaction management; 15. Accessing databases and database APIs; 16. Data distribution and distributed transaction management; Part IV. Data Warehousing, Data Governance and (Big) Data Analytics: 17. Data warehousing and business intelligence; 18. Data integration, data quality and data governance; 19. Big data; 20. Analytics; Appendix A. Cases and questions; Appendix B. Using the online environment; Appendix C. Answer key to select review questions; Glossary; Index.

    £59.99

  • Data Quality Fundamentals

    O'Reilly Media Data Quality Fundamentals

    15 in stock

    Book Synopsis: Do your product dashboards look funky? Are your quarterly reports stale? Is the dataset you're using broken or just plain wrong? These problems affect almost every team, yet they're usually addressed on an ad hoc basis and in a reactive manner. If you answered yes to any of the questions above, this book is for you.

    £39.74

  • Neural Networks and Deep Learning

    Springer International Publishing AG Neural Networks and Deep Learning

    1 in stock

    Book Synopsis: Chapters 6 and 7 present radial-basis function (RBF) networks and restricted Boltzmann machines. Advanced topics in neural networks: Chapters 8, 9, and 10 discuss recurrent neural networks, convolutional neural networks, and graph neural networks.

    £40.49

  • Computer Age Statistical Inference Student

    Cambridge University Press Computer Age Statistical Inference Student

    2 in stock

    Book Synopsis: The twenty-first century has seen a breathtaking expansion of statistical methodology, both in scope and influence. "Data science" and "machine learning" have become familiar terms in the news, as statistical methods are brought to bear upon the enormous data sets of modern science and commerce. How did we get here? And where are we going? How does it all fit together? Now in paperback and fortified with exercises, this book delivers a concentrated course in modern statistical thinking. Beginning with classical inferential theories - Bayesian, frequentist, Fisherian - individual chapters take up a series of influential topics: survival analysis, logistic regression, empirical Bayes, the jackknife and bootstrap, random forests, neural networks, Markov Chain Monte Carlo, inference after model selection, and dozens more. The distinctly modern approach integrates methodology and algorithms with statistical inference. Each chapter ends with class-tested exercises, and the book concludes with speculation on the future direction of statistics and data science.

    Table of Contents: Part I. Classic Statistical Inference: 1. Algorithms and inference; 2. Frequentist inference; 3. Bayesian inference; 4. Fisherian inference and maximum likelihood estimation; 5. Parametric models and exponential families; Part II. Early Computer-Age Methods: 6. Empirical Bayes; 7. James–Stein estimation and ridge regression; 8. Generalized linear models and regression trees; 9. Survival analysis and the EM algorithm; 10. The jackknife and the bootstrap; 11. Bootstrap confidence intervals; 12. Cross-validation and Cp estimates of prediction error; 13. Objective Bayes inference and Markov chain Monte Carlo; 14. Statistical inference and methodology in the postwar era; Part III. Twenty-First-Century Topics: 15. Large-scale hypothesis testing and false-discovery rates; 16. Sparse modeling and the lasso; 17. Random forests and boosting; 18. Neural networks and deep learning; 19. Support-vector machines and kernel methods; 20. Inference after model selection; 21. Empirical Bayes estimation strategies; Epilogue; References; Author Index; Subject Index.

    £30.99

  • Pandas for Everyone

    Pearson Education (US) Pandas for Everyone

    15 in stock

    Book SynopsisDaniel Chen is a graduate student in the Interdisciplinary PhD program in Genetics, Bioinformatics & Computational Biology (GBCB) at Virginia Polytechnic Institute and State University (Virginia Tech). He is involved with Software Carpentry as an instructor, Mentoring Committee Member, and currently serves as the Assessment Committee Chair. He completed his Masters in Public Health at Columbia University Mailman School of Public Health in Epidemiology with a certificate in Advanced Epidemiology and currently extending his Master's thesis work in the Social and Decision Analytics Laboratory under the Virginia Bioinformatics Institute on attitude diffusion in social networks.Table of ContentsForeword by Anne M. Brown xxiii Foreword by Jared Lander xxv Preface xxvii Changes in the Second Edition xxxix Part I: Introduction 1 Chapter 1. Pandas DataFrame Basics 3 Learning Objectives 3 1.1 Introduction 3 1.2 Load Your First Data Set 4 1.3 Look at Columns, Rows, and Cells 6 1.4 Grouped and Aggregated Calculations 23 1.5 Basic Plot 27 Conclusion 28 Chapter 2. Pandas Data Structures Basics 31 Learning Objectives 31 2.1 Create Your Own Data 31 2.2 The Series 33 2.3 The DataFrame 42 2.4 Making Changes to Series and DataFrames 45 2.5 Exporting and Importing Data 52 Conclusion 63 Chapter 3. Plotting Basics 65 Learning Objectives 65 3.1 Why Visualize Data? 65 3.2 Matplotlib Basics 66 3.3 Statistical Graphics Using matplotlib 72 3.4 Seaborn 78 3.5 Pandas Plotting Method 111 Conclusion 115 Chapter 4. Tidy Data 117 Learning Objectives 117 Note About This Chapter 117 4.1 Columns Contain Values, Not Variables 118 4.2 Columns Contain Multiple Variables 122 4.3 Variables in Both Rows and Columns 126 Conclusion 129 Chapter 5. 
Apply Functions 131 Learning Objectives 131 Note About This Chapter 131 5.1 Primer on Functions 131 5.2 Apply (Basics) 133 5.3 Vectorized Functions 138 5.4 Lambda Functions (Anonymous Functions) 141 Conclusion 142 Part II: Data Processing 143 Chapter 6. Data Assembly 145 Learning Objectives 145 6.1 Combine Data Sets 145 6.2 Concatenation 146 6.3 Observational Units Across Multiple Tables 154 6.4 Merge Multiple Data Sets 160 Conclusion 167 Chapter 7. Data Normalization 169 Learning Objectives 169 7.1 Multiple Observational Units in a Table (Normalization) 169 Conclusion 173 Chapter 8. Groupby Operations: Split-Apply-Combine 175 Learning Objectives 175 8.1 Aggregate 176 8.2 Transform 184 8.3 Filter 188 8.4 The pandas.core.groupby.DataFrameGroupBy object 190 8.5 Working with a MultiIndex 195 Conclusion 199 Part III: Data Types 203 Chapter 9. Missing Data 203 Learning Objectives 203 9.1 What Is a NaN Value? 203 9.2 Where Do Missing Values Come From? 205 9.3 Working with Missing Data 210 9.4 Pandas Built-In NA Missing 216 Conclusion 218 Chapter 10. Data Types 219 Learning Objectives 219 10.1 Data Types 219 10.2 Converting Types 220 10.3 Categorical Data 225 Conclusion 227 Chapter 11. Strings and Text Data 229 Introduction 229 Learning Objectives 229 11.1 Strings 229 11.2 String Methods 233 11.3 More String Methods 234 11.4 String Formatting (F-Strings) 236 11.5 Regular Expressions (RegEx) 239 11.6 The regex Library 247 Conclusion 247 Chapter 12. Dates and Times 249 Learning Objectives 249 12.1 Python's datetime Object 249 12.2 Converting to datetime 250 12.3 Loading Data That Include Dates 253 12.4 Extracting Date Components 254 12.5 Date Calculations and Timedeltas 257 12.6 Datetime Methods 259 12.7 Getting Stock Data 261 12.8 Subsetting Data Based on Dates 263 12.9 Date Ranges 266 12.10 Shifting Values 270 12.11 Resampling 276 12.12 Time Zones 278 12.13 Arrow for Better Dates and Times 280 Conclusion 280 Part IV: Data Modeling 281 Chapter 13. 
Linear Regression (Continuous Outcome Variable) 283 13.1 Simple Linear Regression 283 13.2 Multiple Regression 287 13.3 Models with Categorical Variables 289 13.4 One-Hot Encoding in scikit-learn with Transformer Pipelines 294 Conclusion 296 Chapter 14. Generalized Linear Models 297 About This Chapter 297 14.1 Logistic Regression (Binary Outcome Variable) 297 14.2 Poisson Regression (Count Outcome Variable) 304 14.3 More Generalized Linear Models 308 Conclusion 309 Chapter 15. Survival Analysis 311 15.1 Survival Data 311 15.2 Kaplan Meier Curves 312 15.3 Cox Proportional Hazard Model 314 Conclusion 317 Chapter 16. Model Diagnostics 319 16.1 Residuals 319 16.2 Comparing Multiple Models 324 16.3 k-Fold Cross-Validation 329 Conclusion 334 Chapter 17. Regularization 335 17.1 Why Regularize? 335 17.2 LASSO Regression 337 17.3 Ridge Regression 338 17.4 Elastic Net 340 17.5 Cross-Validation 341 Conclusion 343 Chapter 18. Clustering 345 18.1 k-Means 345 18.2 Hierarchical Clustering 351 Conclusion 356 Part V. Conclusion 357 Chapter 19. Life Outside of Pandas 359 19.1 The (Scientific) Computing Stack 359 19.2 Performance 360 19.3 Dask 360 19.4 Siuba 360 19.5 Ibis 361 19.6 Polars 361 19.7 PyJanitor 361 19.8 Pandera 361 19.9 Machine Learning 361 19.10 Publishing 362 19.11 Dashboards 362 Conclusion 362 Chapter 20. It's Dangerous To Go Alone! 363 20.1 Local Meetups 363 20.2 Conferences 363 20.3 The Carpentries 364 20.4 Podcasts 364 20.5 Other Resources 365 Conclusion 365 Appendices 367 A. Concept Maps 369 B. Installation and Setup 373 C. Command Line 377 D. Project Templates 379 E. Using Python 381 F. Working Directories 383 G. Environments 385 H. Install Packages 389 I. Importing Libraries 391 J. Code Style 393 K. Containers: Lists, Tuples, and Dictionaries 395 L. Slice Values 399 M. Loops 401 N. Comprehensions 403 O. Functions 405 P. Ranges and Generators 409 Q. Multiple Assignment 413 R. Numpy ndarray 415 S. Classes 417 T. SettingWithCopyWarning 419 U. Method Chaining 423 V. Timing Code 427 W. String Formatting 429 X. Conditionals (if-elif-else) 433 Y. New York ACS Logistic Regression Example 435 Z. Replicating Results in R 443 Index 451

    15 in stock

    £34.19

  • Demand Forecasting Best Practices

    Manning Publications Demand Forecasting Best Practices

    1 in stock

    Book Synopsis: Master the demand forecasting skills you need to decide what resources to acquire, products to produce, and where and how to distribute them. For demand planners, S&OP managers, supply chain leaders, and data scientists. Demand Forecasting Best Practices is a unique step-by-step guide, demonstrating forecasting tools, metrics, and models alongside stakeholder management techniques that work in a live business environment.

    You will learn how to: lead a demand planning team to improve forecasting quality while reducing workload; properly define the objectives, granularity, and horizon of your demand planning process; use smart, value-weighted KPIs to track accuracy and bias; spot areas of your process where there is room for improvement; help planners and stakeholders (sales, marketing, finance) add value to your process; identify what kind of data you should be collecting, and how; and utilise different types of statistical and machine learning models.

    Follow author Nicolas Vandeput's original five-step framework for demand planning excellence and learn how to tailor it to your own company's needs. You will learn how to optimise demand planning for a more effective supply chain and will soon be delivering accurate predictions that drive major business value.

    About the technology: Demand forecasting is vital for the success of any product supply chain. It allows companies to make better decisions about what resources to acquire, what products to produce, and where and how to distribute them. As an effective demand forecaster, you can help your organisation avoid overproduction, reduce waste, and optimise inventory levels for a real competitive advantage.

    1 in stock

    £27.89

  • Making Sense of Data I

    John Wiley & Sons Inc Making Sense of Data I

    15 in stock

    Book Synopsis: Praise for the First Edition: "...a well-written book on data analysis and data mining that provides an excellent foundation." (CHOICE) This is a must-read book for learning practical statistics and data analysis.

    Table of Contents: PREFACE ix 1 INTRODUCTION 1 1.1 Overview 1 1.2 Sources of Data 2 1.3 Process for Making Sense of Data 3 1.4 Overview of Book 13 1.5 Summary 16 Further Reading 16 2 DESCRIBING DATA 17 2.1 Overview 17 2.2 Observations and Variables 18 2.3 Types of Variables 20 2.4 Central Tendency 22 2.5 Distribution of the Data 24 2.6 Confidence Intervals 36 2.7 Hypothesis Tests 40 Exercises 42 Further Reading 45 3 PREPARING DATA TABLES 47 3.1 Overview 47 3.2 Cleaning the Data 48 3.3 Removing Observations and Variables 49 3.4 Generating Consistent Scales Across Variables 49 3.5 New Frequency Distribution 51 3.6 Converting Text to Numbers 52 3.7 Converting Continuous Data to Categories 53 3.8 Combining Variables 54 3.9 Generating Groups 54 3.10 Preparing Unstructured Data 55 Exercises 57 Further Reading 57 4 UNDERSTANDING RELATIONSHIPS 59 4.1 Overview 59 4.2 Visualizing Relationships Between Variables 60 4.3 Calculating Metrics About Relationships 69 Exercises 81 Further Reading 82 5 IDENTIFYING AND UNDERSTANDING GROUPS 83 5.1 Overview 83 5.2 Clustering 88 5.3 Association Rules 111 5.4 Learning Decision Trees from Data 122 Exercises 137 Further Reading 140 6 BUILDING MODELS FROM DATA 141 6.1 Overview 141 6.2 Linear Regression 149 6.3 Logistic Regression 161 6.4 k-Nearest Neighbors 167 6.5 Classification and Regression Trees 172 6.6 Other Approaches 178 Exercises 179 Further Reading 182 APPENDIX A ANSWERS TO EXERCISES 185 APPENDIX B HANDS-ON TUTORIALS 191 B.1 Tutorial Overview 191 B.2 Access and Installation 191 B.3 Software Overview 192 B.4 Reading in Data 193 B.5 Preparation Tools 195 B.6 Tables and Graph Tools 199 B.7 Statistics Tools 202 B.8 Grouping Tools 204 B.9 Models Tools 207 B.10 Apply Model 211 B.11 Exercises 211 BIBLIOGRAPHY 227 INDEX 231

    15 in stock

    £59.36

  • Snowflake: The Definitive Guide

    O'Reilly Media Snowflake: The Definitive Guide

    5 in stock

    Book Synopsis: Snowflake's ability to eliminate data silos and run workloads from a single platform creates opportunities to democratize data analytics, allowing users within an organization to make data-driven decisions. This clear, comprehensive guide will show you how to build integrated data applications and develop new revenue streams based on data.

    5 in stock

    £47.99

  • Data Visualization

    Princeton University Press Data Visualization

    15 in stock

    Trade Review: "[Healy’s] prose is engaging and chatty, and the style of instruction is unpretentious and practical . . . This single volume represents an excellent entry point for those wishing to upskill their abilities in data visualization." (Paul Cuffe, IEEE Transactions) "Undoubtedly, this book is an excellent introduction to an essential tool for anyone who needs to collect and present data." (Conservation Biology)

    15 in stock

    £35.70

  • Leadership Strategies in the Age of Big Data

    Taylor & Francis Inc Leadership Strategies in the Age of Big Data

    1 in stock

    Book Synopsis: Harnessing the power of technology is one of the key measures of effective leadership. Leadership Strategies in the Age of Big Data, Algorithms, and Analytics will help leaders think and act like strategists to maintain a leading-edge competitive advantage. Written by a leading expert in the field, this book provides new insights on how to successfully transition companies by aligning an organization's culture to accept the benefits of digital technology. The author emphasizes the importance of creating a team spirit with employees to embrace the digital age and develop strategic business plans that pinpoint new markets for growth, strengthen customer relationships, and develop competitive strategies. Understanding how to deal with inconsistencies when facts generated by data analytics disagree with your own experience, intuition, and knowledge of the competitive situation is key to successful leadership.

    Table of Contents: Chapter 1. Developing Effective Leadership: The human interface with big data, algorithms, and analytics. Chapter 2. Initiate speed of implementation to maintain a digital advantage. Chapter 3. Apply analytics to concentrate at the decisive point for maximum impact. Chapter 4. Activate maneuver and indirect approach to create surprise. Chapter 5. Employ big data to determine the culminating point of a competitive campaign. Chapter 6. Use data to determine how long to maintain offensive action. Chapter 7. Align big data with the corporate culture. Chapter 8. Decide on a bold approach or cautious restraint based on data analytics. Chapter 9. Utilize big data, algorithms, and analytics to maximize use of competitor intelligence. Chapter 10. Choose offensive and defensive strategies by understanding the human interaction. Chapter 11. Factor-in friction and luck that make analytics a gamble. Chapter 12. Use data to neutralize the competitor’s effectiveness. Appendix. Strategic Business Plan outline.

    1 in stock

    £47.49

  • Artificial Intelligence Basics

    APress Artificial Intelligence Basics

    1 in stock

    Book Synopsis: Artificial intelligence touches nearly every part of your day. While you may initially assume that technologies such as smart speakers and digital assistants are the extent of it, AI has in fact rapidly become a general-purpose technology, reverberating across industries including transportation, healthcare, financial services, and many more. In our modern era, an understanding of AI and its possibilities for your organization is essential for growth and success. Artificial Intelligence Basics has arrived to equip you with a fundamental, timely grasp of AI and its impact. Author Tom Taulli provides an engaging, non-technical introduction to important concepts such as machine learning, deep learning, natural language processing (NLP), robotics, and more. In addition to guiding you through real-world case studies and practical implementation steps, Taulli uses his expertise to expand on the bigger questions that surround AI. These include societal trends, ethics, and

    1 in stock

    £35.99

  • Everybody Lies

    HarperCollins Publishers Inc Everybody Lies

    10 in stock


    10 in stock

    £23.24

  • Targeted

    HarperCollins Publishers Inc Targeted

    10 in stock


    10 in stock

    £23.19

  • Targeted  La Dictadura de Los Datos Spanish

    HarperCollins Publishers Inc Targeted La Dictadura de Los Datos Spanish

    10 in stock

    Book Synopsis: The gripping story of Cambridge Analytica and Big Data. Is our democracy really safe after Trump's victory? La dictadura de los datos reveals how our data has been used and warns us how it could be used again. They know what you buy. Brittany Kaiser, a novice political consultant specializing in human rights and international relations, believed that the data collected and analyzed by smartphones and social networks was in good hands until she met Alexander Nix, the charismatic leader of a new political communications company called Cambridge Analytica. What began as just a job soon became an infamous operation aimed at helping to elect Trump or interfering in the referendum that led to Brexit.

    10 in stock

    £15.29

  • Handbook of Statistical Analysis and Data Mining

    Elsevier Science Publishing Co Inc Handbook of Statistical Analysis and Data Mining

    15 in stock

    Trade Review: "Data mining practitioners, here is your bible, the complete "driver's manual" for data mining. From starting the engine to handling the curves, this book covers the gamut of data mining techniques - including predictive analytics and text mining - illustrating how to achieve maximal value across business, scientific, engineering, and medical applications. What are the best practices through each phase of a data mining project? How can you avoid the most treacherous pitfalls? The answers are in here. "Going beyond its responsibility as a reference book, the heavily-updated second edition also provides all-new, detailed tutorials with step-by-step instructions to drive established data mining software tools across real world applications. This way, newcomers start their engines immediately and experience hands-on success. "What's more, this edition drills down on hot topics across seven new chapters, including deep learning and how to avert "b---s---" results. If you want to roll-up your sleeves and execute on predictive analytics, this is your definite, go-to resource. To put it lightly, if this book isn't on your shelf, you're not a data miner." --Eric Siegel, Ph.D., founder of Predictive Analytics World and author of "Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die" "Great introduction to the real-world process of data mining. The overviews, practical advice, tutorials, and extra CD material make this book an invaluable resource for both new and experienced data miners." --Karl Rexer, PhD (President and Founder of Rexer Analytics, Boston, Massachusetts)

    Table of Contents: Part 1: History Of Phases Of Data Analysis, Basic Theory, And The Data Mining Process 1. The Background for Data Mining Practice 2. Theoretical Considerations for Data Mining 3. The Data Mining and Predictive Analytic Process 4. Data Understanding and Preparation 5. Feature Selection 6. 
Accessory Tools for Doing Data Mining Part 2: The Algorithms And Methods In Data Mining And Predictive Analytics And Some Domain Areas 7. Basic Algorithms for Data Mining: A Brief Overview 8. Advanced Algorithms for Data Mining 9. Classification 10. Numerical Prediction 11. Model Evaluation and Enhancement 12. Predictive Analytics for Population Health and Care 13. Big Data in Education: New Efficiencies for Recruitment, Learning, and Retention of Students and Donors 14. Customer Response Modeling 15. Fraud Detection Part 3: Tutorials And Case Studies Tutorial A Example of Data Mining Recipes Using Windows 10 and Statistica 13 Tutorial B Using the Statistica Data Mining Workspace Method for Analysis of Hurricane Data (Hurrdata.sta) Tutorial C Case Study—Using SPSS Modeler and STATISTICA to Predict Student Success at High-Stakes Nursing Examinations (NCLEX) Tutorial D Constructing a Histogram in KNIME Using MidWest Company Personality Data Tutorial E Feature Selection in KNIME Tutorial F Medical/Business Tutorial Tutorial G A KNIME Exercise, Using Alzheimer’s Training Data of Tutorial F Tutorial H Data Prep 1-1: Merging Data Sources Tutorial I Data Prep 1–2: Data Description Tutorial J Data Prep 2-1: Data Cleaning and Recoding Tutorial K Data Prep 2-2: Dummy Coding Category Variables Tutorial L Data Prep 2-3: Outlier Handling Tutorial M Data Prep 3-1: Filling Missing Values With Constants Tutorial N Data Prep 3-2: Filling Missing Values With Formulas Tutorial O Data Prep 3-3: Filling Missing Values With a Model Tutorial P City of Chicago Crime Map: A Case Study Predicting Certain Kinds of Crime Using Statistica Data Miner and Text Miner Tutorial Q Using Customer Churn Data to Develop and Select a Best Predictive Model for Client Defection Using STATISTICA Data Miner 13 64-bit for Windows 10 Tutorial R Example With C&RT to Predict and Display Possible Structural Relationships Tutorial S Clinical Psychology: Making Decisions About Best Therapy for a Client Part 4: 
Model Ensembles, Model Complexity; Using the Right Model for the Right Use, Significance, Ethics, and the Future, and Advanced Processes 16. The Apparent Paradox of Complexity in Ensemble Modeling 17. The "Right Model" for the "Right Purpose": When Less Is Good Enough 18. A Data Preparation Cookbook 19. Deep Learning 20. Significance versus Luck in the Age of Mining: The Issues of P-Value "Significance" and "Ways to Test Significance of Our Predictive Analytic Models" 21. Ethics and Data Analytics 22. IBM Watson

    15 in stock

    £75.04

  • Data Mining: Concepts and Techniques

    Elsevier Science & Technology Data Mining: Concepts and Techniques

    15 in stock

    Table of Contents: 1. Introduction 2. Data, measurements, and data processing 3. Data warehousing and online analytical processing 4. Pattern mining: basic concepts and methods 5. Pattern mining: advanced methods 6. Classification: basic concepts and methods 7. Classification: advanced methods 8. Cluster analysis: basic concepts and methods 9. Cluster analysis: advanced methods 10. Deep learning 11. Outlier Detection 12. Data mining trends and research frontiers Appendix: Mathematical background

    15 in stock

    £62.06

  • Analyzing Social Media Networks with NodeXL

    Elsevier Science & Technology Analyzing Social Media Networks with NodeXL

    1 in stock

    Table of Contents: Part I. Getting Started with Analyzing Social Media Networks 1. Introduction to Social Media and Social Networks 2. Social Media: New Technologies of Collaboration 3. Social Network Analysis: Measuring, Mapping, and Modeling Collections of Connections Part II. NodeXL Tutorial: Learning by Doing 4. Installation, Orientation, and Layout 5. Labeling and Visual Attributes 6. Calculating and Visualizing Network Metrics 7. Grouping and Filtering 8. Semantic Networks Part III. Social Media Network Analysis Case Studies 9. Email: The Lifeblood of Modern Communication 10. Thread Networks: Mapping Message Boards and Email Lists 11. Twitter: Information Flows, Influencers, and Organic Communities 12. Facebook: Public Pages and Inter-Organizational Networks 13. YouTube: Exploring Video Networks 14. Wiki Networks: Connections of Culture and Collaboration

    1 in stock

    £35.06

  • Getting Started with Data Science

    Pearson Education Getting Started with Data Science

    1 in stock

    Book Synopsis: Murtaza Haider, Ph.D., is an Associate Professor at the Ted Rogers School of Management, Ryerson University, and the Director of a consulting firm, Regionomics Inc. He is also a visiting research fellow at the Munk School of Global Affairs at the University of Toronto (2014-15). In addition, he is a senior research affiliate with the Canadian Network for Research on Terrorism, Security, and Society, and an adjunct professor of engineering at McGill University. Haider specializes in applying analytics and statistical methods to find solutions for socioeconomic challenges. His research interests include analytics; data science; housing market dynamics; infrastructure, transportation, and urban planning; and human development in North America and South Asia. He is an avid blogger/data journalist and writes weekly for the Dawn newspaper and occasionally for the Huffington Post. Haider holds a Master's in transport engineering and planning and a Ph.D. in

    Table of Contents: Chapter 1 The Bazaar of Storytellers Chapter 2 Data in the 24/7 Connected World Chapter 3 The Deliverable Chapter 4 Serving Tables Chapter 5 Graphic Details Chapter 6 Hypothetically Speaking Chapter 7 Why Tall Parents Don’t Have Even Taller Children Chapter 8 To Be or Not to Be Chapter 9 Categorically Speaking About Categorical Data Chapter 10 Spatial Data Analytics Chapter 11 Doing Serious Time with Time Series Chapter 12 Data Mining for Gold Index

    1 in stock

    £23.99

  • Data Visualization

    Cengage Learning, Inc Data Visualization

    1 in stock

    Book Synopsis: DATA VISUALIZATION: Exploring and Explaining with Data is designed to introduce best practices in data visualization to undergraduate and graduate students. This is one of the first books on data visualization designed for college courses. The book contains material on effective design, choice of chart type, effective use of color, how to explore data visually, and how to explain concepts and results visually in a compelling way with data. The book explains both the "why" of data visualization and the "how." That is, the book provides lucid explanations of the guiding principles of data visualization through the use of interesting examples.

    Table of Contents: 1. Introduction. 2. Selecting a Chart Type. 3. Data Visualization and Design. 4. Purposeful Use of Color. 5. Visualizing Variability. 6. Exploring Data Visually. 7. Explaining Visually to Influence with Data. 8. Data Dashboards. 9. Telling the Truth with Data Visualization.

    1 in stock

    £58.89

  • Machine Learning in Cyber Trust

    Springer-Verlag New York Inc. Machine Learning in Cyber Trust

    1 in stock

    Trade Review: From the reviews: "This is a useful book on machine learning for cyber security applications. It will be helpful to researchers and graduate students who are looking for an introduction to a specific topic in the field. All of the topics covered are well researched. The book consists of 12 chapters, grouped into four parts." (Imad H. Elhajj, ACM Computing Reviews, October, 2009)

    Table of Contents: Cyber System.- Cyber-Physical Systems: A New Frontier.- Security.- Misleading Learners: Co-opting Your Spam Filter.- Survey of Machine Learning Methods for Database Security.- Identifying Threats Using Graph-based Anomaly Detection.- On the Performance of Online Learning Methods for Detecting Malicious Executables.- Efficient Mining and Detection of Sequential Intrusion Patterns for Network Intrusion Detection Systems.- A Non-Intrusive Approach to Enhance Legacy Embedded Control Systems with Cyber Protection Features.- Image Encryption and Chaotic Cellular Neural Network.- Privacy.- From Data Privacy to Location Privacy.- Privacy Preserving Nearest Neighbor Search.- Reliability.- High-Confidence Compositional Reliability Assessment of SOA-Based Systems Using Machine Learning Techniques.- Model, Properties, and Applications of Context-Aware Web Services.

    1 in stock

    £125.99

  • Unsupervised Learning

    John Wiley & Sons Inc Unsupervised Learning

    15 in stock

    Book Synopsis: A new approach to unsupervised learning. Evolving technologies have brought about an explosion of information in recent years, but the question of how such information might be effectively harvested, archived, and analyzed remains a monumental challenge, for the processing of such information is often fraught with the need for conceptual interpretation: a relatively simple task for humans, yet an arduous one for computers. Inspired by the relative success of existing popular research on self-organizing neural networks for data clustering and feature extraction, Unsupervised Learning: A Dynamic Approach presents information within the family of generative, self-organizing maps, such as the self-organizing tree map (SOTM) and the more advanced self-organizing hierarchical variance map (SOHVM). It covers a series of pertinent, real-world applications with regard to the processing of multimedia data, from its role in generic image processing techniques, such as th

    Table of Contents: Acknowledgments xi 1 Introduction 1 1.1 Part I: The Self-Organizing Method 1 1.2 Part II: Dynamic Self-Organization for Image Filtering and Multimedia Retrieval 2 1.3 Part III: Dynamic Self-Organization for Image Segmentation and Visualization 5 1.4 Future Directions 7 2 Unsupervised Learning 9 2.1 Introduction 9 2.2 Unsupervised Clustering 9 2.3 Distance Metrics for Unsupervised Clustering 11 2.4 Unsupervised Learning Approaches 13 2.4.1 Partitioning and Cluster Membership 13 2.4.2 Iterative Mean-Squared Error Approaches 15 2.4.3 Mixture Decomposition Approaches 17 2.4.4 Agglomerative Hierarchical Approaches 18 2.4.5 Graph-Theoretic Approaches 20 2.4.6 Evolutionary Approaches 20 2.4.7 Neural Network Approaches 21 2.5 Assessing Cluster Quality and Validity 21 2.5.1 Cost Function–Based Cluster Validity Indices 22 2.5.2 Density-Based Cluster Validity Indices 23 2.5.3 Geometric-Based Cluster Validity Indices 24 3 Self-Organization 27 3.1 Introduction 27 3.2 Principles of 
Self-Organization 27 3.2.1 Synaptic Self-Amplification and Competition 27 3.2.2 Cooperation 28 3.2.3 Knowledge Through Redundancy 29 3.3 Fundamental Architectures 29 3.3.1 Adaptive Resonance Theory 29 3.3.2 Self-Organizing Map 37 3.4 Other Fixed Architectures for Self-Organization 43 3.4.1 Neural Gas 44 3.4.2 Hierarchical Feature Map 45 3.5 Emerging Architectures for Self-Organization 46 3.5.1 Dynamic Hierarchical Architectures 47 3.5.2 Nonstationary Architectures 48 3.5.3 Hybrid Architectures 50 3.6 Conclusion 50 4 Self-Organizing Tree Map 53 4.1 Introduction 53 4.2 Architecture 54 4.3 Competitive Learning 55 4.4 Algorithm 57 4.5 Evolution 61 4.5.1 Dynamic Topology 61 4.5.2 Classification Capability 64 4.6 Practical Considerations, Extensions, and Refinements 68 4.6.1 The Hierarchical Control Function 68 4.6.2 Learning, Timing, and Convergence 71 4.6.3 Feature Normalization 73 4.6.4 Stop Criteria 73 4.7 Conclusions 74 5 Self-Organization in Impulse Noise Removal 75 5.1 Introduction 75 5.2 Review of Traditional Median-Type Filters 76 5.3 The Noise-Exclusive Adaptive Filtering 82 5.3.1 Feature Selection and Impulse Detection 82 5.3.2 Noise Removal Filters 84 5.4 Experimental Results 86 5.5 Detection-Guided Restoration and Real-Time Processing 99 5.5.1 Introduction 99 5.5.2 Iterative Filtering 101 5.5.3 Recursive Filtering 104 5.5.4 Real-Time Processing of Impulse Corrupted TV Pictures 105 5.5.5 Analysis of the Processing Time 109 5.6 Conclusions 115 6 Self-Organization in Image Retrieval 119 6.1 Retrieval of Visual Information 120 6.2 Visual Feature Descriptor 122 6.2.1 Color Histogram and Color Moment Descriptors 122 6.2.2 Wavelet Moment and Gabor Texture Descriptors 123 6.2.3 Fourier and Moment-based Shape Descriptors 125 6.2.4 Feature Normalization and Selection 127 6.3 User-Assisted Retrieval 130 6.3.1 Radial Basis Function Method 132 6.4 Self-Organization for Pseudo Relevance Feedback 136 6.5 Directed Self-Organization 140 6.5.1 Algorithm 142 6.6 Optimizing 
Self-Organization for Retrieval 146 6.6.1 Genetic Principles 147 6.6.2 System Architecture 149 6.6.3 Genetic Algorithm for Feature Weight Detection 150 6.7 Retrieval Performance 153 6.7.1 Directed Self-Organization 153 6.7.2 Genetic Algorithm Weight Detection 155 6.8 Summary 157 7 The Self-Organizing Hierarchical Variance Map 159 7.1 An Intuitive Basis 160 7.2 Model Formulation and Breakdown 162 7.2.1 Topology Extraction via Competitive Hebbian Learning 163 7.2.2 Local Variance via Hebbian Maximal Eigenfilters 165 7.2.3 Global and Local Variance Interplay for Map Growth and Termination 170 7.3 Algorithm 173 7.3.1 Initialization, Continuation, and Presentation 173 7.3.2 Updating Network Parameters 175 7.3.3 Vigilance Evaluation and Map Growth 175 7.3.4 Topology Adaptation 176 7.3.5 Node Adaptation 177 7.3.6 Optional Tuning Stage 177 7.4 Simulations and Evaluation 177 7.4.1 Observations of Evolution and Partitioning 178 7.4.2 Visual Comparisons with Popular Mean-Squared Error Architectures 181 7.4.3 Visual Comparison Against Growing Neural Gas 183 7.4.4 Comparing Hierarchical with Tree-Based Methods 183 7.5 Tests on Self-Determination and the Optional Tuning Stage 187 7.6 Cluster Validity Analysis on Synthetic and UCI Data 187 7.6.1 Performance vs. 
Popular Clustering Methods 190 7.6.2 IRIS Dataset 192 7.6.3 WINE Dataset 195 7.7 Summary 195 8 Microbiological Image Analysis Using Self-Organization 197 8.1 Image Analysis in the Biosciences 197 8.1.1 Segmentation: The Common Denominator 198 8.1.2 Semi-supervised versus Unsupervised Analysis 199 8.1.3 Confocal Microscopy and Its Modalities 200 8.2 Image Analysis Tasks Considered 202 8.2.1 Visualising Chromosomes During Mitosis 202 8.2.2 Segmenting Heterogeneous Biofilms 204 8.3 Microbiological Image Segmentation 205 8.3.1 Effects of Feature Space Definition 207 8.3.2 Fixed Weighting of Feature Space 209 8.3.3 Dynamic Feature Fusion During Learning 213 8.4 Image Segmentation Using Hierarchical Self-Organization 215 8.4.1 Gray-Level Segmentation of Chromosomes 215 8.4.2 Automated Multilevel Thresholding of Biofilm 220 8.4.3 Multidimensional Feature Segmentation 221 8.5 Harvesting Topologies to Facilitate Visualization 226 8.5.1 Topology Aware Opacity and Gray-Level Assignment 227 8.5.2 Visualization of Chromosomes During Mitosis 228 8.6 Summary 233 9 Closing Remarks and Future Directions 237 9.1 Summary of Main Findings 237 9.1.1 Dynamic Self-Organization: Effective Models for Efficient Feature Space Parsing 237 9.1.2 Improved Stability, Integrity, and Efficiency 238 9.1.3 Adaptive Topologies Promote Consistency and Uncover Relationships 239 9.1.4 Online Selection of Class Number 239 9.1.5 Topologies Represent a Useful Backbone for Visualization or Analysis 240 9.2 Future Directions 240 9.2.1 Dynamic Navigation for Information Repositories 241 9.2.2 Interactive Knowledge-Assisted Visualization 243 9.2.3 Temporal Data Analysis Using Trajectories 245 Appendix A 249 A.1 Global and Local Consistency Error 249 References 251 Index 269

    £100.76

  • Making Sense of Data III

    John Wiley & Sons Inc Making Sense of Data III

    15 in stock

    Book Synopsis: As the third in the series, this book focuses on a style of data analysis that makes graphics central to exploration. Making Sense of Data III explains how to implement decision support systems and provides an interactive approach to data analysis that allows users to see, manipulate, explore, mine data, and share results with colleagues.

    Trade Review: “It is an essential book for understanding the principal role that graphics play in data visualization.” (Zentralblatt MATH, 1 April 2015)

    Table of Contents: Preface. 1. Introduction. 1.1 Overview. 1.2 Visual Perception. 1.3 Visualization. 1.4 Designing for High-throughput Data Exploration. 1.5 Summary. 1.6 Further reading. 2. The Cognitive and Visual Systems. 2.1 External Representation. 2.2 The Cognitive System. 2.3 Visual Perception. 2.4 Influencing Visual Perception. 2.5 Summary. 2.6 Further reading. 3. Graphic Representations. 3.1 Jacques Bertin: Semiology of Graphics. 3.2 Wilkinson: Grammar of Graphics. 3.3 Wickham: ggplot2. 3.4 Bostock and Heer: Protovis. 3.5 Summary. 3.6 Further reading. 4. Designing Visual Interactions. 4.1 Designing for Complexity. 4.2 The Process of Design. 4.3 Visual Interaction Design. 5. Hands-on: Creating Interactive Visualizations with Protovis. 5.1 Using Protovis. 5.2 Creating Code using the Protovis Graphical Framework. 5.3 Basic Protovis Marks. 5.4 Creating Customized Plots. 5.5 Creating Basic Plots. 5.6 Data Analysis Graphs. 5.7 Composite Plots. 5.8 Interactive Plots. 5.9 Protovis Summary. 5.10 Further Reading. Appendix. A Exercise Code Examples. Bibliography. Index.

    £81.86

  • Data Mining Techniques

    John Wiley & Sons Inc Data Mining Techniques

    15 in stock

    Book Synopsis: The leading introductory book on data mining, fully updated and revised! When Berry and Linoff wrote the first edition of Data Mining Techniques in the late 1990s, data mining was just starting to move out of the lab and into the office. It has since grown to become an indispensable tool of modern business.

    Table of Contents: Introduction
    Chapter 1 What Is Data Mining and Why Do It?; What Is Data Mining?; Data Mining Is a Business Process; Large Amounts of Data; Meaningful Patterns and Rules; Data Mining and Customer Relationship Management; Why Now?; Data Is Being Produced; Data Is Being Warehoused; Computing Power Is Affordable; Interest in Customer Relationship Management Is Strong; Commercial Data Mining Software Products Have Become Available; Skills for the Data Miner; The Virtuous Cycle of Data Mining; A Case Study in Business Data Mining; Identifying BofA’s Business Challenge; Applying Data Mining; Acting on the Results; Measuring the Effects of Data Mining; Steps of the Virtuous Cycle; Identify Business Opportunities; Transform Data into Information; Act on the Information; Measure the Results; Data Mining in the Context of the Virtuous Cycle; Lessons Learned
    Chapter 2 Data Mining Applications in Marketing and Customer Relationship Management; Two Customer Lifecycles; The Customer’s Lifecycle; The Customer Lifecycle; Subscription Relationships versus Event-Based Relationships; Organize Business Processes Around the Customer Lifecycle; Customer Acquisition; Customer Activation; Customer Relationship Management; Winback; Data Mining Applications for Customer Acquisition; Identifying Good Prospects; Choosing a Communication Channel; Picking Appropriate Messages; A Data Mining Example: Choosing the Right Place to Advertise; Who Fits the Profile?; Measuring Fitness for Groups of Readers; Data Mining to Improve Direct Marketing Campaigns; Response Modeling; Optimizing Response for a Fixed Budget; Optimizing Campaign Profitability; Reaching the People Most Influenced by the Message; Using Current Customers to Learn About Prospects; Start Tracking Customers Before They Become “Customers”; Gather Information from New Customers; Acquisition-Time Variables Can Predict Future Outcomes; Data Mining Applications for Customer Relationship Management; Matching Campaigns to Customers; Reducing Exposure to Credit Risk; Determining Customer Value; Cross-selling, Up-selling, and Making Recommendations; Retention; Recognizing Attrition; Why Attrition Matters; Different Kinds of Attrition; Different Kinds of Attrition Model; Beyond the Customer Lifecycle; Lessons Learned
    Chapter 3 The Data Mining Process; What Can Go Wrong?; Learning Things That Aren’t True; Learning Things That Are True, but Not Useful; Data Mining Styles; Hypothesis Testing; Directed Data Mining; Undirected Data Mining; Goals, Tasks, and Techniques; Data Mining Business Goals; Data Mining Tasks; Data Mining Techniques; Formulating Data Mining Problems: From Goals to Tasks to Techniques; What Techniques for Which Tasks?; Is There a Target or Targets?; What Is the Target Data Like?; What Is the Input Data Like?; How Important Is Ease of Use?; How Important Is Model Explicability?; Lessons Learned
    Chapter 4 Statistics 101: What You Should Know About Data; Occam’s Razor; Skepticism and Simpson’s Paradox; The Null Hypothesis; P-Values; Looking At and Measuring Data; Categorical Values; Numeric Variables; A Couple More Statistical Ideas; Measuring Response; Standard Error of a Proportion; Comparing Results Using Confidence Bounds; Comparing Results Using Difference of Proportions; Size of Sample; What the Confidence Interval Really Means; Size of Test and Control for an Experiment; Multiple Comparisons; The Confidence Level with Multiple Comparisons; Bonferroni’s Correction; Chi-Square Test; Expected Values; Chi-Square Value; Comparison of Chi-Square to Difference of Proportions; An Example: Chi-Square for Regions and Starts; Case Study: Comparing Two Recommendation Systems with an A/B Test; First Metric: Participating Sessions; Data Mining and Statistics; Lessons Learned
    Chapter 5 Descriptions and Prediction: Profiling and Predictive Modeling; Directed Data Mining Models; Defining the Model Structure and Target; Incremental Response Modeling; Model Stability; Time-Frames in the Model Set; Directed Data Mining Methodology; Step 1: Translate the Business Problem into a Data Mining Problem; How Will Results Be Used?; How Will Results Be Delivered?; The Role of Domain Experts and Information Technology; Step 2: Select Appropriate Data; What Data Is Available?; How Much Data Is Enough?; How Much History Is Required?; How Many Variables?; What Must the Data Contain?; Step 3: Get to Know the Data; Examine Distributions; Compare Values with Descriptions; Validate Assumptions; Ask Lots of Questions; Step 4: Create a Model Set; Assembling Customer Signatures; Creating a Balanced Sample; Including Multiple Timeframes; Creating a Model Set for Prediction; Creating a Model Set for Profiling; Partitioning the Model Set; Step 5: Fix Problems with the Data; Categorical Variables with Too Many Values; Numeric Variables with Skewed Distributions and Outliers; Missing Values; Values with Meanings That Change over Time; Inconsistent Data Encoding; Step 6: Transform Data to Bring Information to the Surface; Step 7: Build Models; Step 8: Assess Models; Assessing Binary Response Models and Classifiers; Assessing Binary Response Models Using Lift; Assessing Binary Response Model Scores Using Lift Charts; Assessing Binary Response Model Scores Using Profitability Models; Assessing Binary Response Models Using ROC Charts; Assessing Estimators; Assessing Estimators Using Score Rankings; Step 9: Deploy Models; Practical Issues in Deploying Models; Optimizing Models for Deployment; Step 10: Assess Results; Step 11: Begin Again; Lessons Learned
    Chapter 6 Data Mining Using Classic Statistical Techniques; Similarity Models; Similarity and Distance; Example: A Similarity Model for Product Penetration; Table Lookup Models; Choosing Dimensions; Partitioning the Dimensions; From Training Data to Scores; Handling Sparse and Missing Data by Removing Dimensions; RFM: A Widely Used Lookup Model; RFM Cell Migration; RFM and the Test-and-Measure Methodology; RFM and Incremental Response Modeling; Naïve Bayesian Models; Some Ideas from Probability; The Naïve Bayesian Calculation; Comparison with Table Lookup Models; Linear Regression; The Best-fit Line; Goodness of Fit; Multiple Regression; The Equation; The Range of the Target Variable; Interpreting Coefficients of Linear Regression Equations; Capturing Local Effects with Linear Regression; Additional Considerations with Multiple Regression; Variable Selection for Multiple Regression; Logistic Regression; Modeling Binary Outcomes; The Logistic Function; Fixed Effects and Hierarchical Effects; Hierarchical Effects; Within and Between Effects; Fixed Effects; Lessons Learned
    Chapter 7 Decision Trees; What Is a Decision Tree and How Is It Used?; A Typical Decision Tree; Using the Tree to Learn About Churn; Using the Tree to Learn About Data and Select Variables; Using the Tree to Produce Rankings; Using the Tree to Estimate Class Probabilities; Using the Tree to Classify Records; Using the Tree to Estimate Numeric Values; Decision Trees Are Local Models; Growing Decision Trees; Finding the Initial Split; Growing the Full Tree; Finding the Best Split; Gini (Population Diversity) as a Splitting Criterion; Entropy Reduction or Information Gain as a Splitting Criterion; Information Gain Ratio; Chi-Square Test as a Splitting Criterion; Incremental Response as a Splitting Criterion; Reduction in Variance as a Splitting Criterion for Numeric Targets; F Test; Pruning; The CART Pruning Algorithm; Pessimistic Pruning: The C5.0 Pruning Algorithm; Stability-Based Pruning; Extracting Rules from Trees; Decision Tree Variations; Multiway Splits; Splitting on More Than One Field at a Time; Creating Nonrectangular Boxes; Assessing the Quality of a Decision Tree; When Are Decision Trees Appropriate?; Case Study: Process Control in a Coffee Roasting Plant; Goals for the Simulator; Building a Roaster Simulation; Evaluation of the Roaster Simulation; Lessons Learned
    Chapter 8 Artificial Neural Networks; A Bit of History; The Biological Model; The Biological Neuron; The Biological Input Layer; The Biological Output Layer; Neural Networks and Artificial Intelligence; Artificial Neural Networks; The Artificial Neuron; The Multi-Layer Perceptron; A Network Example; Network Topologies; A Sample Application: Real Estate Appraisal; Training Neural Networks; How Does a Neural Network Learn Using Back Propagation?; Pruning a Neural Network; Radial Basis Function Networks; Overview of RBF Networks; Choosing the Locations of the Radial Basis Functions; Universal Approximators; Neural Networks in Practice; Choosing the Training Set; Coverage of Values for All Features; Number of Features; Size of Training Set; Number and Range of Outputs; Rules of Thumb for Using MLPs; Preparing the Data; Interpreting the Output from a Neural Network; Neural Networks for Time Series; Time Series Modeling; A Neural Network Time Series Example; Can Neural Network Models Be Explained?; Sensitivity Analysis; Using Rules to Describe the Scores; Lessons Learned
    Chapter 9 Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering; Memory-Based Reasoning; Look-Alike Models; Example: Using MBR to Estimate Rents in Tuxedo, New York; Challenges of MBR; Choosing a Balanced Set of Historical Records; Representing the Training Data; Determining the Distance Function, Combination Function, and Number of Neighbors; Case Study: Using MBR for Classifying Anomalies in Mammograms; The Business Problem: Identifying Abnormal Mammograms; Applying MBR to the Problem; The Total Solution; Measuring Distance and Similarity; What Is a Distance Function?; Building a Distance Function One Field at a Time; Distance Functions for Other Data Types; When a Distance Metric Already Exists; The Combination Function: Asking the Neighbors for Advice; The Simplest Approach: One Neighbor; The Basic Approach for Categorical Targets: Democracy; Weighted Voting for Categorical Targets; Numeric Targets; Case Study: Shazam — Finding Nearest Neighbors for Audio Files; Why This Feat Is Challenging; The Audio Signature; Measuring Similarity; Collaborative Filtering: A Nearest-Neighbor Approach to Making Recommendations; Building Profiles; Comparing Profiles; Making Predictions; Lessons Learned
    Chapter 10 Knowing When to Worry: Using Survival Analysis to Understand Customers; Customer Survival; What Survival Curves Reveal; Finding the Average Tenure from a Survival Curve; Customer Retention Using Survival; Looking at Survival as Decay; Hazard Probabilities; The Basic Idea; Examples of Hazard Functions; Censoring; The Hazard Calculation; Other Types of Censoring; From Hazards to Survival; Retention; Survival; Comparison of Retention and Survival; Proportional Hazards; Examples of Proportional Hazards; Stratification: Measuring Initial Effects on Survival; Cox Proportional Hazards; Survival Analysis in Practice; Handling Different Types of Attrition; When Will a Customer Come Back?; Understanding Customer Value; Forecasting; Hazards Changing over Time; Lessons Learned
    Chapter 11 Genetic Algorithms and Swarm Intelligence; Optimization; What Is an Optimization Problem?; An Optimization Problem in Ant World; E Pluribus Unum; A Smarter Ant; Genetic Algorithms; A Bit of History; Genetics on Computers; Representing the Genome; Schemata: The Building Blocks of Genetic Algorithms; Beyond the Simple Algorithm; The Traveling Salesman Problem; Exhaustive Search; A Simple Greedy Algorithm; The Genetic Algorithms Approach; The Swarm Intelligence Approach; Case Study: Using Genetic Algorithms for Resource Optimization; Case Study: Evolving a Solution for Classifying Complaints; Business Context; Data; The Comment Signature; The Genomes; The Fitness Function; The Results; Lessons Learned
    Chapter 12 Tell Me Something New: Pattern Discovery and Data Mining; Undirected Techniques, Undirected Data Mining; Undirected versus Directed Techniques; Undirected versus Directed Data Mining; Case Study: Undirected Data Mining Using Directed Techniques; What Is Undirected Data Mining?; Data Exploration; Segmentation and Clustering; Target Variable Definition, When the Target Is Not Explicit; Simulation, Forecasting, and Agent-Based Modeling; Methodology for Undirected Data Mining; There Is No Methodology; Things to Keep in Mind; Lessons Learned
    Chapter 13 Finding Islands of Similarity: Automatic Cluster Detection; Searching for Islands of Simplicity; Customer Segmentation and Clustering; Similarity Clusters; Tracking Campaigns by Cluster-Based Segments; Clustering Reveals an Overlooked Market Segment; Fitting the Troops; The K-Means Clustering Algorithm; Two Steps of the K-Means Algorithm; Voronoi Diagrams and K-Means Clusters; Choosing the Cluster Seeds; Choosing K; Using K-Means to Detect Outliers; Semi-Directed Clustering; Interpreting Clusters; Characterizing Clusters by Their Centroids; Characterizing Clusters by What Differentiates Them; Using Decision Trees to Describe Clusters; Evaluating Clusters; Cluster Measurements and Terminology; Cluster Silhouettes; Limiting Cluster Diameter for Scoring; Case Study: Clustering Towns; Creating Town Signatures; Creating Clusters; Determining the Right Number of Clusters; Evaluating the Clusters; Using Demographic Clusters to Adjust Zone Boundaries; Business Success; Variations on K-Means; K-Medians, K-Medoids, and K-Modes; The Soft Side of K-Means; Data Preparation for Clustering; Scaling for Consistency; Use Weights to Encode Outside Information; Selecting Variables for Clustering; Lessons Learned
    Chapter 14 Alternative Approaches to Cluster Detection; Shortcomings of K-Means; Reasonableness; An Intuitive Example; Fixing the Problem by Changing the Scales; What This Means in Practice; Gaussian Mixture Models; Adding “Gaussians” to K-Means; Back to Gaussian Mixture Models; Scoring GMMs; Applying GMMs; Divisive Clustering; A Decision Tree–Like Method for Clustering; Scoring Divisive Clusters; Clusters and Trees; Agglomerative (Hierarchical) Clustering; Overview of Agglomerative Clustering Methods; Clustering People by Age: An Example of An Agglomerative Clustering Algorithm; Scoring Agglomerative Clusters; Limitations of Agglomerative Clustering; Agglomerative Clustering in Practice; Combining Agglomerative Clustering and K-Means; Self-Organizing Maps; What Is a Self-Organizing Map?; Training an SOM; Scoring an SOM; The Search Continues for Islands of Simplicity; Lessons Learned
    Chapter 15 Market Basket Analysis and Association Rules; Defining Market Basket Analysis; Four Levels of Market Basket Data; The Foundation of Market Basket Analysis: Basic Measures; Order Characteristics; Item (Product) Popularity; Tracking Marketing Interventions; Case Study: Spanish or English; The Business Problem; The Data; Defining “Hispanicity” Preference; The Solution; Association Analysis; Rules Are Not Always Useful; Item Sets to Association Rules; How Good Is an Association Rule?; Building Association Rules; Choosing the Right Set of Items; Anonymous Versus Identified; Generating Rules from All This Data; Overcoming Practical Limits; The Problem of Big Data; Extending the Ideas; Different Items on the Right- and Left-Hand Sides; Using Association Rules to Compare Stores; Association Rules and Cross-Selling; A Typical Cross-Sell Model; A More Confident Approach to Product Propensities; Results from Using Confidence; Sequential Pattern Analysis; Finding the Sequences; Sequential Association Rules; Sequential Analysis Using Other Data Mining Techniques; Lessons Learned
    Chapter 16 Link Analysis; Basic Graph Theory; What Is a Graph?; Directed Graphs; Weighted Graphs; Seven Bridges of Königsberg; Detecting Cycles in a Graph; The Traveling Salesman Problem Revisited; Social Network Analysis; Six Degrees of Separation; What Your Friends Say About You; Finding Childcare Benefits Fraud; Who Responds to Whom on Dating Sites; Social Marketing; Mining Call Graphs; Case Study: Tracking Down the Leader of the Pack; The Business Goal; The Data Processing Challenge; Finding Social Networks in Call Data; How the Results Are Used for Marketing; Estimating Customer Age; Case Study: Who Is Using Fax Machines from Home?; Why Finding Fax Machines Is Useful; How Do Fax Machines Behave?; A Graph Coloring Algorithm; “Coloring” the Graph to Identify Fax Machines; How Google Came to Rule the World; Hubs and Authorities; The Details; Hubs and Authorities in Practice; Lessons Learned
    Chapter 17 Data Warehousing, OLAP, Analytic Sandboxes, and Data Mining; The Architecture of Data; Transaction Data, the Base Level; Operational Summary Data; Decision-Support Summary Data; Database Schema/Data Models; Metadata; Business Rules; A General Architecture for Data Warehousing; Source Systems; Extraction, Transformation, and Load; Central Repository; Metadata Repository; Data Marts; Operational Feedback; Users and Desktop Tools; Analytic Sandboxes; Why Are Analytic Sandboxes Needed?; Technology to Support Analytic Sandboxes; Where Does OLAP Fit In?; What’s in a Cube?; Star Schema; OLAP and Data Mining; Where Data Mining Fits in with Data Warehousing; Lots of Data; Consistent, Clean Data; Hypothesis Testing and Measurement; Scalable Hardware and RDBMS Support; Lessons Learned
    Chapter 18 Building Customer Signatures; Finding Customers in Data; What Is a Customer?; Accounts? Customers? Households?; Anonymous Transactions; Transactions Linked to a Card; Transactions Linked to a Cookie; Transactions Linked to an Account; Transactions Linked to a Customer; Designing Signatures; Is a Customer Signature Necessary?; What Does a Row Represent?; Will the Signature Be Used for Predictive Modeling?; Has a Target Been Defined?; Are There Constraints Imposed by the Particular Data Mining Techniques to be Employed?; Which Customers Will Be Included?; What Might Be Interesting to Know About Customers?; What a Signature Looks Like; Process for Creating Signatures; Some Data Is Already at the Right Level of Granularity; Pivoting a Regular Time Series; Aggregating Time-Stamped Transactions; Dealing with Missing Values; Missing Values in Source Data; Unknown or Non-Existent?; What Not to Do; Things to Consider; Lessons Learned
    Chapter 19 Derived Variables: Making the Data Mean More; Handset Churn Rate as a Predictor of Churn; Single-Variable Transformations; Standardizing Numeric Variables; Turning Numeric Values into Percentiles; Turning Counts into Rates; Relative Measures; Replacing Categorical Variables with Numeric Ones; Combining Variables; Classic Combinations; Combining Highly Correlated Variables; Rent to Home Value; Extracting Features from Time Series; Trend; Seasonality; Extracting Features from Geography; Geocoding; Mapping; Using Geography to Create Relative Measures; Using Past Values of the Target Variable; Using Model Scores as Inputs; Handling Sparse Data; Account Set Patterns; Binning Sparse Values; Capturing Customer Behavior from Transactions; Widening Narrow Data; Sphere of Influence as a Predictor of Good Customers; An Example: Ratings to Rater Profile; Sample Fields from the Rater Signature; The Rating Signature and Derived Variables; Lessons Learned
    Chapter 20 Too Much of a Good Thing? Techniques for Reducing the Number of Variables; Problems with Too Many Variables; Risk of Correlation Among Input Variables; Risk of Overfitting; The Sparse Data Problem; Visualizing Sparseness; Independence; Exhaustive Feature Selection; Flavors of Variable Reduction Techniques; Using the Target; Original versus New Variables; Sequential Selection of Features; The Traditional Forward Selection Methodology; Forward Selection Using a Validation Set; Stepwise Selection; Forward Selection Using Non-Regression Techniques; Backward Selection; Undirected Forward Selection; Other Directed Variable Selection Methods; Using Decision Trees to Select Variables; Variable Reduction Using Neural Networks; Principal Components; What Are Principal Components?; Principal Components Example; Principal Component Analysis; Factor Analysis; Variable Clustering; Example of Variable Clusters; Using Variable Clusters; Hierarchical Variable Clustering; Divisive Variable Clustering; Lessons Learned
    Chapter 21 Listen Carefully to What Your Customers Say: Text Mining; What Is Text Mining?; Text Mining for Derived Columns; Beyond Derived Features; Text Analysis Applications; Working with Text Data; Sources of Text; Language Effects; Basic Approaches to Representing Documents; Representing Documents in Practice; Documents and the Corpus; Case Study: Ad Hoc Text Mining; The Boycott; Business as Usual; Combining Text Mining and Hypothesis Testing; The Results; Classifying News Stories Using MBR; What Are the Codes?; Applying MBR; The Results; From Text to Numbers; Starting with a “Bag of Words”; Term-Document Matrix; Corpus Effects; Singular Value Decomposition (SVD); Text Mining and Naïve Bayesian Models; Naïve Bayesian in the Text World; Identifying Spam Using Naïve Bayesian; Sentiment Analysis; DIRECTV: A Case Study in Customer Service; Background; Applying Text Mining; Taking the Technical Approach; Not an Iterative Process; Continuing to Benefit; Lessons Learned
    Index

    £37.05

  • Graphical Models

    John Wiley & Sons Inc Graphical Models

    1 in stock

    Book Synopsis: Graphical models are of increasing importance in applied statistics, and in particular in data mining. Providing a self-contained introduction to and overview of learning relational, probabilistic, and possibilistic networks from data, this second edition of Graphical Models is thoroughly updated to include the latest research in this burgeoning field, including a new chapter on visualization. The text provides graduate students and researchers with all the necessary background material, including modelling under uncertainty, decomposition of distributions, graphical representation of distributions, and applications relating to graphical models and problems for further research.

    Trade Review: “The text provides graduate students, and researchers with all the necessary background material, including modelling under uncertainty, decomposition of distributions, graphical representation of distributions, and applications relating to graphical models and problems for further research.” (Zentralblatt Math, 1 August 2013) "All of the necessary background is provided, with material on modeling under uncertainty and imprecision modeling, decomposition of distributions, graphical representation of distributions, applications relating to graphical models, and problems for further research." (Book News, December 2009)

    Table of Contents: Preface. 1 Introduction. 1.1 Data and Knowledge. 1.2 Knowledge Discovery and Data Mining. 1.3 Graphical Models. 1.4 Outline of this Book. 2 Imprecision and Uncertainty. 2.1 Modeling Inferences. 2.2 Imprecision and Relational Algebra. 2.3 Uncertainty and Probability Theory. 2.4 Possibility Theory and the Context Model. 3 Decomposition. 3.1 Decomposition and Reasoning. 3.2 Relational Decomposition. 3.3 Probabilistic Decomposition. 3.4 Possibilistic Decomposition. 3.5 Possibility versus Probability. 4 Graphical Representation. 4.1 Conditional Independence Graphs. 4.2 Evidence Propagation in Graphs. 5 Computing Projections. 5.1 Databases of Sample Cases. 5.2 Relational and Sum Projections. 5.3 Expectation Maximization. 5.4 Maximum Projections. 6 Naive Classifiers. 6.1 Naive Bayes Classifiers. 6.2 A Naive Possibilistic Classifier. 6.3 Classifier Simplification. 6.4 Experimental Evaluation. 7 Learning Global Structure. 7.1 Principles of Learning Global Structure. 7.2 Evaluation Measures. 7.3 Search Methods. 7.4 Experimental Evaluation. 8 Learning Local Structure. 8.1 Local Network Structure. 8.2 Learning Local Structure. 8.3 Experimental Evaluation. 9 Inductive Causation. 9.1 Correlation and Causation. 9.2 Causal and Probabilistic Structure. 9.3 Faithfulness and Latent Variables. 9.4 The Inductive Causation Algorithm. 9.5 Critique of the Underlying Assumptions. 9.6 Evaluation. 10 Visualization. 10.1 Potentials. 10.2 Association Rules. 11 Applications. 11.1 Diagnosis of Electrical Circuits. 11.2 Application in Telecommunications. 11.3 Application at Volkswagen. 11.4 Application at DaimlerChrysler. A Proofs of Theorems. A.1 Proof of Theorem 4.1.2. A.2 Proof of Theorem 4.1.18. A.3 Proof of Theorem 4.1.20. A.4 Proof of Theorem 4.1.26. A.5 Proof of Theorem 4.1.28. A.6 Proof of Theorem 4.1.30. A.7 Proof of Theorem 4.1.31. A.8 Proof of Theorem 5.4.8. A.9 Proof of Lemma .2.2. A.10 Proof of Lemma .2.4. A.11 Proof of Lemma .2.6. A.12 Proof of Theorem 7.3.1. A.13 Proof of Theorem 7.3.2. A.14 Proof of Theorem 7.3.3. A.15 Proof of Theorem 7.3.5. A.16 Proof of Theorem 7.3.7. B Software Tools. Bibliography. Index.

    £88.16

  • Data Mining Techniques in CRM Inside Customer

    John Wiley & Sons Inc Data Mining Techniques in CRM Inside Customer

    15 in stock

    Book Synopsis: This is an applied handbook for the application of data mining techniques in the CRM framework. It combines a technical and a business perspective to cover the needs of business users who are looking for a practical guide on data mining.

    Trade Review: "The book is written in a language that is easily accessible to business users who are not fluent in statistical methods and who have no prior exposure to the data mining or customer segmentation domain . . . This book is poised to become a standard reference, and I unconditionally recommend it to anyone working in this field." (Computing Reviews, 23 June 2011) "This is an excellent book for any data miner or anybody involved in CRM. The text is clear and pictures are well done and funny which is rare enough to be mentioned. From basic to advanced topics, the book is a very pleasant journey inside data mining with a clear focus on customer segmentation. Really advised if you're not a fan of formulas." (Data Mining Research, 18 March 2011)

    Table of Contents: Acknowledgements. 1. Data Mining in CRM. The CRM Strategy. What Can Data Mining Do? The Data Mining Methodology. Data Mining and Business Domain Expertise. Summary. 2. An Overview of Data Mining Techniques. Supervised Modeling. Unsupervised Modeling Techniques. Machine Learning/Artificial Intelligence vs. Statistical Techniques. Summary. 3. Data Mining Techniques for Segmentation. Segmenting Customers with Data Mining Techniques. Principal Components Analysis. Clustering Techniques. Examining and Evaluating the Cluster Solution. Understanding the Clusters through Profiling. Selecting the Optimal Cluster Solution. Cluster Profiling and Scoring with Supervised Models. An Introduction to Decision Tree Models. Summary. 4. The Mining Data Mart. Designing the Mining Data Mart. The Time Frame Covered by the Mining Data Mart. The Mining Data Mart for Retail Banking. The Mining Data Mart for Mobile Telephony Consumer (Residential) Customers. The Mining Data Mart for Retailers. Summary. 5. Customer Segmentation. An Introduction to Customer Segmentation. Segmentation Types in Consumer Markets. Segmentation in Business Markets. A Guide for Behavioral Segmentation. Segmentation Management Strategy. A Guide for Value-Based Segmentation. Designing Differentiated Strategies for the Value Segments. Summary. 6. Segmentation Applications in Banking. Segmentation for Credit Card Holders. Segmentation in Retail Banking. The Marketing Process. Segmentation in Retail Banking; A Summary. 7. Segmentation Applications in Telecommunications. Mobile Telephony. The Fixed Telephony Case. Summary. 8. Segmentation for Retailers. Segmentation in the Retail Industry. The RFM Analysis. Grouping Customers According to the Products They Buy. Summary. Further Reading. Index.

    £61.16

  • Data Mining for the Social Sciences

    University of California Press Data Mining for the Social Sciences

    7 in stock

    Book Synopsis: We live in a world of big data: the amount of information collected on human behavior each day is staggering, and exponentially greater than at any time in the past. Providing an introduction to data mining, the authors discuss how data mining substantially differs from conventional statistical modeling familiar to most social scientists.

    Table of Contents: PART 1. CONCEPTS 1. What Is Data Mining? 2. Contrasts with the Conventional Statistical Approach 3. Some General Strategies Used in Data Mining 4. Important Stages in a Data Mining Project PART 2. WORKED EXAMPLES 5. Preparing Training and Test Datasets 6. Variable Selection Tools 7. Creating New Variables Using Binning and Trees 8. Extracting Variables 9. Classifiers 10. Classification Trees 11. Neural Networks 12. Clustering 13. Latent Class Analysis and Mixture Models 14. Association Rules Conclusion Bibliography Notes Index

    £28.90

  • The Text Mining Handbook

    Cambridge University Press The Text Mining Handbook

    15 in stock

    Book SynopsisPresents a comprehensive discussion of the state-of-the-art in text mining and link detection. In addition to providing an in-depth examination of core text mining and link detection algorithms and operations, the book examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches, ending with real-world, mission-critical applications.Trade Review' … buy the book. This book is definitely worth having in your book shelf as a handy reference.' IAPR NewsletterTable of Contents1. Introduction to text mining; 2. Core text mining operations; 3. Text mining preprocessing techniques; 4. Categorization; 5. Clustering; 6. Information extraction; 7. Probabilistic models for Information extraction; 8. Preprocessing applications using probabilistic and hybrid approaches; 9. Presentation-layer considerations for browsing and query refinement; 10. Visualization approaches; 11. Link analysis; 12. Text mining applications; Appendix; Bibliography.

    £74.99

  • The Silicon Jungle

    Princeton University Press The Silicon Jungle

    1 in stock

    Book SynopsisWhat happens when a naive intern is granted unfettered access to people's most private thoughts and actions? Stephen Thorpe lands a coveted internship at Ubatoo, an Internet empire that provides its users with popular online services, from a search engine and e-mail, to social networking. When Stephen's boss asks him to work on a project with the ATrade ReviewCo-Winner of the 2012 Mary Shelley Award for Outstanding Fictional Work, Media Ecology Association "Baluja's clever, cynical debut explores the frightening possibilities of data mining... A nod to Upton Sinclair's muckraking The Jungle, which scared its readers into regulating the meat-packing industry, this lively if depressing novel suggests that computer snooping is too seductive to control, despite the consequences."--Publishers Weekly "[F]righteningly convincing... The read is quick, the questions will linger, and the ideas are so intriguing... Baluja simplifies the abstract world of tech-speak for the rest of us while aiming to do for the Internet what Upton Sinclair's The Jungle did for the meat industry: make readers reconsider its safety. For fans of intelligent thrillers."--Stephen Morrow, Library Journal "In the era of the ubiquitous web company, The Silicon Jungle provides ample food for thought."--Zena Iovino, New Scientist "[T]his cautionary tale is fascinating for its exploration of technology as a conduit for crime."--Michele Leber, Booklist Online "The book's central message is fascinating. A company like Google, Baluja points out, has far more information on U.S. citizens than does the FBI and far fewer restrictions on how to use it. 
It's a chilling message in a fun package."--Kathleen Offenholley, Mathematics TeacherTable of ContentsPreface; Endings; Anklets; Anthropologists in the Midst; Mollycoddle; Touchpoints; Checking In; Working 9 to 4; Predicting the Future and 38 Needles; Contact; Two Geeks in a Pod; An Understatement; Euphoria and Diet Pills; To Better Days; Marathon; The Life and Soul of an Intern; Candid Cameras; Episodes; Liberal Food and Even More Liberal Activism; Subjects; Newsworthy; Patience; Hypergrowth; Little Pink Houses; Truth, Lies, and Algorithms; Negotiations and Herding Cats; The JENNY Discovery; I Dream of JENNY; A Five-Step Program: Hallucinations and Archetypes; Over-Deliver; A Life Changed in Four Phone Calls; Giving Thanks; A Drive through the Country; Control; A Tale of Two Tenures; Prelude to Pie; The Yuri Effect; Apple Pie; Thoughts Like Butterflies; Core-Relations; Collide; Control, Revisited; Fables of the Deconstruction; Control, Foregone; Foundations; One Way; Sebastin's Friends; A Tinker by Any Other Name; When It Rains; I Am a Heartbeat; What I Did This Summer; A Permanent Position; For Adam; Faith; Counting by Two; Disconnect; Sahim; Epilogue: Beginnings; Acknowledgments; Know More; Privacy Policy of a Few Organizations; References

    £23.80

  • Data Visualization

    Princeton University Press Data Visualization

    1 in stock

    Book SynopsisTrade Review"[Healy’s] prose is engaging and chatty, and the style of instruction is unpretentious and practical . . . This single volume represents an excellent entry point for those wishing to upskill their abilities in data visualization."---Paul Cuffe, IEEE Transactions"Undoubtedly, this book is an excellent introduction to an essential tool for anyone who needs to collect and present data." * Conservation Biology *

    £79.20

  • Data Analytics Applications in Gaming and

    Taylor & Francis Ltd Data Analytics Applications in Gaming and

    15 in stock

    Book SynopsisThe last decade has witnessed the rise of big data in game development as the increasing proliferation of Internet-enabled gaming devices has made it easier than ever before to collect large amounts of player-related data. At the same time, the emergence of new business models and the diversification of the player base have exposed a broader potential audience, which attaches great importance to being able to tailor game experiences to a wide range of preferences and skill levels. This, in turn, has led to a growing interest in data mining techniques, as they offer new opportunities for deriving actionable insights to inform game design, to ensure customer satisfaction, to maximize revenues, and to drive technical innovation. By now, data mining and analytics have become vital components of game development. The amount of work being done in this area nowadays makes this an ideal time to put together a book on this subject.Table of ContentsPart 1 – Introduction to game data mining. Part 2 – Data mining for games user research. Part 3 – Data mining for game technology. Part 4 – Visualization of large-scale game data.

    £42.74

  • The Art of Data Science

    CRC Press The Art of Data Science

    1 in stock

    Book SynopsisAlthough change is constant in business and analytics, some fundamental principles and lessons learned are truly timeless, extending and surviving beyond the rapid ongoing evolution of tools, techniques, and technologies. Through a series of articles published over the course of his 30+ year career in analytics and technology, author Doug Gray shares the most important lessons he has learned, with colleagues and students as well, that have helped to ensure success on his journey as a practitioner, leader, and educator. The reader witnesses the Analytical Sciences profession through the mind's eye of a practitioner who has operated at the forefront of analytically-inclined organizations, such as American Airlines and Walmart, delivering solutions that generate hundreds of millions of dollars annually in business value, and an educator teaching students and conducting research at a leading university. Through real-world project case studies, first-hand stories, and practical e

    £46.54

  • Ensemble Methods

    CRC Press Ensemble Methods

    1 in stock

    Book SynopsisEnsemble methods that train multiple learners and then combine them to use, with Boosting and Bagging as representatives, are well-known machine learning approaches. It has become common sense that an ensemble is usually significantly more accurate than a single learner, and ensemble methods have already achieved great success in various real-world tasks. Twelve years have passed since the publication of the first edition of the book in 2012 (Japanese and Chinese versions published in 2017 and 2020, respectively). Many significant advances in this field have been developed. First, many theoretical issues have been tackled, for example, the fundamental question of why AdaBoost seems resistant to overfitting gets addressed, so that now we understand much more about the essence of ensemble methods. Second, ensemble methods have been well developed in more machine learning fields, e.g., isolation forest in anomaly detection, so that now we have powe

    £56.99

  • Recommender Systems Handbook

    Springer-Verlag New York Inc. Recommender Systems Handbook

    1 in stock

    Book SynopsisTable of ContentsPreface.- Introduction.- Part 1: General Recommendation Techniques.- Trust Your Neighbors: A Comprehensive Survey of Neighborhood-based Methods for Recommender Systems (Desrosiers).- Advances in Collaborative Filtering (Koren).- Item Recommendation from Implicit Feedback (Rendle).- Deep Learning for Recommender Systems (Zhang).- Context-Aware Recommender Systems: From Foundations to Recent Developments (Bauman).- Semantics and Content-based Recommendations (Musto).- Part 2: Special Recommendation Techniques.- Session-based Recommender Systems (Jannach).- Adversarial Recommender Systems: Attack, Defense, and Advances (Di Noia).- Group Recommender Systems: Beyond Preference Aggregation (Masthoff).- People-to-People Reciprocal Recommenders (Koprinska).- Natural Language Processing for Recommender Systems (Sar-Shalom).- Design and Evaluation of Cross-domain Recommender Systems (Cremonesi).- Part 3: Value and Impact of Recommender Systems.- Value and Impact of Recommender Systems (Zanker).- Evaluating Recommender Systems (Shani).- Novelty and Diversity in Recommender Systems (Castells).- Multistakeholder Recommender Systems (Burke).- Fairness in Recommender Systems (Ekstrand).- Part 4: Human Computer Interaction.- Beyond Explaining Single Item Recommendations (Tintarev).- Personality and Recommender Systems (Tkalčič).- Individual and Group Decision Making and Recommender Systems (Jameson).- Part 5: Recommender Systems Applications.- Social Recommender Systems (Guy).- Food Recommender Systems (Trattner).- Music Recommendation Systems: Techniques, Use Cases, and Challenges (Schedl).- Multimedia Recommender Systems: Algorithms and Challenges (Deldjoo).- Fashion Recommender Systems (Dokoohaki).

    £224.99

  • Mastering Python for Bioinformatics

    O'Reilly Media Mastering Python for Bioinformatics

    3 in stock

    Book SynopsisThis practical guide shows postdoc bioinformatics professionals and students how to exploit the best parts of Python to solve problems in biology while creating documented, tested, reproducible software.

    £59.99

  • Learning Ray

    O'Reilly Media Learning Ray

    1 in stock

    Book SynopsisWith this practical book, Python programmers, data engineers, and data scientists will learn how to leverage Ray locally and spin up compute clusters. You'll be able to use Ray to structure and run machine learning programs at scale.

    £39.74

  • Scaling Python with Ray

    O'Reilly Media Scaling Python with Ray

    4 in stock

    Book SynopsisIn this book, authors Holden Karau and Boris Lublinsky show you how to scale existing Python applications and pipelines, allowing you to stay in the Python ecosystem while avoiding single points of failure and manual scheduling.

    £39.74

  • Trino The Definitive Guide

    O'Reilly Media Trino The Definitive Guide

    1 in stock

    Book SynopsisIn the second edition of this practical guide, you'll learn how to conduct analytics on data where it lives, whether it's a data lake using Hive, a modern lakehouse with Iceberg or Delta Lake, a different system like Cassandra, Kafka, or SingleStore, or a relational database like PostgreSQL or Oracle.

    £47.99

  • Modern Statistics for Modern Biology

    Cambridge University Press Modern Statistics for Modern Biology

    15 in stock

    Book SynopsisIf you are a biologist and want to get the best out of the powerful methods of modern computational statistics, this is your book. You can visualize and analyze your own data, apply unsupervised and supervised learning, integrate datasets, apply hypothesis testing, and make publication-quality figures using the power of R/Bioconductor and ggplot2. This book will teach you "cooking from scratch", from raw data to beautiful illuminating output, as you learn to write your own scripts in the R language and to use advanced statistics packages from CRAN and Bioconductor. It covers a broad range of basic and advanced topics important in the analysis of high-throughput biological data, including principal component analysis and multidimensional scaling, clustering, multiple testing, unsupervised and supervised learning, resampling, the pitfalls of experimental design, and power simulations using Monte Carlo, and it even reaches networks, trees, spatial statistics, image data, and microbial ecology. Using a minimum of mathematical notation, it builds understanding from well-chosen examples, simulation, visualization, and above all hands-on interaction with data and code.Trade Review'This is a gorgeous book, both visually and intellectually, superbly suited for anyone who wants to learn the nuts and bolts of modern computational biology. It can also be a practical, hands-on starting point for life scientists and students who want to break out of 'canned packages' into the more versatile world of R coding. Much richer than the typical statistics textbook, it covers a wide range of topics in machine learning and image processing. The chapter on making high-quality graphics is alone worth the price of the book.' William H. Press, University of Texas, Austin'The book is a timely, comprehensive and practical reference for anyone working with modern quantitative biotechnologies. It can be read at multiple levels. 
For scientists with a statistics background, it is a thorough review of key methods for design and analysis of high-throughput experiments. For life scientists with a limited exposure to statistics, it offers a series of examples with relevant data and R code. Avoiding buzzwords and hype, the book advocates appropriate statistical practice for reproducible research. I expect it to be as influential for the life sciences community as Modern Applied Statistics with S, by Venables and Ripley or Introduction to Statistical Learning, by James, Witten, Hastie and Tibshirani are for applied statistics.' Olga Vitek, Northeastern University, Boston'Navigating rich data to arrive at sensible insight requires confidence in our biological understanding, informatic ability, statistical sophistication, and skills at effective communication. Fortunately the wisdom and effort of the worldwide research community has been distilled into accessible and rich collections of R and Bioconductor software packages. Holmes and Huber provide a comprehensive guide to navigating modern statistical methods for working with complex, large, and nuanced biological data. The presentation provides a firm conceptual foundation coupled with worked practical examples, extended analysis, and refined discussion of practical and theoretical challenges facing the modern practitioner. This book provides us with the confidence and tools necessary for the analysis and comprehension of modern biological data using modern statistical methods.' Martin Morgan, Roswell Park Comprehensive Cancer Center, leader of the Bioconductor project'Holmes and Huber take an integrated approach to presenting the key statistical concepts and methods needed for the analysis of biological data. Specifically, they do a wonderful job of building these foundations in the context of modern computational tools, genuine scientific questions, and real-world datasets. 
The code showcases many of the newest features of R and its dynamic package ecosystem, such as using ggplot2 for visualization and dplyr for data manipulation.' Jenny Bryan, RStudio and University of British Columbia'... the book is extremely readable and engaging, it explains complicated concepts in simple terms, and uses illuminating graphics and examples. Any researcher who wants to learn or teach up-to-date statistics to biologists will find this an essential volume for modern teaching of modern statistics to modern biologists.' Noa Pinter-Wollman, The Quarterly Review of BiologyTable of ContentsIntroduction; 1. Generative models for discrete data; 2. Statistical modeling; 3. High-quality graphics in R; 4. Mixture models; 5. Clustering; 6. Testing; 7. Multivariate analysis; 8. High-throughput count data; 9. Multivariate methods for heterogeneous data; 10. Networks and trees; 11. Image data; 12. Supervised learning; 13. Design of high-throughput experiments and their analyses; Statistical concordance; Bibliography; Index.

    £49.99

  • Interpreting Discrete Choice Models

    Cambridge University Press Interpreting Discrete Choice Models

    1 in stock

    Book SynopsisIn discrete choice models the relationships between the independent variables and the choice probabilities are nonlinear, depending on both the value of the particular independent variable being interpreted and the values of the other independent variables. Thus, interpreting the magnitude of the effects (the substantive effects) of the independent variables on choice behavior requires the use of additional interpretative techniques. Three common techniques for interpretation are described here: first differences, marginal effects and elasticities, and odds ratios. Concepts related to these techniques are also discussed, as well as methods to account for estimation uncertainty. Interpretation of binary logits, ordered logits, multinomial and conditional logits, and mixed discrete choice models such as mixed multinomial logits and random effects logits for panel data are covered in detail. The techniques discussed here are general, and can be applied to other models with discrete dependTable of Contents1. Introduction; 2. Accounting for Statistical Uncertainty in Estimates of Substantive Effects; 3. Substantive Effects in Binary Choice Models; 4. Substantive Effects in Ordered Choice Models; 5. Substantive Effects in Multinomial Choice Models; 6. Interpretation of Mixed Discrete Choice Models; 7. Extensions.

    £17.00

  • Data Mining Algorithms

    John Wiley & Sons Inc Data Mining Algorithms

    15 in stock

    Book SynopsisData Mining Algorithms is a practical, technically-oriented guide to data mining algorithms that covers the most important algorithms for building classification, regression, and clustering models, as well as techniques used for attribute selection and transformation, model quality evaluation, and creating model ensembles. The author presents many of the important topics and methodologies widely used in data mining, whilst demonstrating the internal operation and usage of data mining algorithms using examples in R.Table of ContentsAcknowledgements; Preface; References; Part I Preliminaries; 1 Tasks; 1.1 Introduction; 1.2 Inductive learning tasks; 1.3 Classification; 1.4 Regression; 1.5 Clustering; 1.6 Practical issues; 1.7 Conclusion; 1.8 Further readings; References; 2 Basic statistics; 2.1 Introduction; 2.2 Notational conventions; 2.3 Basic statistics as modeling; 2.4 Distribution description; 2.5 Relationship detection; 2.6 Visualization; 2.7 Conclusion; 2.8 Further readings; References; Part II Classification; 3 Decision trees; 3.1 Introduction; 3.2 Decision tree model; 3.3 Growing; 3.4 Pruning; 3.5 Prediction; 3.6 Weighted instances; 3.7 Missing value handling; 3.8 Conclusion; 3.9 Further readings; References; 4 Naïve Bayes classifier; 4.1 Introduction; 4.2 Bayes rule; 4.3 Classification by Bayesian inference; 4.4 Practical issues; 4.5 Conclusion; 4.6 Further readings; References; 5 Linear classification; 5.1 Introduction; 5.2 Linear representation; 5.3 Parameter estimation; 5.4 Discrete attributes; 5.5 Conclusion; 5.6 Further readings; References; 6 Misclassification costs; 6.1 Introduction; 6.2 Cost representation; 6.3 Incorporating misclassification costs; 6.4 Effects of cost incorporation; 6.5 Experimental procedure; 6.6 Conclusion; 6.7 Further readings; References; 7 Classification model evaluation; 7.1 Introduction; 7.2 Performance measures; 7.3 Evaluation procedures; 7.4 Conclusion; 7.5 Further readings; References; Part III Regression; 8 Linear regression; 8.1 Introduction; 8.2 Linear representation; 8.3 Parameter estimation; 8.4 Discrete attributes; 8.5 Advantages of linear models; 8.6 Beyond linearity; 8.7 Conclusion; 8.8 Further readings; References; 9 Regression trees; 9.1 Introduction; 9.2 Regression tree model; 9.3 Growing; 9.4 Pruning; 9.5 Prediction; 9.6 Weighted instances; 9.7 Missing value handling; 9.8 Piecewise linear regression; 9.9 Conclusion; 9.10 Further readings; References; 10 Regression model evaluation; 10.1 Introduction; 10.2 Performance measures; 10.3 Evaluation procedures; 10.4 Conclusion; 10.5 Further readings; References; Part IV Clustering; 11 (Dis)similarity measures; 11.1 Introduction; 11.2 Measuring dissimilarity and similarity; 11.3 Difference-based dissimilarity; 11.4 Correlation-based similarity; 11.5 Missing attribute values; 11.6 Conclusion; 11.7 Further readings; References; 12 k-Centers clustering; 12.1 Introduction; 12.2 Algorithm scheme; 12.3 k-Means; 12.4 Beyond means; 12.5 Beyond (fixed) k; 12.6 Explicit cluster modeling; 12.7 Conclusion; 12.8 Further readings; References; 13 Hierarchical clustering; 13.1 Introduction; 13.2 Cluster hierarchies; 13.3 Agglomerative clustering; 13.4 Divisive clustering; 13.5 Hierarchical clustering visualization; 13.6 Hierarchical clustering prediction; 13.7 Conclusion; 13.8 Further readings; References; 14 Clustering model evaluation; 14.1 Introduction; 14.2 Per-cluster quality measures; 14.3 Overall quality measures; 14.4 External quality measures; 14.5 Using quality measures; 14.6 Conclusion; 14.7 Further readings; References; Part V Getting Better Models; 15 Model ensembles; 15.1 Introduction; 15.2 Model committees; 15.3 Base models; 15.4 Model aggregation; 15.5 Specific ensemble modeling algorithms; 15.6 Quality of ensemble predictions; 15.7 Conclusion; 15.8 Further readings; References; 16 Kernel methods; 16.1 Introduction; 16.2 Support vector machines; 16.3 Support vector regression; 16.4 Kernel trick; 16.5 Kernel functions; 16.6 Kernel prediction; 16.7 Kernel-based algorithms; 16.8 Conclusion; 16.9 Further readings; References; 17 Attribute transformation; 17.1 Introduction; 17.2 Attribute transformation task; 17.3 Simple transformations; 17.4 Multiclass encoding; 17.5 Conclusion; 17.6 Further readings; References; 18 Discretization; 18.1 Introduction; 18.2 Discretization task; 18.3 Unsupervised discretization; 18.4 Supervised discretization; 18.5 Effects of discretization; 18.6 Conclusion; 18.7 Further readings; References; 19 Attribute selection; 19.1 Introduction; 19.2 Attribute selection task; 19.3 Attribute subset search; 19.4 Attribute selection filters; 19.5 Attribute selection wrappers; 19.6 Effects of attribute selection; 19.7 Conclusion; 19.8 Further readings; References; 20 Case studies; 20.1 Introduction; 20.2 Census income; 20.3 Communities and crime; 20.4 Cover type; 20.5 Conclusion; 20.6 Further readings; References; Closing; A Notation; A.1 Attribute values; A.2 Data subsets; A.3 Probabilities; B R packages; B.1 CRAN packages; B.2 DMR packages; B.3 Installing packages; References; C Datasets; Index

    £56.66

© 2025 Book Curl

