Data mining Books
Cambridge University Press Trustworthy Online Controlled Experiments
Book SynopsisGetting numbers is easy; getting trustworthy numbers is hard. From experimentation leaders at Amazon, Google, LinkedIn, and Microsoft, this guide to accelerating innovation using A/B tests includes practical examples, pitfalls, and advice for students and industry professionals, plus deeper dives into advanced topics for experienced practitioners.Trade Review'At the core of the Lean Methodology is the scientific method: Creating hypotheses, running experiments, gathering data, extracting insight and validation or modification of the hypothesis. A/B testing is the gold standard of creating verifiable and repeatable experiments, and this book is its definitive text.' Steve Blank, Adjunct professor at Stanford University, father of modern entrepreneurship, author of The Startup Owner's Manual and The Four Steps to the Epiphany'This book is a great resource for executives, leaders, researchers or engineers looking to use online controlled experiments to optimize product features, project efficiency or revenue. I know firsthand the impact that Kohavi's work had on Bing and Microsoft, and I'm excited that these learnings can now reach a wider audience.' Harry Shum, EVP, Microsoft Artificial Intelligence and Research Group'A great book that is both rigorous and accessible. Readers will learn how to bring trustworthy controlled experiments, which have revolutionized internet product development, to their organizations.' Adam D'Angelo, Co-founder and CEO of Quora and former CTO of Facebook'This book is a great overview of how several companies use online experimentation and A/B testing to improve their products. Kohavi, Tang and Xu have a wealth of experience and excellent advice to convey, so the book has lots of practical real world examples and lessons learned over many years of the application of these techniques at scale.' Jeff Dean, Google Senior Fellow and SVP Google Research'Do you want your organization to make consistently better decisions? 
This is the new bible of how to get from data to decisions in the digital age. Reading this book is like sitting in meetings inside Amazon, Google, LinkedIn, Microsoft. The authors expose for the first time the way the world's most successful companies make decisions. Beyond the admonitions and anecdotes of normal business books, this book shows what to do and how to do it well. It's the how-to manual for decision-making in the digital world, with dedicated sections for business leaders, engineers, and data analysts.' Scott Cook, Intuit Co-founder & Chairman of the Executive Committee'Online controlled experiments are powerful tools. Understanding how they work, what their strengths are, and how they can be optimized can illuminate both specialists and a wider audience. This book is the rare combination of technically authoritative, enjoyable to read, and dealing with highly important matters.' John P. A. Ioannidis, Stanford University'Kohavi, Tang, and Xu are pioneers of online experimentation. The platforms they've built and the experiments they've enabled have transformed some of the largest internet brands. Their research and talks have inspired teams across the industry to adopt experimentation. This book is the authoritative yet practical text that the industry has been waiting for.' Adil Aijaz, Co-founder and CEO, Split Software'Which online option will be better? We frequently need to make such choices, and frequently err. To determine what will actually work better, we need rigorous controlled experiments, aka A/B testing. This excellent and lively book by experts from Microsoft, Google, and LinkedIn presents the theory and best practices of A/B testing. A must read for anyone who does anything online!' Gregory Piatetsky-Shapiro, Ph.D., president of KDnuggets, co-founder of SIGKDD, and LinkedIn Top Voice on Data Science & Analytics'Ron Kohavi, Diane Tang and Ya Xu are the world's top experts on online experiments. 
I've been using their work for years and I'm delighted they have now teamed up to write the definitive guide. I recommend this book to all my students and everyone involved in online products and services.' Erik Brynjolfsson, Massachusetts Institute of Technology, co-author of The Second Machine Age'A modern software-supported business cannot compete successfully without online controlled experimentation. Written by three of the most experienced leaders in the field, this book presents the fundamental principles, illustrates them with compelling examples, and digs deeper to present a wealth of practical advice. It's a 'must read'!' Foster Provost, New York University and co-author of the best-selling Data Science for Business'In the past two decades the technology industry has learned what scientists have known for centuries: that controlled experiments are among the best tools to understand complex phenomena and to solve very challenging problems. The ability to design controlled experiments, run them at scale, and interpret their results is the foundation of how modern high tech businesses operate. Between them the authors have designed and implemented several of the world's most powerful experimentation platforms. This book is a great opportunity to learn from their experiences about how to use these tools and techniques.' Kevin Scott, EVP and CTO of Microsoft'Online experiments have fueled the success of Amazon, Microsoft, LinkedIn and other leading digital companies. This practical book gives the reader rare access to decades of experimentation experience at these companies and should be on the bookshelf of every data scientist, software engineer and product manager.' Stefan Thomke, William Barclay Harding Professor, Harvard Business School, author of Experimentation Works: The Surprising Power of Business Experiments'The secret sauce for a successful online business is experimentation. But it is a secret no longer. 
Here three masters of the art describe the ABCs of A/B testing so that you too can continuously improve your online services.' Hal Varian, Chief Economist, Google, and author of Intermediate Microeconomics: A Modern Approach'Experiments are the best tool for online products and services. This book is full of practical knowledge derived from years of successful testing at Microsoft, Google and LinkedIn. Insights and best practices are explained with real examples and pitfalls, their markers and solutions identified. I strongly recommend this book!' Preston McAfee, former Chief Economist and VP of Microsoft'Experimentation is the future of digital strategy and 'Trustworthy Experiments' will be its Bible. Kohavi, Tang and Xu are three of the most noteworthy experts on experimentation working today and their book delivers a truly practical roadmap for digital experimentation that is useful right out of the box. The revealing case studies they conducted over many decades at Microsoft, Amazon, Google and LinkedIn are organized into easy to understand practical lessons with tremendous depth and clarity. It should be required reading for any manager of a digital business.' Sinan Aral, David Austin Professor of Management, Massachusetts Institute of Technology, and author of The Hype MachineTable of ContentsPreface – how to read this book; 1. Introduction and motivation; 2. Running and analyzing experiments: an end-to-end example; 3. Twyman's law and experimentation trustworthiness; 4. Experimentation platform and culture; Part II: 5. Speed matters: an end-to-end case study; 6. Organizational metrics; 7. Metrics for experimentation and the Overall Evaluation Criterion (OEC); 8. Institutional memory and meta-analysis; 9. Ethics in controlled experiments; Part III: 10. Complementary techniques; 11. Observational causal studies; Part IV: 12. Client-side experiments; 13. Instrumentation; 14. Choosing a randomization unit; 15. 
Ramping experiment exposure: trading off speed, quality, and risk; 16. Scaling experiment analyses; Part V: 17. The statistics behind online controlled experiments; 18. Variance estimation and improved sensitivity: pitfalls and solutions; 19. The A/A test; 20. Triggering for improved sensitivity; 21. Guardrail metrics; 22. Leakage and interference between variants; 23. Measuring long-term treatment effects.
£30.99
Cambridge University Press Dive Into Deep Learning
Book SynopsisThis approachable text teaches all the concepts, the context, and the code needed to understand deep learning. Suitable for students and professionals, the book doesn't require any previous background in machine learning or deep learning. Interactive examples feature throughout, with runnable code and executable Jupyter notebooks available online.Trade Review'In less than a decade, the AI revolution has swept from research labs to broad industries to every corner of our daily life. Dive into Deep Learning is an excellent text on deep learning and deserves attention from anyone who wants to learn why deep learning has ignited the AI revolution: the most powerful technology force of our time.' Jensen Huang, Founder and CEO, NVIDIA'This is a timely, fascinating book, providing not only a comprehensive overview of deep learning principles but also detailed algorithms with hands-on programming code, and moreover, a state-of-the-art introduction to deep learning in computer vision and natural language processing. Dive into this book if you want to dive into deep learning!' Jiawei Han, Michael Aiken Chair Professor, University of Illinois at Urbana-Champaign'This is a highly welcome addition to the machine learning literature, with a focus on hands-on experience implemented via the integration of Jupyter notebooks. Students of deep learning should find this invaluable to become proficient in this field.' Bernhard Schölkopf, Director, Max Planck Institute for Intelligent Systems'Dive into Deep Learning strikes an excellent balance between hands-on learning and in-depth explanation. I've used it in my deep learning course and recommend it to anyone who wants to develop a thorough and practical understanding of deep learning.' Colin Raffel, Assistant Professor, University of North Carolina, Chapel HillTable of ContentsInstallation; Notation; 1. Introduction; 2. Preliminaries; 3. Linear neural networks for regression; 4. 
Linear neural networks for classification; 5. Multilayer perceptrons; 6. Builders' guide; 7. Convolutional neural networks; 8. Modern convolutional neural networks; 9. Recurrent neural networks; 10. Modern recurrent neural networks; 11. Attention mechanisms and transformers; Appendix. Tools for deep learning; Bibliography; Index.
£24.99
O'Reilly Media Python for Data Analysis 3e
Book SynopsisUpdated for Python 3.10 and pandas 1.4, the third edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively.
£47.99
Cambridge University Press The Science of Science
Book SynopsisThis is the first comprehensive overview of the exciting field of the 'science of science'. With anecdotes and detailed, easy-to-follow explanations of the research, this book is accessible to all scientists, policy makers, and administrators with an interest in the wider scientific enterprise.Trade Review'Wang and Barabási book is a manifesto for the science of science domain. Graduate students (as well as their mentors) owe the authors a debt of gratitude for this impressive synthesis of what is a fast-evolving field of research.' Pierre Azoulay, Massachusetts Institute of Technology'Analyzing quantitative aspects of science with state-of-art tools, Wang and Barabási have written an insightful and comprehensive book that will become a must-read for all scholars interested in science.' Yu Xie, Princeton University'In their engaging book, Wang and Barabási take a fresh look at the science of science. They convincingly argue that in the age of big data and AI applying the scientific method to science itself not only helps understand how science works but may even enhance it. We are compelled to consider the determinants of individual careers and what this means in the age of large-scale scientific collaborations. These and other questions around the meaning of scientific impact, in academia and beyond, make the book highly relevant to scientists, academic administrators and funders alike. By the time the final, forward-looking chapter ends we are hooked on all the correlations and predictions, and so it is only fitting that we are invited to join in, to help shape the field which is likely to be driven by a human-machine collaboration.' Magdalena Skipper, Nature'Overall, I found this book very stimulating. It made me wonder whether in-depth metrics analyses of 'only' the subjective narratives of authors, such as the references list they select, actually creates a foundation on which to form judgement rather than opinion? 
Namely, what fraction of these publications analysed for their metrics were actually underpinned by their data? As well as provoking thought, this book offers a feast of references, 424 in all. There are such further enticing reads as reference 396, Life3.0: Being Human in the Age of Artificial Intelligence. To conclude, I recommend this book for your library, and maybe even take it for your summer beach reading.' John R. Helliwell, Journal of Applied Crystallography'… a text that should appeal to practicing scientists curious about the structure of the whole scientific enterprise, academic administrators and policy makers interested in evidence-based decision-making, and researchers interested in contributing further to the "science of science." There is no better, handier, and more readable work to appeal to such audiences … Highly recommended.' M. Oromaner, Choice ConnectTable of ContentsIntroduction; Part I. The Science of Career: 1. Productivity of a scientist; 2. The H Index; 3. The Matthew Effect; 4. Age and Scientific Achievement; 5. Random Impact Rule; 6. The Q Factor; 7. Hot Streaks; Part II. The Science of Collaboration: 8. The increasing dominance of teams in science; 9. The Invisible College; 10. Coauthorship Networks; 11. Team Assembly; 12. Small and large teams; 13. Scientific Credit; 14. Credit Allocation; Part III. The Science of Impact: 15. Big Science; 16. Citation Disparity; 17. High Impact Papers; 18. Scientific Impact; 19. The Time Dimension of Science; 20. Ultimate Impact; Part IV. Outlook: 21. Can Science be Accelerated?; 22. Artificial Intelligence; 23. Bias and Causality in Science; Part V. Last thought; All the Science of Science: Appendix A1 Modeling team assembly; Appendix A2 Modeling Citations; References; Index.
£23.74
O'Reilly Media Learning SQL
Book SynopsisAs data floods into your company, you need to put it to work right away, and SQL is the best tool for the job. With the latest edition of this introductory guide, author Alan Beaulieu helps developers get up to speed with SQL fundamentals for writing database applications, performing administrative tasks, and generating reports.
£42.39
O'Reilly Media Fundamentals of Data Engineering
Book SynopsisWith this practical book, you'll learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available through the framework of the data engineering lifecycle.
£47.99
HarperCollins Publishers Inc Everybody Lies
Book Synopsis New York Times BestsellerForeword by Steven Pinker,...
£21.74
Harvard Business Review Press Fusion Strategy
Book SynopsisTwo world-renowned experts on innovation and digital strategy explore how real-time data and AI will radically transform physical products—and the companies that make them.Tech giants like Facebook, Amazon, and Google can collect real-time data from billions of users. For companies that design and manufacture physical products, that type of fluid, data-rich information used to be a pipe dream. Now, with the rise of cheap and powerful sensors, supercomputing, and artificial intelligence, things are changing—fast.In Fusion Strategy, world-renowned innovation guru Vijay Govindarajan and digital strategy expert Venkat Venkatraman offer a first-of-its-kind playbook that will help industrial companies combine what they do best—create physical products—with what digitals do best—use algorithms and AI to parse expansive, interconnected datasets—to make strategic connections that would otherwise be impossible.The laws of
£21.25
Springer-Verlag New York Inc. The Elements of Statistical Learning Springer
Book SynopsisOverview of Supervised Learning.- Linear Methods for Regression.- Linear Methods for Classification.- Basis Expansions and Regularization.- Kernel Smoothing Methods.- Model Assessment and Selection.- Model Inference and Averaging.- Additive Models, Trees, and Related Methods.- Boosting and Additive Trees.- Neural Networks.- Support Vector Machines and Flexible Discriminants.- Prototype Methods and Nearest-Neighbors.- Unsupervised Learning.- Random Forests.- Ensemble Learning.- Undirected Graphical Models.- High-Dimensional Problems: p ≫ N.Trade ReviewFrom the reviews:"Like the first edition, the current one is a welcome edition to researchers and academicians equally…. Almost all of the chapters are revised.… The Material is nicely reorganized and repackaged, with the general layout being the same as that of the first edition.… If you bought the first edition, I suggest that you buy the second edition for maximum effect, and if you haven’t, then I still strongly recommend you have this book at your desk. Is it a good investment, statistically speaking!" (Book Review Editor, Technometrics, August 2009, VOL. 51, NO. 3)From the reviews of the second edition:"This second edition pays tribute to the many developments in recent years in this field, and new material was added to several existing chapters as well as four new chapters … were included. … These additions make this book worthwhile to obtain … . In general this is a well written book which gives a good overview on statistical learning and can be recommended to everyone interested in this field. The book is so comprehensive that it offers material for several courses." (Klaus Nordhausen, International Statistical Review, Vol. 77 (3), 2009)“The second edition … features about 200 pages of substantial new additions in the form of four new chapters, as well as various complements to existing chapters. 
… the book may also be of interest to a theoretically inclined reader looking for an entry point to the area and wanting to get an initial understanding of which mathematical issues are relevant in relation to practice. … this is a welcome update to an already fine book, which will surely reinforce its status as a reference.” (Gilles Blanchard, Mathematical Reviews, Issue 2012 d)“The book would be ideal for statistics graduate students … . This book really is the standard in the field, referenced in most papers and books on the subject, and it is easy to see why. The book is very well written, with informative graphics on almost every other page. It looks great and inviting. You can flip the book open to any page, read a sentence or two and be hooked for the next hour or so.” (Peter Rabinovitch, The Mathematical Association of America, May, 2012)Table of ContentsIntroduction.- Overview of supervised learning.- Linear methods for regression.- Linear methods for classification.- Basis expansions and regularization.- Kernel smoothing methods.- Model assessment and selection.- Model inference and averaging.- Additive models, trees, and related methods.- Boosting and additive trees.- Neural networks.- Support vector machines and flexible discriminants.- Prototype methods and nearest-neighbors.- Unsupervised learning.
£58.49
Cambridge University Press Principles of Database Management
Book SynopsisThis comprehensive textbook teaches the fundamentals of database design, modeling, systems, data storage, and the evolving world of data warehousing, governance and more. Written by experienced educators and experts in big data, analytics, data quality, and data integration, it provides an up-to-date approach to database management. This full-color, illustrated text has a balanced theory-practice focus, covering essential topics, from established database technologies to recent trends, like Big Data, NoSQL, and more. Fundamental concepts are supported by real-world examples, query and code walkthroughs, and figures, making it perfect for introductory courses for advanced undergraduates and graduate students in information systems or computer science. These examples are further supported by an online playground with multiple learning environments, including MySQL, MongoDB, Neo4j Cypher, and tree structure visualization. This combined learning approach connects key concepts throughout the text to the important, practical tools to get started in database management.Trade Review'Although there have been a series of classical textbooks on database systems, the new dramatic advances call for an updated text covering the latest significant topics, such as big data analytics, No-SQL and much more. Fortunately, this is exactly what this book has to offer. It is highly desirable for training the next generation of data management professionals.' Jian Pei, Simon Fraser University, Canada'I haven't seen an as up-to-date and comprehensive textbook for Database Management as this one in many years. Principles of Database Management combines a number of classical and recent topics concerning Data Modeling, Relational Databases, Object-Oriented Databases, XML, Distributed Data Management, NoSQL and Big Data in an unprecedented manner. 
The authors did a great job in stitching these topics into one coherent and compelling story that will serve as an ideal basis for teaching both introductory and advanced courses.' Martin Theobald, University of Luxembourg'This is a very timely book with outstanding coverage of database topics and excellent treatment of database details. It not only gives very solid discussions of traditional topics like data modeling and relational databases but also contains refreshing contents on frontier topics such as XML databases, NoSQL databases, big data, and analytics. For those reasons, this will be a good book for database professionals who will keep using it for all stages of database studies and works.' J. Leon Zhao, City University of Hong Kong'This accessible, authoritative book introduces the reader to the most important fundamental concepts of data management, while providing a practical view of recent advances. Both are essential for data professionals today.' Foster Provost, New York University, Stern School of Business'This guide to big and small data management addresses both fundamental principles and practical deployment. It reviews a range of databases and their relevance for analytics. The book is useful to practitioners because it contains many case studies, links to open-source software, and a very useful abstraction of analytics that will help them better choose solutions. It is important to academics because it promotes database principles which are key to successful and sustainable data science.' Sihem Amer-Yahia, Laboratoire d'Informatique de Grenoble and Editor-in-Chief of the International Journal on Very Large DataBases'This book covers everything you will need to teach in a database implementation and design class. With some chapters covering big data, analytic models/methods, and No-SQL, it can keep our students up-to-date with these new technologies in data management related topics.' 
Han-fen Hu, University of Nevada, Las Vegas'As we are entering a new technological era of intelligent machines powered by data-driven algorithms, understanding fundamental concepts of data management and their most current practical applications has become more important than ever. This book is a timely guide for anyone interested in getting up to speed with the state of the art in database systems, big data technologies, and data science. It is full of insightful examples and case studies with direct industrial relevance.' Nesime Tatbul, Intel Labs and Massachusetts Institute of Technology'It is a pleasure to study this new book on database systems. The book offers a fantastically fresh approach to database teaching. The mix of theoretical and practical contents is almost perfect, the content is up-to-date and covers the recent ones, the examples are nice, and the database testbed provides an excellent way of understanding the concepts. Coupled with the authors' expertise, this book is an important addition to the database field.' Arnab Bhattacharya, Indian Institute of Technology, Kanpur'Principles of Database Management is my favorite textbook for teaching a course on database management. Written in a well-illustrated style, this comprehensive book covers essential topics in established data management technologies and recent discoveries in data science. With a nice balance between theory and practice, it is not only an excellent teaching medium for students taking information management and/or data analytics courses, but also a quick and valuable reference for scientists and engineers working in this area.' Chuan Xiao, Graduate School of Informatics, Nagoya University'Data science success stories and big data applications are only possible because of advances in database technology. This book provides both a broad and deep introduction to databases. 
It covers the different types of database systems (from relational to noSQL) and manages to bridge the gap between data modeling and the underlying basic principles. The book is highly recommended for anyone that wants to understand how modern information systems deal with ever-growing volumes of data.' Wil van der Aalst, RWTH Aachen University'The database field has been evolving for several decades and the need for updated textbooks is continuous. Now, this need is covered by this fresh book by Lemahieu, van den Broucke and Baesens. It spans from traditional topics - such as the relational model and SQL - to more recent topics – such as distributed computing with Hadoop and Spark as well as data analytics. The book can be used as an introductory text and for graduate courses.' Yannis Manolopoulos, Data Science & Engineering Lab, Aristotle University of Thessaloniki'I like the way the book covers both traditional database topics and newer material such as big data, No-SQL databases, and data quality. The coverage is just right for my course and the level of the material is very appropriate for my students. The book also has clear explanations and good examples.' Barbara Klein, University of Michigan'This book provides a unique perspective on database management and how to store, manage, and analyze small and big data. The accompanying exercises and solutions, cases, slides, and YouTube lectures turn it into an indispensable resource for anyone teaching an undergraduate or postgraduate course on the topic.' Wolfgang Ketter, Erasmus University Rotterdam'This is a very modern textbook that fills the needs of current trends without sacrificing the need to cover the required database management systems fundamentals.' George Dimitoglou, Hood College, Maryland'This book is a much needed foundational piece on data management and data science. 
The authors successfully integrate the fields of database technology, operations research and big data analytics, which have often been covered independently in the past. A key asset is its didactical approach that builds on a rich set of industry examples and exercises. The book is a must-read for all scholars and practitioners interested in database management, big data analytics and its applications.' Jan Mendling, Institute for Information Business, ViennaTable of ContentsPreface; Part I. Databases and Database Design: 1. Fundamental concepts of database management; 2. Architecture and categorization of DBMSs; 3. Conceptual data modeling using the (E)ER model and UML class diagram; 4. Organizational aspects of data management; Part II. Types of Database Systems: 5. Legacy databases; 6. Relational databases: the relational model; 7. Relational databases: structured query language (SQL); 8. Object oriented databases and object persistence; 9. Extended relational databases; 10. XML databases; 11. NoSQL databases; Part III. Physical Data Storage, Transaction Management, and Database Access: 12. Physical file organization and indexing; 13. Physical database organization; 14. Basics of transaction management; 15. Accessing databases and database APIs; 16. Data distribution and distributed transaction management; Part IV. Data Warehousing, Data Governance and (Big) Data Analytics: 17. Data warehousing and business intelligence; 18. Data integration, data quality and data governance; 19. Big data; 20. Analytics; Appendix A. Cases and questions; Appendix B. Using the online environment; Appendix C. Answer key to select review questions; Glossary; Index.
£59.99
O'Reilly Media Data Quality Fundamentals
Book SynopsisDo your product dashboards look funky? Are your quarterly reports stale? Is the dataset you're using broken or just plain wrong? These problems affect almost every team, yet they're usually addressed on an ad hoc basis and in a reactive manner. If you answered yes to any of the questions above, this book is for you.
£39.74
Springer International Publishing AG Neural Networks and Deep Learning
Book SynopsisChapters 6 and 7 present radial-basis function (RBF) networks and restricted Boltzmann machines. Advanced topics in neural networks: Chapters 8, 9, and 10 discuss recurrent neural networks, convolutional neural networks, and graph neural networks.
£40.49
Cambridge University Press Computer Age Statistical Inference Student
Book SynopsisThe twenty-first century has seen a breathtaking expansion of statistical methodology, both in scope and influence. "Data science" and "machine learning" have become familiar terms in the news, as statistical methods are brought to bear upon the enormous data sets of modern science and commerce. How did we get here? And where are we going? How does it all fit together? Now in paperback and fortified with exercises, this book delivers a concentrated course in modern statistical thinking. Beginning with classical inferential theories - Bayesian, frequentist, Fisherian - individual chapters take up a series of influential topics: survival analysis, logistic regression, empirical Bayes, the jackknife and bootstrap, random forests, neural networks, Markov Chain Monte Carlo, inference after model selection, and dozens more. The distinctly modern approach integrates methodology and algorithms with statistical inference. Each chapter ends with class-tested exercises, and the book concludes with speculation on the future direction of statistics and data science.Table of ContentsPart I. Classic Statistical Inference: 1. Algorithms and inference; 2. Frequentist inference; 3. Bayesian inference; 4. Fisherian inference and maximum likelihood estimation; 5. Parametric models and exponential families; Part II. Early Computer-Age Methods: 6. Empirical Bayes; 7. James–Stein estimation and ridge regression; 8. Generalized linear models and regression trees; 9. Survival analysis and the EM algorithm; 10. The jackknife and the bootstrap; 11. Bootstrap confidence intervals; 12. Cross-validation and Cp estimates of prediction error; 13. Objective Bayes inference and Markov chain Monte Carlo; 14. Statistical inference and methodology in the postwar era; Part III. Twenty-First-Century Topics: 15. Large-scale hypothesis testing and false-discovery rates; 16. Sparse modeling and the lasso; 17. Random forests and boosting; 18. Neural networks and deep learning; 19. 
Support-vector machines and kernel methods; 20. Inference after model selection; 21. Empirical Bayes estimation strategies; Epilogue; References; Author Index; Subject Index.
£29.44
Pearson Education (US) Pandas for Everyone
Book Synopsis: Daniel Chen is a graduate student in the Interdisciplinary PhD program in Genetics, Bioinformatics & Computational Biology (GBCB) at Virginia Polytechnic Institute and State University (Virginia Tech). He is involved with Software Carpentry as an instructor and Mentoring Committee member, and currently serves as the Assessment Committee Chair. He completed his Master's in Public Health at Columbia University Mailman School of Public Health in Epidemiology with a certificate in Advanced Epidemiology, and is currently extending his Master's thesis work in the Social and Decision Analytics Laboratory under the Virginia Bioinformatics Institute on attitude diffusion in social networks.
Table of Contents: Foreword by Anne M. Brown xxiii Foreword by Jared Lander xxv Preface xxvii Changes in the Second Edition xxxix Part I: Introduction 1 Chapter 1. Pandas DataFrame Basics 3 Learning Objectives 3 1.1 Introduction 3 1.2 Load Your First Data Set 4 1.3 Look at Columns, Rows, and Cells 6 1.4 Grouped and Aggregated Calculations 23 1.5 Basic Plot 27 Conclusion 28 Chapter 2. Pandas Data Structures Basics 31 Learning Objectives 31 2.1 Create Your Own Data 31 2.2 The Series 33 2.3 The DataFrame 42 2.4 Making Changes to Series and DataFrames 45 2.5 Exporting and Importing Data 52 Conclusion 63 Chapter 3. Plotting Basics 65 Learning Objectives 65 3.1 Why Visualize Data? 65 3.2 Matplotlib Basics 66 3.3 Statistical Graphics Using matplotlib 72 3.4 Seaborn 78 3.5 Pandas Plotting Method 111 Conclusion 115 Chapter 4. Tidy Data 117 Learning Objectives 117 Note About This Chapter 117 4.1 Columns Contain Values, Not Variables 118 4.2 Columns Contain Multiple Variables 122 4.3 Variables in Both Rows and Columns 126 Conclusion 129 Chapter 5. Apply Functions 131 Learning Objectives 131 Note About This Chapter 131 5.1 Primer on Functions 131 5.2 Apply (Basics) 133 5.3 Vectorized Functions 138 5.4 Lambda Functions (Anonymous Functions) 141 Conclusion 142 Part II: Data Processing 143 Chapter 6.
Data Assembly 145 Learning Objectives 145 6.1 Combine Data Sets 145 6.2 Concatenation 146 6.3 Observational Units Across Multiple Tables 154 6.4 Merge Multiple Data Sets 160 Conclusion 167 Chapter 7. Data Normalization 169 Learning Objectives 169 7.1 Multiple Observational Units in a Table (Normalization) 169 Conclusion 173 Chapter 8. Groupby Operations: Split-Apply-Combine 175 Learning Objectives 175 8.1 Aggregate 176 8.2 Transform 184 8.3 Filter 188 8.4 The pandas.core.groupby.DataFrameGroupBy object 190 8.5 Working with a MultiIndex 195 Conclusion 199 Part III: Data Types 203 Chapter 9. Missing Data 203 Learning Objectives 203 9.1 What Is a NaN Value? 203 9.2 Where Do Missing Values Come From? 205 9.3 Working with Missing Data 210 9.4 Pandas Built-In NA Missing 216 Conclusion 218 Chapter 10. Data Types 219 Learning Objectives 219 10.1 Data Types 219 10.2 Converting Types 220 10.3 Categorical Data 225 Conclusion 227 Chapter 11. Strings and Text Data 229 Introduction 229 Learning Objectives 229 11.1 Strings 229 11.2 String Methods 233 11.3 More String Methods 234 11.4 String Formatting (F-Strings) 236 11.5 Regular Expressions (RegEx) 239 11.6 The regex Library 247 Conclusion 247 Chapter 12. Dates and Times 249 Learning Objectives 249 12.1 Python's datetime Object 249 12.2 Converting to datetime 250 12.3 Loading Data That Include Dates 253 12.4 Extracting Date Components 254 12.5 Date Calculations and Timedeltas 257 12.6 Datetime Methods 259 12.7 Getting Stock Data 261 12.8 Subsetting Data Based on Dates 263 12.9 Date Ranges 266 12.10 Shifting Values 270 12.11 Resampling 276 12.12 Time Zones 278 12.13 Arrow for Better Dates and Times 280 Conclusion 280 Part IV: Data Modeling 281 Chapter 13. Linear Regression (Continuous Outcome Variable) 283 13.1 Simple Linear Regression 283 13.2 Multiple Regression 287 13.3 Models with Categorical Variables 289 13.4 One-Hot Encoding in scikit-learn with Transformer Pipelines 294 Conclusion 296 Chapter 14. 
Generalized Linear Models 297 About This Chapter 297 14.1 Logistic Regression (Binary Outcome Variable) 297 14.2 Poisson Regression (Count Outcome Variable) 304 14.3 More Generalized Linear Models 308 Conclusion 309 Chapter 15. Survival Analysis 311 15.1 Survival Data 311 15.2 Kaplan Meier Curves 312 15.3 Cox Proportional Hazard Model 314 Conclusion 317 Chapter 16. Model Diagnostics 319 16.1 Residuals 319 16.2 Comparing Multiple Models 324 16.3 k-Fold Cross-Validation 329 Conclusion 334 Chapter 17. Regularization 335 17.1 Why Regularize? 335 17.2 LASSO Regression 337 17.3 Ridge Regression 338 17.4 Elastic Net 340 17.5 Cross-Validation 341 Conclusion 343 Chapter 18. Clustering 345 18.1 k-Means 345 18.2 Hierarchical Clustering 351 Conclusion 356 Part V. Conclusion 357 Chapter 19. Life Outside of Pandas 359 19.1 The (Scientific) Computing Stack 359 19.2 Performance 360 19.3 Dask 360 19.4 Siuba 360 19.5 Ibis 361 19.6 Polars 361 19.7 PyJanitor 361 19.8 Pandera 361 19.9 Machine Learning 361 19.10 Publishing 362 19.11 Dashboards 362 Conclusion 362 Chapter 20. It's Dangerous To Go Alone! 363 20.1 Local Meetups 363 20.2 Conferences 363 20.3 The Carpentries 364 20.4 Podcasts 364 20.5 Other Resources 365 Conclusion 365 Appendices 367 A. Concept Maps 369 B. Installation and Setup 373 C. Command Line 377 D. Project Templates 379 E. Using Python 381 F. Working Directories 383 G. Environments 385 H. Install Packages 389 I. Importing Libraries 391 J. Code Style 393 K. Containers: Lists, Tuples, and Dictionaries 395 L. Slice Values 399 M. Loops 401 N. Comprehensions 403 O. Functions 405 P. Ranges and Generators 409 Q. Multiple Assignment 413 R. Numpy ndarray 415 S. Classes 417 T. SettingWithCopyWarning 419 U. Method Chaining 423 V. Timing Code 427 W. String Formatting 429 X. Conditionals (if-elif-else) 433 Y. New York ACS Logistic Regression Example 435 Z. Replicating Results in R 443 Index 451
£34.19
Manning Publications Demand Forecasting Best Practices
Book Synopsis: Master the demand forecasting skills you need to decide what resources to acquire, products to produce, and where and how to distribute them. For demand planners, S&OP managers, supply chain leaders, and data scientists. Demand Forecasting Best Practices is a unique step-by-step guide, demonstrating forecasting tools, metrics, and models alongside stakeholder management techniques that work in a live business environment. You will learn how to: lead a demand planning team to improve forecasting quality while reducing workload; properly define the objectives, granularity, and horizon of your demand planning process; use smart, value-weighted KPIs to track accuracy and bias; spot areas of your process where there is room for improvement; help planners and stakeholders (sales, marketing, finances) add value to your process; identify what kind of data you should be collecting, and how; utilise different types of statistical and machine learning models. Follow author Nicolas Vandeput's original five-step framework for demand planning excellence and learn how to tailor it to your own company's needs. You will learn how to optimise demand planning for a more effective supply chain and will soon be delivering accurate predictions that drive major business value.
About the technology: Demand forecasting is vital for the success of any product supply chain. It allows companies to make better decisions about what resources to acquire, what products to produce, and where and how to distribute them. As an effective demand forecaster, you can help your organisation avoid overproduction, reduce waste, and optimise inventory levels for a real competitive advantage.
£27.89
John Wiley & Sons Inc Making Sense of Data I
Book Synopsis: Praise for the First Edition: "... a well-written book on data analysis and data mining that provides an excellent foundation." (CHOICE) "This is a must-read book for learning practical statistics and data analysis."
Table of Contents: PREFACE ix 1 INTRODUCTION 1 1.1 Overview 1 1.2 Sources of Data 2 1.3 Process for Making Sense of Data 3 1.4 Overview of Book 13 1.5 Summary 16 Further Reading 16 2 DESCRIBING DATA 17 2.1 Overview 17 2.2 Observations and Variables 18 2.3 Types of Variables 20 2.4 Central Tendency 22 2.5 Distribution of the Data 24 2.6 Confidence Intervals 36 2.7 Hypothesis Tests 40 Exercises 42 Further Reading 45 3 PREPARING DATA TABLES 47 3.1 Overview 47 3.2 Cleaning the Data 48 3.3 Removing Observations and Variables 49 3.4 Generating Consistent Scales Across Variables 49 3.5 New Frequency Distribution 51 3.6 Converting Text to Numbers 52 3.7 Converting Continuous Data to Categories 53 3.8 Combining Variables 54 3.9 Generating Groups 54 3.10 Preparing Unstructured Data 55 Exercises 57 Further Reading 57 4 UNDERSTANDING RELATIONSHIPS 59 4.1 Overview 59 4.2 Visualizing Relationships Between Variables 60 4.3 Calculating Metrics About Relationships 69 Exercises 81 Further Reading 82 5 IDENTIFYING AND UNDERSTANDING GROUPS 83 5.1 Overview 83 5.2 Clustering 88 5.3 Association Rules 111 5.4 Learning Decision Trees from Data 122 Exercises 137 Further Reading 140 6 BUILDING MODELS FROM DATA 141 6.1 Overview 141 6.2 Linear Regression 149 6.3 Logistic Regression 161 6.4 k-Nearest Neighbors 167 6.5 Classification and Regression Trees 172 6.6 Other Approaches 178 Exercises 179 Further Reading 182 APPENDIX A ANSWERS TO EXERCISES 185 APPENDIX B HANDS-ON TUTORIALS 191 B.1 Tutorial Overview 191 B.2 Access and Installation 191 B.3 Software Overview 192 B.4 Reading in Data 193 B.5 Preparation Tools 195 B.6 Tables and Graph Tools 199 B.7 Statistics Tools 202 B.8 Grouping Tools 204 B.9 Models Tools 207 B.10 Apply Model 211 B.11 Exercises 211 BIBLIOGRAPHY 227 INDEX 231
£59.36
O'Reilly Media Snowflake The Definitive Guide
Book Synopsis: Snowflake's ability to eliminate data silos and run workloads from a single platform creates opportunities to democratize data analytics, allowing users within an organization to make data-driven decisions. This clear, comprehensive guide will show you how to build integrated data applications and develop new revenue streams based on data.
£47.99
Princeton University Press Data Visualization
Trade Review: "[Healy’s] prose is engaging and chatty, and the style of instruction is unpretentious and practical . . . This single volume represents an excellent entry point for those wishing to upskill their abilities in data visualization." (Paul Cuffe, IEEE Transactions) "Undoubtedly, this book is an excellent introduction to an essential tool for anyone who needs to collect and present data." (Conservation Biology)
£35.70
Taylor & Francis Inc Leadership Strategies in the Age of Big Data
Book Synopsis: Harnessing the power of technology is one of the key measures of effective leadership. Leadership Strategies in the Age of Big Data, Algorithms, and Analytics will help leaders think and act like strategists to maintain a leading-edge competitive advantage. Written by a leading expert in the field, this book provides new insights on how to successfully transition companies by aligning an organization's culture to accept the benefits of digital technology. The author emphasizes the importance of creating a team spirit with employees to embrace the digital age and develop strategic business plans that pinpoint new markets for growth, strengthen customer relationships, and develop competitive strategies. Understanding how to deal with inconsistencies when facts generated by data analytics disagree with your own experience, intuition, and knowledge of the competitive situation is key to successful leadership.
Table of Contents: Chapter 1. Developing Effective Leadership: The human interface with big data, algorithms, and analytics. Chapter 2. Initiate speed of implementation to maintain a digital advantage. Chapter 3. Apply analytics to concentrate at the decisive point for maximum impact. Chapter 4. Activate maneuver and indirect approach to create surprise. Chapter 5. Employ big data to determine the culminating point of a competitive campaign. Chapter 6. Use data to determine how long to maintain offensive action. Chapter 7. Align big data with the corporate culture. Chapter 8. Decide on a bold approach or cautious restraint based on data analytics. Chapter 9. Utilize big data, algorithms, and analytics to maximize use of competitor intelligence. Chapter 10. Choose offensive and defensive strategies by understanding the human interaction. Chapter 11. Factor-in friction and luck that make analytics a gamble. Chapter 12. Use data to neutralize the competitor’s effectiveness. Appendix. Strategic Business Plan outline.
£47.49
APress Artificial Intelligence Basics
Book Synopsis: Artificial intelligence touches nearly every part of your day. While you may initially assume that technology such as smart speakers and digital assistants are the extent of it, AI has in fact rapidly become a general-purpose technology, reverberating across industries including transportation, healthcare, financial services, and many more. In our modern era, an understanding of AI and its possibilities for your organization is essential for growth and success. Artificial Intelligence Basics has arrived to equip you with a fundamental, timely grasp of AI and its impact. Author Tom Taulli provides an engaging, non-technical introduction to important concepts such as machine learning, deep learning, natural language processing (NLP), robotics, and more. In addition to guiding you through real-world case studies and practical implementation steps, Taulli uses his expertise to expand on the bigger questions that surround AI. These include societal trends, ethics, and ...
£35.99
HarperCollins Publishers Inc Everybody Lies
£13.01
HarperCollins Publishers Inc Everybody Lies
£23.24
HarperCollins Publishers Inc Targeted
£23.19
HarperCollins Publishers Inc Targeted La Dictadura de Los Datos Spanish
Book Synopsis: The gripping story of Cambridge Analytica and Big Data. Is our democracy really safe after Trump's victory? La dictadura de los datos reveals how our data has been used and warns us how it could be used again. They know what you buy. Brittany Kaiser, a novice political consultant specializing in human rights and international relations, believed that the data collected and analyzed by smartphones and social networks was in good hands until she met Alexander Nix, the charismatic leader of a new political communications firm called Cambridge Analytica. What began as just a job soon becomes an infamous operation aimed at helping to elect Trump or interfering in the referendum that led to Brexit.
£15.29
Harper Business The Digital Silk Road
£26.99
Elsevier Science Publishing Co Inc Handbook of Statistical Analysis and Data Mining
Trade Review: "Data mining practitioners, here is your bible, the complete "driver's manual" for data mining. From starting the engine to handling the curves, this book covers the gamut of data mining techniques - including predictive analytics and text mining - illustrating how to achieve maximal value across business, scientific, engineering, and medical applications. What are the best practices through each phase of a data mining project? How can you avoid the most treacherous pitfalls? The answers are in here. "Going beyond its responsibility as a reference book, the heavily-updated second edition also provides all-new, detailed tutorials with step-by-step instructions to drive established data mining software tools across real world applications. This way, newcomers start their engines immediately and experience hands-on success. "What's more, this edition drills down on hot topics across seven new chapters, including deep learning and how to avert "b---s---" results. If you want to roll-up your sleeves and execute on predictive analytics, this is your definite, go-to resource. To put it lightly, if this book isn't on your shelf, you're not a data miner." --Eric Siegel, Ph.D., founder of Predictive Analytics World and author of "Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die" "Great introduction to the real-world process of data mining. The overviews, practical advice, tutorials, and extra CD material make this book an invaluable resource for both new and experienced data miners." --Karl Rexer, PhD (President and Founder of Rexer Analytics, Boston, Massachusetts)
Table of Contents: Part 1: History Of Phases Of Data Analysis, Basic Theory, And The Data Mining Process 1. The Background for Data Mining Practice 2. Theoretical Considerations for Data Mining 3. The Data Mining and Predictive Analytic Process 4. Data Understanding and Preparation 5. Feature Selection 6.
Accessory Tools for Doing Data Mining Part 2: The Algorithms And Methods In Data Mining And Predictive Analytics And Some Domain Areas 7. Basic Algorithms for Data Mining: A Brief Overview 8. Advanced Algorithms for Data Mining 9. Classification 10. Numerical Prediction 11. Model Evaluation and Enhancement 12. Predictive Analytics for Population Health and Care 13. Big Data in Education: New Efficiencies for Recruitment, Learning, and Retention of Students and Donors 14. Customer Response Modeling 15. Fraud Detection Part 3: Tutorials And Case Studies Tutorial A Example of Data Mining Recipes Using Windows 10 and Statistica 13 Tutorial B Using the Statistica Data Mining Workspace Method for Analysis of Hurricane Data (Hurrdata.sta) Tutorial C Case Study—Using SPSS Modeler and STATISTICA to Predict Student Success at High-Stakes Nursing Examinations (NCLEX) Tutorial D Constructing a Histogram in KNIME Using MidWest Company Personality Data Tutorial E Feature Selection in KNIME Tutorial F Medical/Business Tutorial Tutorial G A KNIME Exercise, Using Alzheimer’s Training Data of Tutorial F Tutorial H Data Prep 1-1: Merging Data Sources Tutorial I Data Prep 1–2: Data Description Tutorial J Data Prep 2-1: Data Cleaning and Recoding Tutorial K Data Prep 2-2: Dummy Coding Category Variables Tutorial L Data Prep 2-3: Outlier Handling Tutorial M Data Prep 3-1: Filling Missing Values With Constants Tutorial N Data Prep 3-2: Filling Missing Values With Formulas Tutorial O Data Prep 3-3: Filling Missing Values With a Model Tutorial P City of Chicago Crime Map: A Case Study Predicting Certain Kinds of Crime Using Statistica Data Miner and Text Miner Tutorial Q Using Customer Churn Data to Develop and Select a Best Predictive Model for Client Defection Using STATISTICA Data Miner 13 64-bit for Windows 10 Tutorial R Example With C&RT to Predict and Display Possible Structural Relationships Tutorial S Clinical Psychology: Making Decisions About Best Therapy for a Client Part 4: 
Model Ensembles, Model Complexity; Using the Right Model for the Right Use, Significance, Ethics, and the Future, and Advanced Processes 16. The Apparent Paradox of Complexity in Ensemble Modeling 17. The "Right Model" for the "Right Purpose": When Less Is Good Enough 18. A Data Preparation Cookbook 19. Deep Learning 20. Significance versus Luck in the Age of Mining: The Issues of P-Value "Significance" and "Ways to Test Significance of Our Predictive Analytic Models" 21. Ethics and Data Analytics 22. IBM Watson
£75.04
Elsevier Science & Technology Data Mining: Concepts and Techniques
Table of Contents: 1. Introduction 2. Data, measurements, and data processing 3. Data warehousing and online analytical processing 4. Pattern mining: basic concepts and methods 5. Pattern mining: advanced methods 6. Classification: basic concepts and methods 7. Classification: advanced methods 8. Cluster analysis: basic concepts and methods 9. Cluster analysis: advanced methods 10. Deep learning 11. Outlier detection 12. Data mining trends and research frontiers Appendix: Mathematical background
£62.06
Elsevier Science & Technology Analyzing Social Media Networks with NodeXL
Table of Contents: Part I. Getting Started with Analyzing Social Media Networks 1. Introduction to Social Media and Social Networks 2. Social Media: New Technologies of Collaboration 3. Social Network Analysis: Measuring, Mapping, and Modeling Collections of Connections Part II. NodeXL Tutorial: Learning by Doing 4. Installation, Orientation, and Layout 5. Labeling and Visual Attributes 6. Calculating and Visualizing Network Metrics 7. Grouping and Filtering 8. Semantic Networks Part III. Social Media Network Analysis Case Studies 9. Email: The Lifeblood of Modern Communication 10. Thread Networks: Mapping Message Boards and Email Lists 11. Twitter: Information Flows, Influencers, and Organic Communities 12. Facebook: Public Pages and Inter-Organizational Networks 13. YouTube: Exploring Video Networks 14. Wiki Networks: Connections of Culture and Collaboration
£35.06
Pearson Education Getting Started with Data Science
Book Synopsis: Murtaza Haider, Ph.D., is an Associate Professor at the Ted Rogers School of Management, Ryerson University, and the Director of a consulting firm Regionomics Inc. He is also a visiting research fellow at the Munk School of Global Affairs at the University of Toronto (2014-15). In addition, he is a senior research affiliate with the Canadian Network for Research on Terrorism, Security, and Society, and an adjunct professor of engineering at McGill University. Haider specializes in applying analytics and statistical methods to find solutions for socioeconomic challenges. His research interests include analytics; data science; housing market dynamics; infrastructure, transportation, and urban planning; and human development in North America and South Asia. He is an avid blogger/data journalist and writes weekly for the Dawn newspaper and occasionally for the Huffington Post. Haider holds a Master's in transport engineering and planning and a Ph.D. in ...
Table of Contents: Chapter 1 The Bazaar of Storytellers Chapter 2 Data in the 24/7 Connected World Chapter 3 The Deliverable Chapter 4 Serving Tables Chapter 5 Graphic Details Chapter 6 Hypothetically Speaking Chapter 7 Why Tall Parents Don’t Have Even Taller Children Chapter 8 To Be or Not to Be Chapter 9 Categorically Speaking About Categorical Data Chapter 10 Spatial Data Analytics Chapter 11 Doing Serious Time with Time Series Chapter 12 Data Mining for Gold Index
£23.99
Pearson Education (US) Big Data Fundamentals
Book Synopsis: Thomas Erl is a top-selling IT author, founder of Arcitura Education and series editor of the Prentice Hall Service Technology Series from Thomas Erl. With more than 200,000 copies in print worldwide, his books have become international bestsellers and have been formally endorsed by senior members of major IT organizations, such as IBM, Microsoft, Oracle, Intel, Accenture, IEEE, HL7, MITRE, SAP, CISCO, HP and many others. As CEO of Arcitura Education Inc., Thomas has led the development of curricula for the internationally recognized Big Data Science Certified Professional (BDSCP), Cloud Certified Professional (CCP) and SOA Certified Professional (SOACP) accreditation programs, which have established a series of formal, vendor-neutral industry certifications obtained by thousands of IT professionals around the world. Thomas has toured more than 20 countries as a speaker and instructor. More than 100 articles and interviews by Thomas have been published in numerous publica...
Table of Contents: Acknowledgments xvii Reader Services xviii PART I: THE FUNDAMENTALS OF BIG DATA Chapter 1: Understanding Big Data 3 Concepts and Terminology 5 Datasets 5 Data Analysis 6 Data Analytics 6 Descriptive Analytics 8 Diagnostic Analytics 9 Predictive Analytics 10 Prescriptive Analytics 11 Business Intelligence (BI) 12 Key Performance Indicators (KPI) 12 Big Data Characteristics 13 Volume 14 Velocity 14 Variety 15 Veracity 16 Value 16 Different Types of Data 17 Structured Data 18 Unstructured Data 19 Semi-structured Data 19 Metadata 20 Case Study Background 20 History 20 Technical Infrastructure and Automation Environment 21 Business Goals and Obstacles 22 Case Study Example 24 Identifying Data Characteristics 26 Volume 26 Velocity 26 Variety 26 Veracity 26 Value 27 Identifying Types of Data 27 Chapter 2: Business Motivations and Drivers for Big Data Adoption 29 Marketplace Dynamics 30 Business Architecture 33 Business Process Management 36 Information and Communications
Technology 37 Data Analytics and Data Science 37 Digitization 38 Affordable Technology and Commodity Hardware 38 Social Media 39 Hyper-Connected Communities and Devices 40 Cloud Computing 40 Internet of Everything (IoE) 42 Case Study Example 43 Chapter 3: Big Data Adoption and Planning Considerations 47 Organization Prerequisites 49 Data Procurement 49 Privacy 49 Security 50 Provenance 51 Limited Realtime Support 52 Distinct Performance Challenges 53 Distinct Governance Requirements 53 Distinct Methodology 53 Clouds 54 Big Data Analytics Lifecycle 55 Business Case Evaluation 56 Data Identification 57 Data Acquisition and Filtering 58 Data Extraction 60 Data Validation and Cleansing 62 Data Aggregation and Representation 64 Data Analysis 66 Data Visualization 68 Utilization of Analysis Results 69 Case Study Example 71 Big Data Analytics Lifecycle 73 Business Case Evaluation 73 Data Identification 74 Data Acquisition and Filtering 74 Data Extraction 74 Data Validation and Cleansing 75 Data Aggregation and Representation 75 Data Analysis 75 Data Visualization 76 Utilization of Analysis Results 76 Chapter 4: Enterprise Technologies and Big Data Business Intelligence 77 Online Transaction Processing (OLTP) 78 Online Analytical Processing (OLAP) 79 Extract Transform Load (ETL) 79 Data Warehouses 80 Data Marts 81 Traditional BI 82 Ad-hoc Reports 82 Dashboards 82 Big Data BI 84 Traditional Data Visualization 84 Data Visualization for Big Data 85 Case Study Example 86 Enterprise Technology 86 Big Data Business Intelligence 87 PART II: STORING AND ANALYZING BIG DATA Chapter 5: Big Data Storage Concepts 91 Clusters 93 File Systems and Distributed File Systems 93 NoSQL 94 Sharding 95 Replication 97 Master-Slave 98 Peer-to-Peer 100 Sharding and Replication 103 Combining Sharding and Master-Slave Replication 104 Combining Sharding and Peer-to-Peer Replication 105 CAP Theorem 106 ACID 108 BASE 113 Case Study Example 117 Chapter 6: Big Data Processing Concepts 119 Parallel Data
Processing 120 Distributed Data Processing 121 Hadoop 122 Processing Workloads 122 Batch 123 Transactional 123 Cluster 124 Processing in Batch Mode 125 Batch Processing with MapReduce 125 Map and Reduce Tasks 126 Map 127 Combine 127 Partition 129 Shuffle and Sort 130 Reduce 131 A Simple MapReduce Example 133 Understanding MapReduce Algorithms 134 Processing in Realtime Mode 137 Speed Consistency Volume (SCV) 137 Event Stream Processing 140 Complex Event Processing 141 Realtime Big Data Processing and SCV 141 Realtime Big Data Processing and MapReduce 142 Case Study Example 143 Processing Workloads 143 Processing in Batch Mode 143 Processing in Realtime 144 Chapter 7: Big Data Storage Technology 145 On-Disk Storage Devices 147 Distributed File Systems 147 RDBMS Databases 149 NoSQL Databases 152 Characteristics 152 Rationale 153 Types 154 Key-Value 156 Document 157 Column-Family 159 Graph 160 NewSQL Databases 163 In-Memory Storage Devices 163 In-Memory Data Grids 166 Read-through 170 Write-through 170 Write-behind 172 Refresh-ahead 172 In-Memory Databases 175 Case Study Example 179 Chapter 8: Big Data Analysis Techniques 181 Quantitative Analysis 183 Qualitative Analysis 184 Data Mining 184 Statistical Analysis 184 A/B Testing 185 Correlation 186 Regression 188 Machine Learning 190 Classification (Supervised Machine Learning) 190 Clustering (Unsupervised Machine Learning) 191 Outlier Detection 192 Filtering 193 Semantic Analysis 195 Natural Language Processing 195 Text Analytics 196 Sentiment Analysis 197 Visual Analysis 198 Heat Maps 198 Time Series Plots 200 Network Graphs 201 Spatial Data Mapping 202 Case Study Example 204 Correlation 204 Regression 204 Time Series Plot 205 Clustering 205 Classification 205 Appendix A: Case Study Conclusion 207 About the Authors 211 Thomas Erl 211 Wajid Khattak 211 Paul Buhler 212 Index 213
£28.02
Pearson Education (US) Learning Deep Learning
Book Synopsis: Magnus Ekman, Ph.D., is a director of architecture at NVIDIA Corporation. His doctorate is in computer engineering, and he is the inventor of multiple patents. He was first exposed to artificial neural networks in the late nineties in his native country, Sweden. After some dabbling in evolutionary computation, he ended up focusing on computer architecture and relocated to Silicon Valley, where he lives with his wife Jennifer, children Sebastian and Sofia, and dog Babette. He previously worked with processor design and R&D at Sun Microsystems and Samsung Research America, and has been involved in starting two companies, one of which (Skout) was later acquired by The Meet Group, Inc. In his current role at NVIDIA, he leads an engineering team working on CPU performance and power efficiency for system on chips targeting the autonomous vehicle market. As the Deep Learning (DL) field exploded the past few years, fueled by NVIDIA's GPU technology and CUDA, Dr. Ekman f...
Table of Contents: Foreword by Dr. Anima Anandkumar xxi Foreword by Dr. Craig Clawson xxiii Preface xxv Acknowledgments li About the Author liii Chapter 1: The Rosenblatt Perceptron 1 Example of a Two-Input Perceptron 4 The Perceptron Learning Algorithm 7 Limitations of the Perceptron 15 Combining Multiple Perceptrons 17 Implementing Perceptrons with Linear Algebra 20 Geometric Interpretation of the Perceptron 30 Understanding the Bias Term 33 Concluding Remarks on the Perceptron 34 Chapter 2: Gradient-Based Learning 37 Intuitive Explanation of the Perceptron Learning Algorithm 37 Derivatives and Optimization Problems 41 Solving a Learning Problem with Gradient Descent 44 Constants and Variables in a Network 48 Analytic Explanation of the Perceptron Learning Algorithm 49 Geometric Description of the Perceptron Learning Algorithm 51 Revisiting Different Types of Perceptron Plots 52 Using a Perceptron to Identify Patterns 54 Concluding Remarks on Gradient-Based Learning 57 Chapter 3: Sigmoid Neurons and Backpropagation 59 Modified Neurons to Enable Gradient Descent for Multilevel Networks 60 Which Activation Function Should We Use?
66 Function Composition and the Chain Rule 67 Using Backpropagation to Compute the Gradient 69 Backpropagation with Multiple Neurons per Layer 81 Programming Example: Learning the XOR Function 82 Network Architectures 87 Concluding Remarks on Backpropagation 89 Chapter 4: Fully Connected Networks Applied to Multiclass Classification 91 Introduction to Datasets Used When Training Networks 92 Training and Inference 100 Extending the Network and Learning Algorithm to Do Multiclass Classification 101 Network for Digit Classification 102 Loss Function for Multiclass Classification 103 Programming Example: Classifying Handwritten Digits 104 Mini-Batch Gradient Descent 114 Concluding Remarks on Multiclass Classification 115 Chapter 5: Toward DL: Frameworks and Network Tweaks 117 Programming Example: Moving to a DL Framework 118 The Problem of Saturated Neurons and Vanishing Gradients 124 Initialization and Normalization Techniques to Avoid Saturated Neurons 126 Cross-Entropy Loss Function to Mitigate Effect of Saturated Output Neurons 130 Different Activation Functions to Avoid Vanishing Gradient in Hidden Layers 136 Variations on Gradient Descent to Improve Learning 141 Experiment: Tweaking Network and Learning Parameters 143 Hyperparameter Tuning and Cross-Validation 146 Concluding Remarks on the Path Toward Deep Learning 150 Chapter 6: Fully Connected Networks Applied to Regression 153 Output Units 154 The Boston Housing Dataset 160 Programming Example: Predicting House Prices with a DNN 161 Improving Generalization with Regularization 166 Experiment: Deeper and Regularized Models for House Price Prediction 169 Concluding Remarks on Output Units and Regression Problems 170 Chapter 7: Convolutional Neural Networks Applied to Image Classification 171 The CIFAR-10 Dataset 173 Characteristics and Building Blocks for Convolutional Layers 175 Combining Feature Maps into a Convolutional Layer 180 Combining Convolutional and Fully Connected Layers into a Network 181 Effects of 
Sparse Connections and Weight Sharing 185 Programming Example: Image Classification with a Convolutional Network 190 Concluding Remarks on Convolutional Networks 201 Chapter 8: Deeper CNNs and Pretrained Models 205 VGGNet 206 GoogLeNet 210 ResNet 215 Programming Example: Use a Pretrained ResNet Implementation 223 Transfer Learning 226 Backpropagation for CNN and Pooling 228 Data Augmentation as a Regularization Technique 229 Mistakes Made by CNNs 231 Reducing Parameters with Depthwise Separable Convolutions 232 Striking the Right Network Design Balance with EfficientNet 234 Concluding Remarks on Deeper CNNs 235 Chapter 9: Predicting Time Sequences with Recurrent Neural Networks 237 Limitations of Feedforward Networks 241 Recurrent Neural Networks 242 Mathematical Representation of a Recurrent Layer 243 Combining Layers into an RNN 245 Alternative View of RNN and Unrolling in Time 246 Backpropagation Through Time 248 Programming Example: Forecasting Book Sales 250 Dataset Considerations for RNNs 264 Concluding Remarks on RNNs 265 Chapter 10: Long Short-Term Memory 267 Keeping Gradients Healthy 267 Introduction to LSTM 272 LSTM Activation Functions 277 Creating a Network of LSTM Cells 278 Alternative View of LSTM 280 Related Topics: Highway Networks and Skip Connections 282 Concluding Remarks on LSTM 282 Chapter 11: Text Autocompletion with LSTM and Beam Search 285 Encoding Text 285 Longer-Term Prediction and Autoregressive Models 287 Beam Search 289 Programming Example: Using LSTM for Text Autocompletion 291 Bidirectional RNNs 298 Different Combinations of Input and Output Sequences 300 Concluding Remarks on Text Autocompletion with LSTM 302 Chapter 12: Neural Language Models and Word Embeddings 303 Introduction to Language Models and Their Use Cases 304 Examples of Different Language Models 307 Benefit of Word Embeddings and Insight into How They Work 313 Word Embeddings Created by Neural Language Models 315 Programming Example: Neural Language Model and Resulting 
Embeddings 319 King − Man + Woman! = Queen 329 King − Man + Woman ! = Queen 331 Language Models, Word Embeddings, and Human Biases 332 Related Topic: Sentiment Analysis of Text 334 Concluding Remarks on Language Models and Word Embeddings 342 Chapter 13: Word Embeddings from word2vec and GloVe 343 Using word2vec to Create Word Embeddings Without a Language Model 344 Additional Thoughts on word2vec 352 word2vec in Matrix Form 353 Wrapping Up word2vec 354 Programming Example: Exploring Properties of GloVe Embeddings 356 Concluding Remarks on word2vec and GloVe 361 Chapter 14: Sequence-to-Sequence Networks and Natural Language Translation 363 Encoder-Decoder Model for Sequence-to-Sequence Learning 366 Introduction to the Keras Functional API 368 Programming Example: Neural Machine Translation 371 Experimental Results 387 Properties of the Intermediate Representation 389 Concluding Remarks on Language Translation 391 Chapter 15: Attention and the Transformer 393 Rationale Behind Attention 394 Attention in Sequence-to-Sequence Networks 395 Alternatives to Recurrent Networks 406 Self-Attention 407 Multi-head Attention 410 The Transformer 411 Concluding Remarks on the Transformer 415 Chapter 16: One-to-Many Network for Image Captioning 417 Extending the Image Captioning Network with Attention 420 Programming Example: Attention-Based Image Captioning 421 Concluding Remarks on Image Captioning 443 Chapter 17: Medley of Additional Topics 447 Autoencoders 448 Multimodal Learning 459 Multitask Learning 469 Process for Tuning a Network 477 Neural Architecture Search 482 Concluding Remarks 502 Chapter 18: Summary and Next Steps 503 Things You Should Know by Now 503 Ethical AI and Data Ethics 505 Things You Do Not Yet Know 512 Next Steps 516 Appendix A: Linear Regression and Linear Classifiers 519 Linear Regression as a Machine Learning Algorithm 519 Computing Linear Regression Coefficients 523 Classification with Logistic Regression 525 Classifying XOR with a Linear Classifier 
528 Classification with Support Vector Machines 531 Evaluation Metrics for a Binary Classifier 533 Appendix B: Object Detection and Segmentation 539 Object Detection 540 Semantic Segmentation 549 Instance Segmentation with Mask R-CNN 559 Appendix C: Word Embeddings Beyond word2vec and GloVe 563 Wordpieces 564 FastText 566 Character-Based Method 567 ELMo 572 Related Work 575 Appendix D: GPT, BERT, and RoBERTa 577 GPT 578 BERT 582 RoBERTa 586 Historical Work Leading Up to GPT and BERT 588 Other Models Based on the Transformer 590 Appendix E: Newton-Raphson versus Gradient Descent 593 Newton-Raphson Root-Finding Method 594 Relationship Between Newton-Raphson and Gradient Descent 597 Appendix F: Matrix Implementation of Digit Classification Network 599 Single Matrix 599 Mini-Batch Implementation 602 Appendix G: Relating Convolutional Layers to Mathematical Convolution 607 Appendix H: Gated Recurrent Units 613 Alternative GRU Implementation 616 Network Based on the GRU 616 Appendix I: Setting up a Development Environment 621 Python 622 Programming Environment 623 Programming Examples 624 Datasets 625 Installing a DL Framework 628 TensorFlow Specific Considerations 630 Key Differences Between PyTorch and TensorFlow 631 Appendix J: Cheat Sheets 637 Works Cited 647 Index 667
£41.64
Elsevier Science Clinical Decision Support and Beyond
Table of Contents: Preface. SECTION I GOALS, METHODOLOGIES, AND CHALLENGES FOR CLINICAL DECISION SUPPORT AND BEYOND 1. Definitions, Purposes, and Scope 2. Clinical Decision Support Methods 3. The Journey to Broad Adoption 4. The Role of Quality Measurement and Reporting Feedback as a Driver for Care Improvement 5. International Dimensions of Clinical Decision Support Systems SECTION II SOURCES OF KNOWLEDGE FOR CLINICAL DECISION SUPPORT AND BEYOND 6. Human-Intensive Techniques 7. Data-Driven Approaches to Generating Knowledge: machine learning, artificial intelligence, and predictive modelling 8. Modernizing Evidence Synthesis for Evidence-Based Medicine SECTION III THE TECHNOLOGY OF CLINICAL DECISION SUPPORT AND BEYOND 9. Decision Rules and Expressions 10. Guidelines and Workflow Models 11. Ontologies, Vocabularies and Data Models 12. Grouped Knowledge Elements 13. Infobuttons and Point of Care Access to Knowledge 14. Information Visualization and Integrated Information Display 15. The Role of Standards 16. Population Analytics and Decision Support 17. Expanded sources for Precision Medicine 18. Knowledge Resources SECTION IV ADOPTION OF CLINICAL DECISION SUPPORT AND OTHER MODES OF KNOWLEDGE ENHANCEMENT 19. Cognitive Considerations for Health Information Technology 20. CDS Implementation and Governance 21. Managing the Investment in Clinical Decision Support 22. Evaluation of Clinical Decision Support 23. Legal and Regulatory Issues Related to the Use of Clinical Software in Health Care Delivery 24. Patient-Centered Clinical Decision Support 25. CDS and Health Disparities 26. Population Health Management 27. CDS for Public Health SECTION V THE JOURNEY TO A KNOWLEDGE-ENHANCED HEALTH AND HEALTH CARE SYSTEM 28. Clinical Knowledge Management 29. Integration of Knowledge Resources: Architectures 30. Getting to Knowledge-Enhanced Health and Healthcare
£103.50
Cengage Learning, Inc Data Visualization
Book Synopsis: DATA VISUALIZATION: Exploring and Explaining with Data is designed to introduce best practices in data visualization to undergraduate and graduate students. This is one of the first books on data visualization designed for college courses. The book contains material on effective design, choice of chart type, effective use of color, how to explore data visually, and how to explain concepts and results visually in a compelling way with data. The book explains both the "why" of data visualization and the "how." That is, the book provides lucid explanations of the guiding principles of data visualization through the use of interesting examples. Table of Contents: 1. Introduction. 2. Selecting a Chart Type. 3. Data Visualization and Design. 4. Purposeful Use of Color. 5. Visualizing Variability. 6. Exploring Data Visually. 7. Explaining Visually to Influence with Data. 8. Data Dashboards. 9. Telling the Truth with Data Visualization.
£58.89
Taylor & Francis Ltd Multimedia Ontology
Table of Contents: Introduction. Ontology and the Semantic Web. Characterizing Multimedia Semantics. Ontology Representations for Multimedia. Multimedia Web Ontology Language. Modeling the Semantics of Multimedia Content. Learning Multimedia Ontology. Applications Exploiting Multimedia Semantics. Distributed Multimedia Applications. Application of Multimedia Ontology in Heritage Preservation. Open Problems and Future Detectors. Appendices.
£56.99
Springer-Verlag New York Inc. Machine Learning in Cyber Trust
Trade Review: From the reviews: "This is a useful book on machine learning for cyber security applications. It will be helpful to researchers and graduate students who are looking for an introduction to a specific topic in the field. All of the topics covered are well researched. The book consists of 12 chapters, grouped into four parts." (Imad H. Elhajj, ACM Computing Reviews, October, 2009) Table of Contents: Cyber System.- Cyber-Physical Systems: A New Frontier.- Security.- Misleading Learners: Co-opting Your Spam Filter.- Survey of Machine Learning Methods for Database Security.- Identifying Threats Using Graph-based Anomaly Detection.- On the Performance of Online Learning Methods for Detecting Malicious Executables.- Efficient Mining and Detection of Sequential Intrusion Patterns for Network Intrusion Detection Systems.- A Non-Intrusive Approach to Enhance Legacy Embedded Control Systems with Cyber Protection Features.- Image Encryption and Chaotic Cellular Neural Network.- Privacy.- From Data Privacy to Location Privacy.- Privacy Preserving Nearest Neighbor Search.- Reliability.- High-Confidence Compositional Reliability Assessment of SOA-Based Systems Using Machine Learning Techniques.- Model, Properties, and Applications of Context-Aware Web Services.
£125.99
John Wiley & Sons Inc Unsupervised Learning
Book Synopsis: A new approach to unsupervised learning. Evolving technologies have brought about an explosion of information in recent years, but the question of how such information might be effectively harvested, archived, and analyzed remains a monumental challenge, for the processing of such information is often fraught with the need for conceptual interpretation: a relatively simple task for humans, yet an arduous one for computers. Inspired by the relative success of existing popular research on self-organizing neural networks for data clustering and feature extraction, Unsupervised Learning: A Dynamic Approach presents information within the family of generative, self-organizing maps, such as the self-organizing tree map (SOTM) and the more advanced self-organizing hierarchical variance map (SOHVM). It covers a series of pertinent, real-world applications with regard to the processing of multimedia data, from its role in generic image processing techniques, such as th… Table of Contents: Acknowledgments xi 1 Introduction 1 1.1 Part I: The Self-Organizing Method 1 1.2 Part II: Dynamic Self-Organization for Image Filtering and Multimedia Retrieval 2 1.3 Part III: Dynamic Self-Organization for Image Segmentation and Visualization 5 1.4 Future Directions 7 2 Unsupervised Learning 9 2.1 Introduction 9 2.2 Unsupervised Clustering 9 2.3 Distance Metrics for Unsupervised Clustering 11 2.4 Unsupervised Learning Approaches 13 2.4.1 Partitioning and Cluster Membership 13 2.4.2 Iterative Mean-Squared Error Approaches 15 2.4.3 Mixture Decomposition Approaches 17 2.4.4 Agglomerative Hierarchical Approaches 18 2.4.5 Graph-Theoretic Approaches 20 2.4.6 Evolutionary Approaches 20 2.4.7 Neural Network Approaches 21 2.5 Assessing Cluster Quality and Validity 21 2.5.1 Cost Function–Based Cluster Validity Indices 22 2.5.2 Density-Based Cluster Validity Indices 23 2.5.3 Geometric-Based Cluster Validity Indices 24 3 Self-Organization 27 3.1 Introduction 27 3.2 Principles of Self-Organization 
27 3.2.1 Synaptic Self-Amplification and Competition 27 3.2.2 Cooperation 28 3.2.3 Knowledge Through Redundancy 29 3.3 Fundamental Architectures 29 3.3.1 Adaptive Resonance Theory 29 3.3.2 Self-Organizing Map 37 3.4 Other Fixed Architectures for Self-Organization 43 3.4.1 Neural Gas 44 3.4.2 Hierarchical Feature Map 45 3.5 Emerging Architectures for Self-Organization 46 3.5.1 Dynamic Hierarchical Architectures 47 3.5.2 Nonstationary Architectures 48 3.5.3 Hybrid Architectures 50 3.6 Conclusion 50 4 Self-Organizing Tree Map 53 4.1 Introduction 53 4.2 Architecture 54 4.3 Competitive Learning 55 4.4 Algorithm 57 4.5 Evolution 61 4.5.1 Dynamic Topology 61 4.5.2 Classification Capability 64 4.6 Practical Considerations, Extensions, and Refinements 68 4.6.1 The Hierarchical Control Function 68 4.6.2 Learning, Timing, and Convergence 71 4.6.3 Feature Normalization 73 4.6.4 Stop Criteria 73 4.7 Conclusions 74 5 Self-Organization in Impulse Noise Removal 75 5.1 Introduction 75 5.2 Review of Traditional Median-Type Filters 76 5.3 The Noise-Exclusive Adaptive Filtering 82 5.3.1 Feature Selection and Impulse Detection 82 5.3.2 Noise Removal Filters 84 5.4 Experimental Results 86 5.5 Detection-Guided Restoration and Real-Time Processing 99 5.5.1 Introduction 99 5.5.2 Iterative Filtering 101 5.5.3 Recursive Filtering 104 5.5.4 Real-Time Processing of Impulse Corrupted TV Pictures 105 5.5.5 Analysis of the Processing Time 109 5.6 Conclusions 115 6 Self-Organization in Image Retrieval 119 6.1 Retrieval of Visual Information 120 6.2 Visual Feature Descriptor 122 6.2.1 Color Histogram and Color Moment Descriptors 122 6.2.2 Wavelet Moment and Gabor Texture Descriptors 123 6.2.3 Fourier and Moment-based Shape Descriptors 125 6.2.4 Feature Normalization and Selection 127 6.3 User-Assisted Retrieval 130 6.3.1 Radial Basis Function Method 132 6.4 Self-Organization for Pseudo Relevance Feedback 136 6.5 Directed Self-Organization 140 6.5.1 Algorithm 142 6.6 Optimizing Self-Organization for 
Retrieval 146 6.6.1 Genetic Principles 147 6.6.2 System Architecture 149 6.6.3 Genetic Algorithm for Feature Weight Detection 150 6.7 Retrieval Performance 153 6.7.1 Directed Self-Organization 153 6.7.2 Genetic Algorithm Weight Detection 155 6.8 Summary 157 7 The Self-Organizing Hierarchical Variance Map 159 7.1 An Intuitive Basis 160 7.2 Model Formulation and Breakdown 162 7.2.1 Topology Extraction via Competitive Hebbian Learning 163 7.2.2 Local Variance via Hebbian Maximal Eigenfilters 165 7.2.3 Global and Local Variance Interplay for Map Growth and Termination 170 7.3 Algorithm 173 7.3.1 Initialization, Continuation, and Presentation 173 7.3.2 Updating Network Parameters 175 7.3.3 Vigilance Evaluation and Map Growth 175 7.3.4 Topology Adaptation 176 7.3.5 Node Adaptation 177 7.3.6 Optional Tuning Stage 177 7.4 Simulations and Evaluation 177 7.4.1 Observations of Evolution and Partitioning 178 7.4.2 Visual Comparisons with Popular Mean-Squared Error Architectures 181 7.4.3 Visual Comparison Against Growing Neural Gas 183 7.4.4 Comparing Hierarchical with Tree-Based Methods 183 7.5 Tests on Self-Determination and the Optional Tuning Stage 187 7.6 Cluster Validity Analysis on Synthetic and UCI Data 187 7.6.1 Performance vs. 
Popular Clustering Methods 190 7.6.2 IRIS Dataset 192 7.6.3 WINE Dataset 195 7.7 Summary 195 8 Microbiological Image Analysis Using Self-Organization 197 8.1 Image Analysis in the Biosciences 197 8.1.1 Segmentation: The Common Denominator 198 8.1.2 Semi-supervised versus Unsupervised Analysis 199 8.1.3 Confocal Microscopy and Its Modalities 200 8.2 Image Analysis Tasks Considered 202 8.2.1 Visualising Chromosomes During Mitosis 202 8.2.2 Segmenting Heterogeneous Biofilms 204 8.3 Microbiological Image Segmentation 205 8.3.1 Effects of Feature Space Definition 207 8.3.2 Fixed Weighting of Feature Space 209 8.3.3 Dynamic Feature Fusion During Learning 213 8.4 Image Segmentation Using Hierarchical Self-Organization 215 8.4.1 Gray-Level Segmentation of Chromosomes 215 8.4.2 Automated Multilevel Thresholding of Biofilm 220 8.4.3 Multidimensional Feature Segmentation 221 8.5 Harvesting Topologies to Facilitate Visualization 226 8.5.1 Topology Aware Opacity and Gray-Level Assignment 227 8.5.2 Visualization of Chromosomes During Mitosis 228 8.6 Summary 233 9 Closing Remarks and Future Directions 237 9.1 Summary of Main Findings 237 9.1.1 Dynamic Self-Organization: Effective Models for Efficient Feature Space Parsing 237 9.1.2 Improved Stability, Integrity, and Efficiency 238 9.1.3 Adaptive Topologies Promote Consistency and Uncover Relationships 239 9.1.4 Online Selection of Class Number 239 9.1.5 Topologies Represent a Useful Backbone for Visualization or Analysis 240 9.2 Future Directions 240 9.2.1 Dynamic Navigation for Information Repositories 241 9.2.2 Interactive Knowledge-Assisted Visualization 243 9.2.3 Temporal Data Analysis Using Trajectories 245 Appendix A 249 A.1 Global and Local Consistency Error 249 References 251 Index 269
£100.76
John Wiley & Sons Inc Making Sense of Data III
Book Synopsis: As the third in the series, this book focuses on a style of data analysis that makes graphics central to exploration. Making Sense of Data III explains how to implement decision support systems and provides an interactive approach to data analysis that allows users to see, manipulate, explore, mine data, and share results with colleagues. Trade Review: “It is an essential book for understanding the principal role that graphics play in data visualization.” (Zentralblatt MATH, 1 April 2015) Table of Contents: Preface. 1. Introduction. 1.1 Overview. 1.2 Visual Perception. 1.3 Visualization. 1.4 Designing for High-throughput Data Exploration. 1.5 Summary. 1.6 Further reading. 2. The Cognitive and Visual Systems. 2.1 External Representation. 2.2 The Cognitive System. 2.3 Visual Perception. 2.4 Influencing Visual Perception. 2.5 Summary. 2.6 Further reading. 3. Graphic Representations. 3.1 Jacques Bertin: Semiology of Graphics. 3.2 Wilkinson: Grammar of Graphics. 3.3 Wickham: ggplot2. 3.4 Bostock and Heer: Protovis. 3.5 Summary. 3.6 Further reading. 4. Designing Visual Interactions. 4.1 Designing for Complexity. 4.2 The Process of Design. 4.3 Visual Interaction Design. 5. Hands-on: Creating Interactive Visualizations with Protovis. 5.1 Using Protovis. 5.2 Creating Code using the Protovis Graphical Framework. 5.3 Basic Protovis Marks. 5.4 Creating Customized Plots. 5.5 Creating Basic Plots. 5.6 Data Analysis Graphs. 5.7 Composite Plots. 5.8 Interactive Plots. 5.9 Protovis Summary. 5.10 Further Reading. Appendix. A Exercise Code Examples. Bibliography. Index.
£81.86
John Wiley & Sons Inc Data Mining Techniques
Book Synopsis: The leading introductory book on data mining, fully updated and revised! When Berry and Linoff wrote the first edition of Data Mining Techniques in the late 1990s, data mining was just starting to move out of the lab and into the office; it has since grown to become an indispensable tool of modern business. Table of Contents: Introduction xxxvii Chapter 1 What Is Data Mining and Why Do It? 1 What Is Data Mining? 2 Data Mining Is a Business Process 2 Large Amounts of Data 3 Meaningful Patterns and Rules 3 Data Mining and Customer Relationship Management 4 Why Now? 6 Data Is Being Produced 6 Data Is Being Warehoused 6 Computing Power Is Affordable 7 Interest in Customer Relationship Management Is Strong 7 Commercial Data Mining Software Products Have Become Available 8 Skills for the Data Miner 9 The Virtuous Cycle of Data Mining 9 A Case Study in Business Data Mining 11 Identifying BofA’s Business Challenge 12 Applying Data Mining 12 Acting on the Results 13 Measuring the Effects of Data Mining 14 Steps of the Virtuous Cycle 15 Identify Business Opportunities 16 Transform Data into Information 17 Act on the Information 19 Measure the Results 20 Data Mining in the Context of the Virtuous Cycle 23 Lessons Learned 26 Chapter 2 Data Mining Applications in Marketing and Customer Relationship Management 27 Two Customer Lifecycles 27 The Customer’s Lifecycle 28 The Customer Lifecycle 28 Subscription Relationships versus Event-Based Relationships 30 Organize Business Processes Around the Customer Lifecycle 32 Customer Acquisition 33 Customer Activation 36 Customer Relationship Management 37 Winback 38 Data Mining Applications for Customer Acquisition 38 Identifying Good Prospects 39 Choosing a Communication Channel 39 Picking Appropriate Messages 40 A Data Mining Example: Choosing the Right Place to Advertise 40 Who Fits the Profile? 
41 Measuring Fitness for Groups of Readers 44 Data Mining to Improve Direct Marketing Campaigns 45 Response Modeling 46 Optimizing Response for a Fixed Budget 47 Optimizing Campaign Profitability 49 Reaching the People Most Influenced by the Message 53 Using Current Customers to Learn About Prospects 54 Start Tracking Customers Before They Become “Customers” 55 Gather Information from New Customers 55 Acquisition-Time Variables Can Predict Future Outcomes 56 Data Mining Applications for Customer Relationship Management 56 Matching Campaigns to Customers 56 Reducing Exposure to Credit Risk 58 Determining Customer Value 59 Cross-selling, Up-selling, and Making Recommendations 60 Retention 60 Recognizing Attrition 60 Why Attrition Matters 61 Different Kinds of Attrition 62 Different Kinds of Attrition Model 63 Beyond the Customer Lifecycle 64 Lessons Learned 65 Chapter 3 The Data Mining Process 67 What Can Go Wrong? 68 Learning Things That Aren’t True 68 Learning Things That Are True, but Not Useful 73 Data Mining Styles 74 Hypothesis Testing 75 Directed Data Mining 81 Undirected Data Mining 81 Goals, Tasks, and Techniques 82 Data Mining Business Goals 82 Data Mining Tasks 83 Data Mining Techniques 88 Formulating Data Mining Problems: From Goals to Tasks to Techniques 88 What Techniques for Which Tasks? 95 Is There a Target or Targets? 96 What Is the Target Data Like? 96 What Is the Input Data Like? 96 How Important Is Ease of Use? 97 How Important Is Model Explicability? 
97 Lessons Learned 98 Chapter 4 Statistics 101: What You Should Know About Data 101 Occam’s Razor 103 Skepticism and Simpson’s Paradox 103 The Null Hypothesis 104 P-Values 105 Looking At and Measuring Data 106 Categorical Values 106 Numeric Variables 117 A Couple More Statistical Ideas 120 Measuring Response 120 Standard Error of a Proportion 121 Comparing Results Using Confidence Bounds 123 Comparing Results Using Difference of Proportions 124 Size of Sample 125 What the Confidence Interval Really Means 126 Size of Test and Control for an Experiment 127 Multiple Comparisons 129 The Confidence Level with Multiple Comparisons 129 Bonferroni’s Correction 129 Chi-Square Test 130 Expected Values 130 Chi-Square Value 132 Comparison of Chi-Square to Difference of Proportions 134 An Example: Chi-Square for Regions and Starts 134 Case Study: Comparing Two Recommendation Systems with an A/B Test 138 First Metric: Participating Sessions 140 Data Mining and Statistics 144 Lessons Learned 148 Chapter 5 Descriptions and Prediction: Profiling and Predictive Modeling 151 Directed Data Mining Models 152 Defining the Model Structure and Target 152 Incremental Response Modeling 154 Model Stability 156 Time-Frames in the Model Set 157 Directed Data Mining Methodology 159 Step 1: Translate the Business Problem into a Data Mining Problem 161 How Will Results Be Used? 163 How Will Results Be Delivered? 163 The Role of Domain Experts and Information Technology 164 Step 2: Select Appropriate Data 165 What Data Is Available? 166 How Much Data Is Enough? 167 How Much History Is Required? 167 How Many Variables? 168 What Must the Data Contain? 
168 Step 3: Get to Know the Data 169 Examine Distributions 169 Compare Values with Descriptions 170 Validate Assumptions 170 Ask Lots of Questions 171 Step 4: Create a Model Set 172 Assembling Customer Signatures 172 Creating a Balanced Sample 172 Including Multiple Timeframes 174 Creating a Model Set for Prediction 174 Creating a Model Set for Profiling 176 Partitioning the Model Set 176 Step 5: Fix Problems with the Data 177 Categorical Variables with Too Many Values 177 Numeric Variables with Skewed Distributions and Outliers 178 Missing Values 178 Values with Meanings That Change over Time 179 Inconsistent Data Encoding 179 Step 6: Transform Data to Bring Information to the Surface 180 Step 7: Build Models 180 Step 8: Assess Models 180 Assessing Binary Response Models and Classifiers 181 Assessing Binary Response Models Using Lift 182 Assessing Binary Response Model Scores Using Lift Charts 184 Assessing Binary Response Model Scores Using Profitability Models 185 Assessing Binary Response Models Using ROC Charts 186 Assessing Estimators 188 Assessing Estimators Using Score Rankings 189 Step 9: Deploy Models 190 Practical Issues in Deploying Models 190 Optimizing Models for Deployment 191 Step 10: Assess Results 191 Step 11: Begin Again 193 Lessons Learned 193 Chapter 6 Data Mining Using Classic Statistical Techniques 195 Similarity Models 196 Similarity and Distance 196 Example: A Similarity Model for Product Penetration 197 Table Lookup Models 203 Choosing Dimensions 204 Partitioning the Dimensions 205 From Training Data to Scores 205 Handling Sparse and Missing Data by Removing Dimensions 205 RFM: A Widely Used Lookup Model 206 RFM Cell Migration 207 RFM and the Test-and-Measure Methodology 208 RFM and Incremental Response Modeling 209 Naïve Bayesian Models 210 Some Ideas from Probability 210 The Naïve Bayesian Calculation 212 Comparison with Table Lookup Models 213 Linear Regression 213 The Best-fit Line 215 Goodness of Fit 217 Multiple Regression 220 The 
Equation 220 The Range of the Target Variable 221 Interpreting Coefficients of Linear Regression Equations 221 Capturing Local Effects with Linear Regression 223 Additional Considerations with Multiple Regression 224 Variable Selection for Multiple Regression 225 Logistic Regression 227 Modeling Binary Outcomes 227 The Logistic Function 229 Fixed Effects and Hierarchical Effects 231 Hierarchical Effects 232 Within and Between Effects 232 Fixed Effects 233 Lessons Learned 234 Chapter 7 Decision Trees 237 What Is a Decision Tree and How Is It Used? 238 A Typical Decision Tree 238 Using the Tree to Learn About Churn 240 Using the Tree to Learn About Data and Select Variables 241 Using the Tree to Produce Rankings 243 Using the Tree to Estimate Class Probabilities 243 Using the Tree to Classify Records 244 Using the Tree to Estimate Numeric Values 244 Decision Trees Are Local Models 245 Growing Decision Trees 247 Finding the Initial Split 248 Growing the Full Tree 251 Finding the Best Split 252 Gini (Population Diversity) as a Splitting Criterion 253 Entropy Reduction or Information Gain as a Splitting Criterion 254 Information Gain Ratio 256 Chi-Square Test as a Splitting Criterion 256 Incremental Response as a Splitting Criterion 258 Reduction in Variance as a Splitting Criterion for Numeric Targets 259 F Test 262 Pruning 262 The CART Pruning Algorithm 263 Pessimistic Pruning: The C5.0 Pruning Algorithm 267 Stability-Based Pruning 268 Extracting Rules from Trees 269 Decision Tree Variations 270 Multiway Splits 270 Splitting on More Than One Field at a Time 271 Creating Nonrectangular Boxes 271 Assessing the Quality of a Decision Tree 275 When Are Decision Trees Appropriate? 
276 Case Study: Process Control in a Coffee Roasting Plant 277 Goals for the Simulator 277 Building a Roaster Simulation 278 Evaluation of the Roaster Simulation 278 Lessons Learned 279 Chapter 8 Artificial Neural Networks 281 A Bit of History 282 The Biological Model 283 The Biological Neuron 285 The Biological Input Layer 286 The Biological Output Layer 287 Neural Networks and Artificial Intelligence 287 Artificial Neural Networks 288 The Artificial Neuron 288 The Multi-Layer Perceptron 291 A Network Example 292 Network Topologies 293 A Sample Application: Real Estate Appraisal 295 Training Neural Networks 299 How Does a Neural Network Learn Using Back Propagation? 299 Pruning a Neural Network 300 Radial Basis Function Networks 303 Overview of RBF Networks 303 Choosing the Locations of the Radial Basis Functions 305 Universal Approximators 305 Neural Networks in Practice 308 Choosing the Training Set 309 Coverage of Values for All Features 309 Number of Features 310 Size of Training Set 310 Number and Range of Outputs 310 Rules of Thumb for Using MLPs 310 Preparing the Data 311 Interpreting the Output from a Neural Network 313 Neural Networks for Time Series 315 Time Series Modeling 315 A Neural Network Time Series Example 316 Can Neural Network Models Be Explained? 
317 Sensitivity Analysis 318 Using Rules to Describe the Scores 318 Lessons Learned 319 Chapter 9 Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering 321 Memory-Based Reasoning 322 Look-Alike Models 323 Example: Using MBR to Estimate Rents in Tuxedo, New York 324 Challenges of MBR 327 Choosing a Balanced Set of Historical Records 328 Representing the Training Data 328 Determining the Distance Function, Combination Function, and Number of Neighbors 331 Case Study: Using MBR for Classifying Anomalies in Mammograms 331 The Business Problem: Identifying Abnormal Mammograms 332 Applying MBR to the Problem 332 The Total Solution 334 Measuring Distance and Similarity 335 What Is a Distance Function? 335 Building a Distance Function One Field at a Time 337 Distance Functions for Other Data Types 340 When a Distance Metric Already Exists 341 The Combination Function: Asking the Neighbors for Advice 342 The Simplest Approach: One Neighbor 342 The Basic Approach for Categorical Targets: Democracy 342 Weighted Voting for Categorical Targets 344 Numeric Targets 344 Case Study: Shazam — Finding Nearest Neighbors for Audio Files 345 Why This Feat Is Challenging 346 The Audio Signature 347 Measuring Similarity 348 Collaborative Filtering: A Nearest-Neighbor Approach to Making Recommendations 351 Building Profiles 352 Comparing Profiles 352 Making Predictions 353 Lessons Learned 354 Chapter 10 Knowing When to Worry: Using Survival Analysis to Understand Customers 357 Customer Survival 360 What Survival Curves Reveal 360 Finding the Average Tenure from a Survival Curve 362 Customer Retention Using Survival 364 Looking at Survival as Decay 365 Hazard Probabilities 367 The Basic Idea 368 Examples of Hazard Functions 369 Censoring 371 The Hazard Calculation 372 Other Types of Censoring 375 From Hazards to Survival 376 Retention 376 Survival 378 Comparison of Retention and Survival 378 Proportional Hazards 380 Examples of Proportional Hazards 381 
Stratification: Measuring Initial Effects on Survival 382 Cox Proportional Hazards 382 Survival Analysis in Practice 385 Handling Different Types of Attrition 385 When Will a Customer Come Back? 387 Understanding Customer Value 389 Forecasting 392 Hazards Changing over Time 393 Lessons Learned 394 Chapter 11 Genetic Algorithms and Swarm Intelligence 397 Optimization 398 What Is an Optimization Problem? 398 An Optimization Problem in Ant World 399 E Pluribus Unum 400 A Smarter Ant 401 Genetic Algorithms 403 A Bit of History 404 Genetics on Computers 404 Representing the Genome 413 Schemata: The Building Blocks of Genetic Algorithms 414 Beyond the Simple Algorithm 417 The Traveling Salesman Problem 418 Exhaustive Search 419 A Simple Greedy Algorithm 419 The Genetic Algorithms Approach 419 The Swarm Intelligence Approach 420 Case Study: Using Genetic Algorithms for Resource Optimization 421 Case Study: Evolving a Solution for Classifying Complaints 423 Business Context 424 Data 425 The Comment Signature 425 The Genomes 426 The Fitness Function 427 The Results 427 Lessons Learned 427 Chapter 12 Tell Me Something New: Pattern Discovery and Data Mining 429 Undirected Techniques, Undirected Data Mining 431 Undirected versus Directed Techniques 431 Undirected versus Directed Data Mining 431 Case Study: Undirected Data Mining Using Directed Techniques 432 What is Undirected Data Mining? 
435 Data Exploration 435 Segmentation and Clustering 436 Target Variable Definition, When the Target Is Not Explicit 438 Simulation, Forecasting, and Agent-Based Modeling 443 Methodology for Undirected Data Mining 455 There Is No Methodology 456 Things to Keep in Mind 456 Lessons Learned 457 Chapter 13 Finding Islands of Similarity: Automatic Cluster Detection 459 Searching for Islands of Simplicity 461 Customer Segmentation and Clustering 461 Similarity Clusters 463 Tracking Campaigns by Cluster-Based Segments 464 Clustering Reveals an Overlooked Market Segment 466 Fitting the Troops 467 The K-Means Clustering Algorithm 468 Two Steps of the K-Means Algorithm 468 Voronoi Diagrams and K-Means Clusters 471 Choosing the Cluster Seeds 473 Choosing K 473 Using K-Means to Detect Outliers 474 Semi-Directed Clustering 475 Interpreting Clusters 475 Characterizing Clusters by Their Centroids 476 Characterizing Clusters by What Differentiates Them 477 Using Decision Trees to Describe Clusters 478 Evaluating Clusters 479 Cluster Measurements and Terminology 480 Cluster Silhouettes 480 Limiting Cluster Diameter for Scoring 483 Case Study: Clustering Towns 484 Creating Town Signatures 484 Creating Clusters 486 Determining the Right Number of Clusters 486 Evaluating the Clusters 487 Using Demographic Clusters to Adjust Zone Boundaries 488 Business Success 490 Variations on K-Means 490 K-Medians, K-Medoids, and K-Modes 490 The Soft Side of K-Means 494 Data Preparation for Clustering 495 Scaling for Consistency 496 Use Weights to Encode Outside Information 496 Selecting Variables for Clustering 497 Lessons Learned 497 Chapter 14 Alternative Approaches to Cluster Detection 499 Shortcomings of K-Means 500 Reasonableness 500 An Intuitive Example 501 Fixing the Problem by Changing the Scales 503 What This Means in Practice 504 Gaussian Mixture Models 505 Adding “Gaussians” to K-Means 505 Back to Gaussian Mixture Models 508 Scoring GMMs 510 Applying GMMs 511 Divisive Clustering 513 A 
Decision Tree–Like Method for Clustering 513 Scoring Divisive Clusters 515 Clusters and Trees 515 Agglomerative (Hierarchical) Clustering 516 Overview of Agglomerative Clustering Methods 516 Clustering People by Age: An Example of An Agglomerative Clustering Algorithm 520 Scoring Agglomerative Clusters 522 Limitations of Agglomerative Clustering 523 Agglomerative Clustering in Practice 525 Combining Agglomerative Clustering and K-Means 526 Self-Organizing Maps 527 What Is a Self-Organizing Map? 527 Training an SOM 530 Scoring an SOM 531 The Search Continues for Islands of Simplicity 532 Lessons Learned 533 Chapter 15 Market Basket Analysis and Association Rules 535 Defining Market Basket Analysis 536 Four Levels of Market Basket Data 537 The Foundation of Market Basket Analysis: Basic Measures 539 Order Characteristics 540 Item (Product) Popularity 541 Tracking Marketing Interventions 542 Case Study: Spanish or English 543 The Business Problem 543 The Data 544 Defining “Hispanicity” Preference 545 The Solution 546 Association Analysis 547 Rules Are Not Always Useful 548 Item Sets to Association Rules 551 How Good Is an Association Rule? 553 Building Association Rules 555 Choosing the Right Set of Items 556 Anonymous Versus Identified 561 Generating Rules from All This Data 561 Overcoming Practical Limits 565 The Problem of Big Data 567 Extending the Ideas 569 Different Items on the Right- and Left-Hand Sides 569 Using Association Rules to Compare Stores 570 Association Rules and Cross-Selling 572 A Typical Cross-Sell Model 572 A More Confident Approach to Product Propensities 573 Results from Using Confidence 574 Sequential Pattern Analysis 574 Finding the Sequences 575 Sequential Association Rules 578 Sequential Analysis Using Other Data Mining Techniques 579 Lessons Learned 579 Chapter 16 Link Analysis 581 Basic Graph Theory 582 What Is a Graph? 
582 Directed Graphs 584 Weighted Graphs 585 Seven Bridges of Königsberg 585 Detecting Cycles in a Graph 588 The Traveling Salesman Problem Revisited 589 Social Network Analysis 593 Six Degrees of Separation 593 What Your Friends Say About You 595 Finding Childcare Benefits Fraud 596 Who Responds to Whom on Dating Sites 597 Social Marketing 598 Mining Call Graphs 598 Case Study: Tracking Down the Leader of the Pack 601 The Business Goal 601 The Data Processing Challenge 601 Finding Social Networks in Call Data 602 How the Results Are Used for Marketing 602 Estimating Customer Age 603 Case Study: Who Is Using Fax Machines from Home? 604 Why Finding Fax Machines Is Useful 604 How Do Fax Machines Behave? 604 A Graph Coloring Algorithm 605 “Coloring” the Graph to Identify Fax Machines 606 How Google Came to Rule the World 607 Hubs and Authorities 608 The Details 609 Hubs and Authorities in Practice 611 Lessons Learned 612 Chapter 17 Data Warehousing, OLAP, Analytic Sandboxes, and Data Mining 613 The Architecture of Data 615 Transaction Data, the Base Level 616 Operational Summary Data 617 Decision-Support Summary Data 617 Database Schema/Data Models 618 Metadata 623 Business Rules 623 A General Architecture for Data Warehousing 624 Source Systems 624 Extraction, Transformation, and Load 626 Central Repository 627 Metadata Repository 630 Data Marts 630 Operational Feedback 631 Users and Desktop Tools 631 Analytic Sandboxes 633 Why Are Analytic Sandboxes Needed? 634 Technology to Support Analytic Sandboxes 636 Where Does OLAP Fit In? 639 What’s in a Cube? 641 Star Schema 646 OLAP and Data Mining 648 Where Data Mining Fits in with Data Warehousing 650 Lots of Data 651 Consistent, Clean Data 651 Hypothesis Testing and Measurement 652 Scalable Hardware and RDBMS Support 653 Lessons Learned 653 Chapter 18 Building Customer Signatures 655 Finding Customers in Data 656 What Is a Customer? 657 Accounts? Customers? Households? 
658 Anonymous Transactions 658 Transactions Linked to a Card 659 Transactions Linked to a Cookie 659 Transactions Linked to an Account 660 Transactions Linked to a Customer 661 Designing Signatures 661 Is a Customer Signature Necessary? 666 What Does a Row Represent? 666 Will the Signature Be Used for Predictive Modeling? 671 Has a Target Been Defined? 672 Are There Constraints Imposed by the Particular Data Mining Techniques to be Employed? 672 Which Customers Will Be Included? 673 What Might Be Interesting to Know About Customers? 673 What a Signature Looks Like 674 Process for Creating Signatures 677 Some Data Is Already at the Right Level of Granularity 678 Pivoting a Regular Time Series 679 Aggregating Time-Stamped Transactions 680 Dealing with Missing Values 685 Missing Values in Source Data 685 Unknown or Non-Existent? 687 What Not to Do 687 Things to Consider 689 Lessons Learned 691 Chapter 19 Derived Variables: Making the Data Mean More 693 Handset Churn Rate as a Predictor of Churn 694 Single-Variable Transformations 696 Standardizing Numeric Variables 696 Turning Numeric Values into Percentiles 697 Turning Counts into Rates 698 Relative Measures 699 Replacing Categorical Variables with Numeric Ones 700 Combining Variables 707 Classic Combinations 707 Combining Highly Correlated Variables 710 Rent to Home Value 712 Extracting Features from Time Series 718 Trend 719 Seasonality 721 Extracting Features from Geography 722 Geocoding 722 Mapping 723 Using Geography to Create Relative Measures 724 Using Past Values of the Target Variable 725 Using Model Scores as Inputs 725 Handling Sparse Data 726 Account Set Patterns 726 Binning Sparse Values 727 Capturing Customer Behavior from Transactions 727 Widening Narrow Data 728 Sphere of Influence as a Predictor of Good Customers 728 An Example: Ratings to Rater Profile 730 Sample Fields from the Rater Signature 730 The Rating Signature and Derived Variables 732 Lessons Learned 733 Chapter 20 Too Much of a Good 
Thing? Techniques for Reducing the Number of Variables 735 Problems with Too Many Variables 736 Risk of Correlation Among Input Variables 736 Risk of Overfitting 738 The Sparse Data Problem 738 Visualizing Sparseness 739 Independence 740 Exhaustive Feature Selection 743 Flavors of Variable Reduction Techniques 744 Using the Target 744 Original versus New Variables 744 Sequential Selection of Features 745 The Traditional Forward Selection Methodology 745 Forward Selection Using a Validation Set 747 Stepwise Selection 748 Forward Selection Using Non-Regression Techniques 748 Backward Selection 748 Undirected Forward Selection 749 Other Directed Variable Selection Methods 749 Using Decision Trees to Select Variables 750 Variable Reduction Using Neural Networks 752 Principal Components 753 What Are Principal Components? 753 Principal Components Example 758 Principal Component Analysis 763 Factor Analysis 767 Variable Clustering 768 Example of Variable Clusters 768 Using Variable Clusters 770 Hierarchical Variable Clustering 770 Divisive Variable Clustering 773 Lessons Learned 774 Chapter 21 Listen Carefully to What Your Customers Say: Text Mining 775 What Is Text Mining? 776 Text Mining for Derived Columns 776 Beyond Derived Features 777 Text Analysis Applications 778 Working with Text Data 781 Sources of Text 781 Language Effects 782 Basic Approaches to Representing Documents 783 Representing Documents in Practice 784 Documents and the Corpus 786 Case Study: Ad Hoc Text Mining 786 The Boycott 787 Business as Usual 787 Combining Text Mining and Hypothesis Testing 787 The Results 788 Classifying News Stories Using MBR 789 What Are the Codes? 
789 Applying MBR 790 The Results 793 From Text to Numbers 794 Starting with a “Bag of Words” 794 Term-Document Matrix 796 Corpus Effects 797 Singular Value Decomposition (SVD) 798 Text Mining and Naïve Bayesian Models 800 Naïve Bayesian in the Text World 801 Identifying Spam Using Naïve Bayesian 801 Sentiment Analysis 806 DIRECTV: A Case Study in Customer Service 809 Background 809 Applying Text Mining 811 Taking the Technical Approach 814 Not an Iterative Process 818 Continuing to Benefit 818 Lessons Learned 819 Index 821
£37.05
John Wiley & Sons Inc Graphical Models
Book SynopsisGraphical models are of increasing importance in applied statistics, and in particular in data mining. Providing a self-contained introduction to and overview of learning relational, probabilistic, and possibilistic networks from data, this second edition of Graphical Models is thoroughly updated to include the latest research in this burgeoning field, including a new chapter on visualization. The text provides graduate students and researchers with all the necessary background material, including modelling under uncertainty, decomposition of distributions, graphical representation of distributions, and applications relating to graphical models and problems for further research.Trade Review“The text provides graduate students, and researchers with all the necessary background material, including modelling under uncertainty, decomposition of distributions, graphical representation of distributions, and applications relating to graphical models and problems for further research.” (Zentralblatt Math, 1 August 2013) "All of the necessary background is provided, with material on modeling under uncertainty and imprecision modeling, decomposition of distributions, graphical representation of distributions, applications relating to graphical models, and problems for further research." (Book News, December 2009)Table of ContentsPreface. 1 Introduction. 1.1 Data and Knowledge. 1.2 Knowledge Discovery and Data Mining. 1.3 Graphical Models. 1.4 Outline of this Book. 2 Imprecision and Uncertainty. 2.1 Modeling Inferences. 2.2 Imprecision and Relational Algebra. 2.3 Uncertainty and Probability Theory. 2.4 Possibility Theory and the Context Model. 3 Decomposition. 3.1 Decomposition and Reasoning. 3.2 Relational Decomposition. 3.3 Probabilistic Decomposition. 3.4 Possibilistic Decomposition. 3.5 Possibility versus Probability. 4 Graphical Representation. 4.1 Conditional Independence Graphs. 4.2 Evidence Propagation in Graphs. 5 Computing Projections. 
5.1 Databases of Sample Cases. 5.2 Relational and Sum Projections. 5.3 Expectation Maximization. 5.4 Maximum Projections. 6 Naive Classifiers. 6.1 Naive Bayes Classifiers. 6.2 A Naive Possibilistic Classifier. 6.3 Classifier Simplification. 6.4 Experimental Evaluation. 7 Learning Global Structure. 7.1 Principles of Learning Global Structure. 7.2 Evaluation Measures. 7.3 Search Methods. 7.4 Experimental Evaluation. 8 Learning Local Structure. 8.1 Local Network Structure. 8.2 Learning Local Structure. 8.3 Experimental Evaluation. 9 Inductive Causation. 9.1 Correlation and Causation. 9.2 Causal and Probabilistic Structure. 9.3 Faithfulness and Latent Variables. 9.4 The Inductive Causation Algorithm. 9.5 Critique of the Underlying Assumptions. 9.6 Evaluation. 10 Visualization. 10.1 Potentials. 10.2 Association Rules. 11 Applications. 11.1 Diagnosis of Electrical Circuits. 11.2 Application in Telecommunications. 11.3 Application at Volkswagen. 11.4 Application at DaimlerChrysler. A Proofs of Theorems. A.1 Proof of Theorem 4.1.2. A.2 Proof of Theorem 4.1.18. A.3 Proof of Theorem 4.1.20. A.4 Proof of Theorem 4.1.26. A.5 Proof of Theorem 4.1.28. A.6 Proof of Theorem 4.1.30. A.7 Proof of Theorem 4.1.31. A.8 Proof of Theorem 5.4.8. A.9 Proof of Lemma .2.2. A.10 Proof of Lemma .2.4. A.11 Proof of Lemma .2.6. A.12 Proof of Theorem 7.3.1. A.13 Proof of Theorem 7.3.2. A.14 Proof of Theorem 7.3.3. A.15 Proof of Theorem 7.3.5. A.16 Proof of Theorem 7.3.7. B Software Tools. Bibliography. Index.
£88.16
John Wiley & Sons Inc Data Mining Techniques in CRM Inside Customer
Book SynopsisThis is an applied handbook for the application of data mining techniques in the CRM framework. It combines a technical and a business perspective to cover the needs of business users who are looking for a practical guide on data mining.Trade Review"The book is written in a language that is easily accessible to business users who are not fluent in statistical methods and who have no prior exposure to the data mining or customer segmentation domain . . . This book is poised to become a standard reference, and I unconditionally recommend it to anyone working in this field." (Computing Reviews, 23 June 2011) "This is an excellent book for any data miner or anybody involved in CRM. The text is clear and pictures are well done and funny which is rare enough to be mentioned. From basic to advanced topics, the book is a very pleasant journey inside data mining with a clear focus on customer segmentation. Really advised if you're not a fan of formulas." (Data Mining Research, 18 March 2011)Table of ContentsAcknowledgements. 1. Data Mining in CRM. The CRM Strategy. What Can Data Mining Do? The Data Mining Methodology. Data Mining and Business Domain Expertise. Summary. 2. An Overview of Data Mining Techniques. Supervised Modeling. Unsupervised Modeling Techniques. Machine Learning/Artificial Intelligence vs. Statistical Techniques. Summary. 3. Data Mining Techniques for Segmentation. Segmenting Customers with Data Mining Techniques. Principal Components Analysis. Clustering Techniques. Examining and Evaluating the Cluster Solution. Understanding the Clusters through Profiling. Selecting the Optimal Cluster Solution. Cluster Profiling and Scoring with Supervised Models. An Introduction to Decision Tree Models. Summary. 4. The Mining Data Mart. Designing the Mining Data Mart. The Time Frame Covered by the Mining Data Mart. The Mining Data Mart for Retail Banking. The Mining Data Mart for Mobile Telephony Consumer (Residential) Customers. 
The Mining Data Mart for Retailers. Summary. 5. Customer Segmentation. An Introduction to Customer Segmentation. Segmentation Types in Consumer Markets. Segmentation in Business Markets. A Guide for Behavioral Segmentation. Segmentation Management Strategy. A Guide for Value-Based Segmentation. Designing Differentiated Strategies for the Value Segments. Summary. 6. Segmentation Applications in Banking. Segmentation for Credit Card Holders. Segmentation in Retail Banking. The Marketing Process. Segmentation in Retail Banking; A Summary. 7. Segmentation Applications in Telecommunications. Mobile Telephony. The Fixed Telephony Case. Summary. 8. Segmentation for Retailers. Segmentation in the Retail Industry. The RFM Analysis. Grouping Customers According to the Products They Buy. Summary. Further Reading. Index.
£61.16
University of California Press Data Mining for the Social Sciences An
Book SynopsisWe live in a world of big data: the amount of information collected on human behavior each day is staggering, and exponentially greater than at any time in the past. Providing an introduction to data mining, the authors discuss how data mining substantially differs from conventional statistical modeling familiar to most social scientists.
£100.00
University of California Press Data Mining for the Social Sciences
Book SynopsisWe live in a world of big data: the amount of information collected on human behavior each day is staggering, and exponentially greater than at any time in the past. Providing an introduction to data mining, the authors discuss how data mining substantially differs from conventional statistical modeling familiar to most social scientists.Table of ContentsPART 1. CONCEPTS 1. What Is Data Mining? 2. Contrasts with the Conventional Statistical Approach 3. Some General Strategies Used in Data Mining 4. Important Stages in a Data Mining Project PART 2. WORKED EXAMPLES 5. Preparing Training and Test Datasets 6. Variable Selection Tools 7. Creating New Variables Using Binning and Trees 8. Extracting Variables 9. Classifiers 10. Classification Trees 11. Neural Networks 12. Clustering 13. Latent Class Analysis and Mixture Models 14. Association Rules Conclusion Bibliography Notes Index
£28.90
Cambridge University Press The Text Mining Handbook
Book SynopsisPresents a comprehensive discussion of the state-of-the-art in text mining and link detection. In addition to providing an in-depth examination of core text mining and link detection algorithms and operations, the book examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches, ending with real-world, mission-critical applications.Trade Review' … buy the book. This book is definitely worth having in your book shelf as a handy reference.' IAPR NewsletterTable of Contents1. Introduction to text mining; 2. Core text mining operations; 3. Text mining preprocessing techniques; 4. Categorization; 5. Clustering; 6. Information extraction; 7. Probabilistic models for Information extraction; 8. Preprocessing applications using probabilistic and hybrid approaches; 9. Presentation-layer considerations for browsing and query refinement; 10. Visualization approaches; 11. Link analysis; 12. Text mining applications; Appendix; Bibliography.
£74.99
O'Reilly Media Head First Data Analysis
Book SynopsisInterpreting data is a critical decision-making factor for businesses and organizations. This title helps you learn how to collect and organize data, sort the distractions from the truth, find meaningful patterns, draw conclusions, predict the future, and present your findings to others.
£35.99
Princeton University Press The Silicon Jungle
Book SynopsisWhat happens when a naive intern is granted unfettered access to people's most private thoughts and actions? Stephen Thorpe lands a coveted internship at Ubatoo, an Internet empire that provides its users with popular online services, from a search engine and e-mail, to social networking. When Stephen's boss asks him to work on a project with the A Trade ReviewCo-Winner of the 2012 Mary Shelley Award for Outstanding Fictional Work, Media Ecology Association "Baluja's clever, cynical debut explores the frightening possibilities of data mining... A nod to Upton Sinclair's muckraking The Jungle, which scared its readers into regulating the meat-packing industry, this lively if depressing novel suggests that computer snooping is too seductive to control, despite the consequences."--Publishers Weekly "[F]righteningly convincing... The read is quick, the questions will linger, and the ideas are so intriguing... Baluja simplifies the abstract world of tech-speak for the rest of us while aiming to do for the Internet what Upton Sinclair's The Jungle did for the meat industry: make readers reconsider its safety. For fans of intelligent thrillers."--Stephen Morrow, Library Journal "In the era of the ubiquitous web company, The Silicon Jungle provides ample food for thought."--Zena Iovino, New Scientist "[T]his cautionary tale is fascinating for its exploration of technology as a conduit for crime."--Michele Leber, Booklist Online "The book's central message is fascinating. A company like Google, Baluja points out, has far more information on U.S. citizens than does the FBI and far fewer restrictions on how to use it. 
It's a chilling message in a fun package."--Kathleen Offenholley, Mathematics TeacherTable of ContentsPreface xi Endings 1 Anklets 3 Anthropologists in the Midst 10 Mollycoddle 13 Touchpoints 19 Checking In 26 Working 9 to 4 28 Predicting the Future and 38 Needles 33 Contact 39 Two Geeks in a Pod 47 An Understatement 53 Euphoria and Diet Pills 61 To Better Days 70 Marathon 75 The Life and Soul of an Intern 81 Candid Cameras 85 Episodes 89 Liberal Food and Even More Liberal Activism 92 Subjects 100 Newsworthy 105 Patience 110 Hypergrowth 113 Little Pink Houses 117 Truth, Lies, and Algorithms 122 Negotiations and Herding Cats 129 The JENNY Discovery 133 I Dream of JENNY 138 A Five-Step Program: Hallucinations and Archetypes 143 Over-Deliver 150 A Life Changed in Four Phone Calls 154 Giving Thanks 160 A Drive through the Country 166 Control 171 A Tale of Two Tenures 178 Prelude to Pie 183 The Yuri Effect 188 Apple Pie 195 Thoughts Like Butterflies 201 Core-Relations 207 Collide 212 Control, Revisited 220 Fables of the Deconstruction 223 Control, Foregone 232 Foundations 236 One Way 241 Sebastin's Friends 244 A Tinker by Any Other Name 251 When It Rains 262 I Am a Heartbeat 267 What I Did This Summer 273 A Permanent Position 280 For Adam 284 Faith 288 Counting by Two 291 Disconnect 298 Sahim 304 Epilogue: Beginnings 309 Acknowledgments 313 Know More 315 Privacy Policy of a Few Organizations 317 References 319
£23.80
Princeton University Press Data Visualization
Book SynopsisTrade Review"[Healy’s] prose is engaging and chatty, and the style of instruction is unpretentious and practical . . . This single volume represents an excellent entry point for those wishing to upskill their abilities in data visualization."--Paul Cuffe, IEEE Transactions "Undoubtedly, this book is an excellent introduction to an essential tool for anyone who needs to collect and present data." * Conservation Biology *
£79.20
Taylor & Francis Ltd Data Analytics Applications in Gaming and
Book SynopsisThe last decade has witnessed the rise of big data in game development as the increasing proliferation of Internet-enabled gaming devices has made it easier than ever before to collect large amounts of player-related data. At the same time, the emergence of new business models and the diversification of the player base have exposed a broader potential audience, which attaches great importance to being able to tailor game experiences to a wide range of preferences and skill levels. This, in turn, has led to a growing interest in data mining techniques, as they offer new opportunities for deriving actionable insights to inform game design, to ensure customer satisfaction, to maximize revenues, and to drive technical innovation. By now, data mining and analytics have become vital components of game development. The amount of work being done in this area nowadays makes this an ideal time to put together a book on this subject. Data Analytics Applications in Gaming and Table of ContentsPart 1 – Introduction to game data mining. Part 2 – Data mining for games user research. Part 3 – Data mining for game technology. Part 4 – Visualization of large-scale game data.
£42.74
CRC Press The Art of Data Science
Book SynopsisAlthough change is constant in business and analytics, some fundamental principles and lessons learned are truly timeless, extending and surviving beyond the rapid ongoing evolution of tools, techniques, and technologies. Through a series of articles published over the course of his 30+ year career in analytics and technology, author Doug Gray shares the most important lessons he has learned, with colleagues and students as well, that have helped to ensure success on his journey as a practitioner, leader, and educator. The reader witnesses the Analytical Sciences profession through the mind's eye of a practitioner who has operated at the forefront of analytically-inclined organizations, such as American Airlines and Walmart, delivering solutions that generate hundreds of millions of dollars annually in business value, and an educator teaching students and conducting research at a leading university. Through real-world project case studies, first-hand stories, and practical e
£46.54