Description

Book Synopsis
A reliable, cost-effective approach to extracting priceless business information from all sources of text.

Excavating actionable business insights from data is a complex undertaking, and that complexity is magnified by an order of magnitude when the focus is on documents and other text information.

Table of Contents

Foreword

Chapter 1: What is Text Mining?

1.1 What is it?

1.1.1 What is text mining in practice?

1.1.2 Where does text mining fit?

1.2 Why we care about text mining?

1.2.1 What are the consequences of ignoring text?

1.2.2 What are the benefits of text mining?

1.2.3 Setting Expectations: When text mining should (and should not) be used.

1.3 A basic workflow. How the process works.

1.4 What tools do I need to get started with this?

1.5 A Simple Example

1.6 A Real World Use Case

1.7 Summary

Chapter 2: Basics of text mining

2.1 What is Text Mining in a practical sense?

2.2 Types of Text Mining: Bag of Words.

2.2.1 Types of Text Mining: Syntactic Parsing.

2.3 The text mining process in context

2.4 String Manipulation: Number of Characters & Substitutions

2.4.1 String Manipulations: Paste, Character Splits & Extractions

2.5 Keyword Scanning

2.6 String Packages stringr & stringi

2.7 Preprocessing Steps for Bag of Words Text Mining

2.8 Spell Check

2.9 Frequent Terms & Associations

2.9 Delta Assist Wrap Up

2.10 Summary

Chapter 3: Common Text Mining Visualizations

3.1 A tale of two (or three) cultures

3.2 Simple Exploration: Term Frequency, Associations & Word Networks

3.2.1 Term Frequency

3.2.2 Word Associations

3.2.3 Word Networks

3.3 Simple Word Clusters: Hierarchical Dendrograms

3.4 Word Clouds: Overused but Effective

3.4.1 One Corpus Word Clouds

3.4.2 Comparing and Contrasting Corpora in Word Clouds

3.4.3 Polarized Tag Plot

3.5 Summary

Chapter 4: Sentiment Scoring

4.1 What is Sentiment Analysis?

4.2 Sentiment Scoring: Parlor Trick or Insightful?

4.3 Polarity: Simple Sentiment Scoring

4.3.1 Subjectivity Lexicons

4.3.2 Qdap’s Scoring for positive and negative word choice

4.3.3 Revisiting Word Clouds…Sentiment Word Clouds

4.4 Emoticons :) Dealing with these perplexing clues

4.4.1 Symbol-Based Emoticons Native to R

4.4.2 Punctuation Based Emoticons

4.4.3 Emoji

4.5 R’s Archived Sentiment Scoring Library

4.5 Sentiment the tidytext way

4.6 Airbnb.com Boston Wrap Up

4.7 Summary

Chapter 5: Hidden Structures: Clustering, String Distance, Text Vectors & Topic Modeling

5.1 What is clustering?

5.1.1 K Means Clustering

5.1.2 Spherical K Means Clustering

5.1.3 K Mediod Clustering

5.1.4 Evaluating the cluster approaches

5.2 Calculating & Exploring String Distance

5.2.1 What is string distance?

5.2.2 Fuzzy Matching-amatch, ain

5.2.3 Similarity Distances- stringdist, stringdistmatrix

5.3 LDA Topic Modeling Explained

5.3.2 Topic Modeling Case Study

5.3.2 LDA & LDAvis

5.4 Text to Vectors using “text2vec”

5.4.1 text2vec

5.5 Summary

Chapter 6: Document Classification: Finding Clickbait from Headlines

6.1 What is document classification?

6.2 Clickbait Case Study

6.2.2 Session & Data Set Up

6.2.3 GLMNET Training

6.2.4 GLMNET Test Predictions

6.2.5 Test Set Evaluation

6.2.6 Finding the most impactful words

6.2.7 Case study Wrap Up: Model Accuracy & Improving Performance Recommendations

6.3 Summary

Chapter 7: Predictive Modeling: Using text for classifying & predicting outcomes.

7.1 Classification Vs Prediction

7.2 Case Study I: Will this patient come back to the hospital?

7.2.2 Patient Readmission in the Text Mining Workflow

7.2.3 Session & Data Set Up

7.2.4 Patient Modeling

7.2.5 More Model KPI: AUC, Recall, Precision & F1

7.2.5.1 Additional Evaluation Metrics

7.2.6 Apply the model to new patients

7.2.7 Patient Readmission Conclusion

7.3 Case Study II: Predicting Box Office Success

7.3.2 Opening Weekend Revenue in the Text Mining Workflow

7.3.3 Session & Data Set Up

7.3.4 Opening Weekend Modeling

7.3.5 Model Evaluation

7.3.6 Apply the Model to new Movie Reviews

7.3.7 Movie Revenue Conclusion

7.4 Summary

Chapter 8: The OpenNLP Project

8.1 What is the OpenNLP project?

8.2 R’s OpenNLP Package

8.3 Named Entities in Hillary Clinton’s Email

8.3.1 R Session Set-up

8.3.2 Minor Text Cleaning

8.3.3 Using OpenNLP on a single email

8.3.4 Using OpenNLP on multiple documents

8.3.5 Revisiting the Text Mining Workflow

8.4 Analyzing the Named Entities

8.4.1 Worldwide Map of Hillary Clinton’s Location Mentions

8.4.2 Mapping Only European Locations

8.4.3 Entities & Polarity: How does Hillary Clinton feel about an entity?

8.4.4 Stock Charts for Entities

8.4.5 Reach an Insight or Conclusion about Hillary Clinton’s Emails

8.5 Summary

Chapter 9: Text Sources

9.1 Sourcing Text

9.2 Web Sources

9.2.1 Web Scraping a Single Page with rvest

9.2.2 Web Scraping Multiple Pages with rvest

9.2.3 Application Program Interfaces (APIs)

9.2.4 Newspaper Articles from The Guardian Newspaper

9.2.5 Tweets using the “twitteR” Package

9.2.6 Calling an API without a dedicated R package

9.2.7 Using jsonlite to access the New York Times

9.2.8 Using RCurl & XML to Parse Google News Feeds

9.2.9 The tm library Web-Mining Plugin

9.3 Getting Text from File Sources

9.3.1 Individual CSV, TXT and Microsoft Office Files

9.3.2 Reading multiple files quickly

9.3.2 Extracting Text from PDFs

9.3.3 Optical Character Recognition: Extracting Text from Images

9.4 Summary

Text Mining in Practice with R

    £52.20

    Includes FREE delivery

    RRP £54.95 – you save £2.75 (5%)

    A Hardback by Ted Kwartler

      Publisher: John Wiley & Sons Inc
      Publication Date: 21/07/2017
      ISBN13: 9781119282013
      ISBN10: 1119282012
      Also in: Data mining
