Description

Book Synopsis
A reliable, cost-effective approach to extracting priceless business information from all sources of text. Excavating actionable business insights from data is a complex undertaking, and that complexity is magnified by an order of magnitude when the focus is on documents and other text information.

Table of Contents

Foreword

Chapter 1: What is Text Mining?

1.1 What is it?

1.1.1 What is text mining in practice?

1.1.2 Where does text mining fit?

1.2 Why we care about text mining?

1.2.1 What are the consequences of ignoring text?

1.2.2 What are the benefits of text mining?

1.2.3 Setting Expectations: When text mining should (and should not) be used.

1.3 A basic workflow. How the process works.

1.4 What tools do I need to get started with this?

1.5 A Simple Example

1.6 A Real World Use Case

1.7 Summary

Chapter 2: Basics of text mining

2.1 What is Text Mining in a practical sense?

2.2 Types of Text Mining: Bag of Words.

2.2.1 Types of Text Mining: Syntactic Parsing.

2.3 The text mining process in context

2.4 String Manipulation: Number of Characters & Substitutions

2.4.1 String Manipulations: Paste, Character Splits & Extractions

2.5 Keyword Scanning

2.6 String Packages stringr & stringi

2.7 Preprocessing Steps for Bag of Words Text Mining

2.8 Spell Check

2.9 Frequent Terms & Associations

2.9 Delta Assist Wrap Up

2.10 Summary

Chapter 3: Common Text Mining Visualizations

3.1 A tale of two (or three) cultures

3.2 Simple Exploration: Term Frequency, Associations & Word Networks

3.2.1 Term Frequency

3.2.2 Word Associations

3.2.3 Word Networks

3.3 Simple Word Clusters: Hierarchical Dendrograms

3.4 Word Clouds: Overused but Effective

3.4.1 One Corpus Word Clouds

3.4.2 Comparing and Contrasting Corpora in Word Clouds

3.4.3 Polarized Tag Plot

3.5 Summary

Chapter 4: Sentiment Scoring

4.1 What is Sentiment Analysis?

4.2 Sentiment Scoring: Parlor Trick or Insightful?

4.3 Polarity: Simple Sentiment Scoring

4.3.1 Subjectivity Lexicons

4.3.2 Qdap’s Scoring for positive and negative word choice

4.3.3 Revisiting Word Clouds…Sentiment Word Clouds

4.4 Emoticons :) Dealing with these perplexing clues

4.4.1 Symbol-Based Emoticons Native to R

4.4.2 Punctuation Based Emoticons

4.4.3 Emoji

4.5 R’s Archived Sentiment Scoring Library

4.5 Sentiment the tidytext way

4.6 Airbnb.com Boston Wrap Up

4.7 Summary

Chapter 5: Hidden Structures: Clustering, String Distance, Text Vectors & Topic Modeling

5.1 What is clustering?

5.1.1 K Means Clustering

5.1.2 Spherical K Means Clustering

5.1.3 K Medoid Clustering

5.1.4 Evaluating the cluster approaches

5.2 Calculating & Exploring String Distance

5.2.1 What is string distance?

5.2.2 Fuzzy Matching: amatch, ain

5.2.3 Similarity Distances: stringdist, stringdistmatrix

5.3 LDA Topic Modeling Explained

5.3.2 Topic Modeling Case Study

5.3.2 LDA & LDAvis

5.4 Text to Vectors using “text2vec”

5.4.1 text2vec

5.5 Summary

Chapter 6: Document Classification: Finding Clickbait from Headlines

6.1 What is document classification?

6.2 Clickbait Case Study

6.2.2 Session & Data Set Up

6.2.3 GLMNET Training

6.2.4 GLMNET Test Predictions

6.2.5 Test Set Evaluation

6.2.6 Finding the most impactful words

6.2.7 Case study Wrap Up: Model Accuracy & Improving Performance Recommendations

6.3 Summary

Chapter 7: Predictive Modeling: Using text for classifying & predicting outcomes.

7.1 Classification Vs Prediction

7.2 Case Study I: Will this patient come back to the hospital?

7.2.2 Patient Readmission in the Text Mining Workflow

7.2.3 Session & Data Set Up

7.2.4 Patient Modeling

7.2.5 More Model KPI: AUC, Recall, Precision & F1

7.2.5.1 Additional Evaluation Metrics

7.2.6 Apply the model to new patients

7.2.7 Patient Readmission Conclusion

7.3 Case Study II: Predicting Box Office Success

7.3.2 Opening Weekend Revenue in the Text Mining Workflow

7.3.3 Session & Data Set Up

7.3.4 Opening Weekend Modeling

7.3.5 Model Evaluation

7.3.6 Apply the Model to new Movie Reviews

7.3.7 Movie Revenue Conclusion

7.4 Summary

Chapter 8: The OpenNLP Project

8.1 What is the OpenNLP project?

8.2 R’s OpenNLP Package

8.3 Named Entities in Hillary Clinton’s Email

8.3.1 R Session Set-up

8.3.2 Minor Text Cleaning

8.3.3 Using OpenNLP on a single email

8.3.4 Using OpenNLP on multiple documents

8.3.5 Revisiting the Text Mining Workflow

8.4 Analyzing the Named Entities

8.4.1 Worldwide Map of Hillary Clinton’s Location Mentions

8.4.2 Mapping Only European Locations

8.4.3 Entities & Polarity: How does Hillary Clinton feel about an entity?

8.4.4 Stock Charts for Entities

8.4.5 Reach an Insight or Conclusion about Hillary Clinton’s Emails

8.5 Summary

Chapter 9: Text Sources

9.1 Sourcing Text

9.2 Web Sources

9.2.1 Web Scraping a Single Page with rvest

9.2.2 Web Scraping Multiple Pages with rvest

9.2.3 Application Program Interfaces (APIs)

9.2.4 Newspaper Articles from The Guardian Newspaper

9.2.5 Tweets using the “twitteR” Package

9.2.6 Calling an API without a dedicated R package

9.2.7 Using jsonlite to access the New York Times

9.2.8 Using RCurl & XML to Parse Google News Feeds

9.2.9 The tm library Web-Mining Plugin

9.3 Getting Text from File Sources

9.3.1 Individual CSV, TXT and Microsoft Office Files

9.3.2 Reading multiple files quickly

9.3.2 Extracting Text from PDFs

9.3.3 Optical Character Recognition: Extracting Text from Images

9.4 Summary

Text Mining in Practice with R

A Hardback by Ted Kwartler

    Publisher: John Wiley & Sons Inc
    Publication Date: 21/07/2017
    ISBN13: 9781119282013
    ISBN10: 1119282012
