Description

Book Synopsis

This book presents statistical models recently developed within several research communities to access information contained in text collections. The problems considered are linked to applications that aim to facilitate information access:
- information extraction and retrieval;
- text classification and clustering;
- opinion mining;
- comprehension aids (automatic summarization, machine translation, visualization).
To give the reader as complete a description as possible, the focus is placed on the probabilistic models used in these applications, highlighting the relationship between models and applications and illustrating the behavior of each model on real collections.
Textual Information Access is organized around four themes: information retrieval and ranking models, classification and clustering (logistic regression, kernel methods, Markov fields, etc.), multilingualism and machine translation, and emerging applications such as information exploration.

Contents

Part 1: Information Retrieval
1. Probabilistic Models for Information Retrieval, Stéphane Clinchant and Eric Gaussier.
2. Learnable Ranking Models for Automatic Text Summarization and Information Retrieval, Massih-Réza Amini, David Buffoni, Patrick Gallinari, Tuong Vinh Truong and Nicolas Usunier.
Part 2: Classification and Clustering
3. Logistic Regression and Text Classification, Sujeevan Aseervatham, Eric Gaussier, Anestis Antoniadis, Michel Burlet and Yves Denneulin.
4. Kernel Methods for Textual Information Access, Jean-Michel Renders.
5. Topic-Based Generative Models for Text Information Access, Jean-Cédric Chappelier.
6. Conditional Random Fields for Information Extraction, Isabelle Tellier and Marc Tommasi.
Part 3: Multilingualism
7. Statistical Methods for Machine Translation, Alexandre Allauzen and François Yvon.
Part 4: Emerging Applications
8. Information Mining: Methods and Interfaces for Accessing Complex Information, Josiane Mothe, Kurt Englmeier and Fionn Murtagh.
9. Opinion Detection as a Topic Classification Problem, Juan-Manuel Torres-Moreno, Marc El-Bèze, Patrice Bellot and Frédéric Béchet.



Table of Contents

Introduction
Eric Gaussier and François Yvon

PART 1: INFORMATION RETRIEVAL

Chapter 1. Probabilistic Models for Information Retrieval
Stéphane Clinchant and Eric Gaussier

1.1. Introduction
1.3. Probability ranking principle (PRP)
1.4. Language models
1.5. Informational approaches
1.6. Experimental comparison
1.7. Tools for information retrieval
1.8. Conclusion
1.9. Bibliography

Chapter 2. Learnable Ranking Models for Automatic Text Summarization and Information Retrieval
Massih-Réza Amini, David Buffoni, Patrick Gallinari, Tuong Vinh Truong, and Nicolas Usunier

2.1. Introduction
2.2. Application to automatic text summarization
2.3. Application to information retrieval
2.4. Conclusion
2.5. Bibliography

PART 2: CLASSIFICATION AND CLUSTERING

Chapter 3. Logistic Regression and Text Classification
Sujeevan Aseervatham, Eric Gaussier, Anestis Antoniadis, Michel Burlet, and Yves Denneulin

3.1. Introduction
3.2. Generalized linear model
3.3. Parameter estimation
3.4. Logistic regression
3.5. Model selection
3.6. Logistic regression applied to text classification
3.7. Conclusion
3.8. Bibliography

Chapter 4. Kernel Methods for Textual Information Access
Jean-Michel Renders

4.1. Kernel methods: context and intuitions
4.2. General principles of kernel methods
4.3. General problems with kernel choices (kernel engineering)
4.4. Kernel versions of standard algorithms: examples of solvers
4.5. Kernels for text entities
4.6. Summary
4.7. Bibliography

Chapter 5. Topic-Based Generative Models for Text Information Access
Jean-Cédric Chappelier

5.1. Introduction
5.2. Topic-based models
5.3. Topic models
5.4. Term models
5.5. Similarity measures between documents
5.6. Conclusion
5.7. Appendix: topic model software
5.8. Bibliography

Chapter 6. Conditional Random Fields for Information Extraction
Isabelle Tellier and Marc Tommasi

6.1. Introduction
6.2. Information extraction
6.3. Machine learning for information extraction
6.4. Introduction to conditional random fields
6.5. Conditional random fields
6.6. Conditional random fields and their applications
6.7. Conclusion
6.8. Bibliography

PART 3: MULTILINGUALISM

Chapter 7. Statistical Methods for Machine Translation
Alexandre Allauzen and François Yvon

7.1. Introduction
7.2. Probabilistic machine translation: an overview
7.3. Phrase-based models
7.4. Modeling reorderings
7.5. Translation: a search problem
7.6. Evaluating machine translation
7.7. State-of-the-art and recent developments
7.8. Useful resources
7.9. Conclusion
7.10. Acknowledgments
7.11. Bibliography

PART 4: EMERGING APPLICATIONS

Chapter 8. Information Mining: Methods and Interfaces for Accessing Complex Information
Josiane Mothe, Kurt Englmeier, and Fionn Murtagh

8.1. Introduction
8.2. The multidimensional visualization of information
8.3. Domain mapping via social networks
8.4. Analyzing the variability of searches and data merging
8.5. The seven types of evaluation measures used in IR
8.6. Conclusion
8.7. Acknowledgments
8.8. Bibliography

Chapter 9. Opinion Detection as a Topic Classification Problem
Juan-Manuel Torres-Moreno, Marc El-Bèze, Patrice Bellot, and Frédéric Béchet

9.1. Introduction
9.2. The TREC and TAC evaluation campaigns
9.3. Cosine weights - a second glance
9.4. Which components for opinion vectors?
9.5. Experiments
9.6. Extracting opinions from speech: automatic analysis of phone polls
9.7. Conclusion
9.8. Bibliography

Appendix A. Probabilistic Models: An Introduction
François Yvon

A.1. Introduction
A.2. Supervised categorization
A.3. Unsupervised learning: the multinomial mixture model
A.4. Markov models: statistical models for sequences
A.5. Hidden Markov models
A.6. Conclusion
A.7. A primer of probability theory
A.8. Bibliography

List of Authors

Index

Textual Information Access: Statistical Models


A hardback by Eric Gaussier and François Yvon


    Publisher: ISTE Ltd and John Wiley & Sons Inc
    Publication Date: 13/04/2012
    ISBN13: 978-1848213227
    ISBN10: 1848213220

