Description

Book Synopsis


Table of Contents

Preface xv

Acknowledgments xvii

Part 1 Introduction and Theory 1

1 Alternative Data: The Lay of the Land 3

1.1 Introduction 3

1.2 What is “Alternative Data”? 5

1.3 Segmentation of Alternative Data 7

1.4 The Many Vs of Big Data 9

1.5 Why Alternative Data? 11

1.6 Who is Using Alternative Data? 15

1.7 Capacity of a Strategy and Alternative Data 16

1.8 Alternative Data Dimensions 19

1.9 Who Are the Alternative Data Vendors? 23

1.10 Usage of Alternative Datasets on the Buy Side 24

1.11 Conclusion 26

2 The Value of Alternative Data 27

2.1 Introduction 27

2.2 The Decay of Investment Value 27

2.3 Data Markets 29

2.4 The Monetary Value of Data (Part I) 31

2.4.1 Cost Value 34

2.4.2 Market Value 34

2.4.3 Economic Value 35

2.5 Evaluating (Alternative) Data Strategies with and without Backtesting 35

2.5.1 Systematic Investors 36

2.5.2 Discretionary Investors 38

2.5.3 Risk Managers 39

2.6 The Monetary Value of Data (Part II) 39

2.6.1 The Buyer’s Perspective 40

2.6.2 The Seller’s Perspective 41

2.7 The Advantages of Maturing Alternative Datasets 45

2.8 Summary 46

3 Alternative Data Risks and Challenges 47

3.1 Legal Aspects of Data 47

3.2 Risks of Using Alternative Data 50

3.3 Challenges of Using Alternative Data 51

3.3.1 Entity Matching 52

3.3.2 Missing Data 54

3.3.3 Structuring the Data 55

3.3.4 Treatment of Outliers 56

3.4 Aggregating the Data 57

3.5 Summary 58

4 Machine Learning Techniques 59

4.1 Introduction 59

4.2 Machine Learning: Definitions and Techniques 60

4.2.1 Bias, Variance, and Noise 60

4.2.2 Cross-Validation 61

4.2.3 Introducing Machine Learning 62

4.2.4 Popular Supervised Machine Learning Techniques 64

4.2.5 Clustering-Based Unsupervised Machine Learning Techniques 70

4.2.6 Other Unsupervised Machine Learning Techniques 71

4.2.7 Machine Learning Libraries 71

4.2.8 Neural Networks and Deep Learning 72

4.2.9 Gaussian Processes 80

4.3 Which Technique to Choose? 82

4.4 Assumptions and Limitations of the Machine Learning Techniques 84

4.4.1 Causality 84

4.4.2 Non-stationarity 85

4.4.3 Restricted Information Set 86

4.4.4 The Algorithm Choice 86

4.5 Structuring Images 87

4.5.1 Features and Feature Detection Algorithms 87

4.5.2 Deep Learning and CNNs for Image Classification 89

4.5.3 Augmenting Satellite Image Data with Other Datasets 90

4.5.4 Imaging Tools 91

4.6 Natural Language Processing (NLP) 91

4.6.1 What is Natural Language Processing (NLP)? 91

4.6.2 Normalization 93

4.6.3 Creating Word Embeddings: Bag-of-Words 94

4.6.4 Creating Word Embeddings: Word2vec and Beyond 94

4.6.5 Sentiment Analysis and NLP Tasks as Classification Problems 96

4.6.6 Topic Modeling 96

4.6.7 Various Challenges in NLP 97

4.6.8 Different Languages and Different Texts 98

4.6.9 Speech in NLP 99

4.6.10 NLP Tools 100

4.7 Summary 102

5 The Processes behind the Use of Alternative Data 105

5.1 Introduction 105

5.2 Steps in the Alternative Data Journey 106

5.2.1 Step 1. Set up a Vision and Strategy 106

5.2.2 Step 2. Identify the Appropriate Datasets 107

5.2.3 Step 3. Perform Due Diligence on Vendors 108

5.2.4 Step 4. Pre-assess Risks 109

5.2.5 Step 5. Pre-assess the Existence of Signals 109

5.2.6 Step 6. Data Onboarding 110

5.2.7 Step 7. Data Preprocessing 110

5.2.8 Step 8. Signal Extraction 111

5.2.9 Step 9. Implementation (or Deployment in Production) 112

5.2.10 Maintenance Process 113

5.3 Structuring Teams to Use Alternative Data 114

5.4 Data Vendors 116

5.5 Summary 118

6 Factor Investing 119

6.1 Introduction 119

6.1.1 The CAPM 119

6.2 Factor Models 120

6.2.1 The Arbitrage Pricing Theory 122

6.2.2 The Fama-French 3-Factor Model 123

6.2.3 The Carhart Model 124

6.2.4 Other Approaches (Data Mining) 125

6.3 The Difference between Cross-Sectional and Time Series Trading Approaches 126

6.4 Why Factor Investing? 126

6.5 Smart Beta Indices Using Alternative Data Inputs 127

6.6 ESG Factors 128

6.7 Direct and Indirect Prediction 129

6.8 Summary 132

Part 2 Practical Applications 133

7 Missing Data: Background 135

7.1 Introduction 135

7.2 Missing Data Classification 136

7.2.1 Missing Data Treatments 137

7.3 Literature Overview of Missing Data Treatments 139

7.3.1 Luengo et al. (2012) 139

7.3.2 Garcia-Laencina et al. (2010) 143

7.3.3 Grzymala-Busse et al. (2000) 146

7.3.4 Zou et al. (2005) 147

7.3.5 Jerez et al. (2010) 147

7.3.6 Farhangfar et al. (2008) 148

7.3.7 Kang et al. (2013) 149

7.4 Summary 149

8 Missing Data: Case Studies 151

8.1 Introduction 151

8.2 Case Study: Imputing Missing Values in Multivariate Credit Default Swap Time Series 152

8.2.1 Missing Data Classification 153

8.2.2 Imputation Metrics 154

8.2.3 CDS Data and Test Data Generation 154

8.2.4 Multiple Imputation Methods 157

8.2.5 Deterministic and EOF-Based Techniques 160

8.2.6 Results 164

8.3 Case Study: Satellite Images 173

8.4 Summary 176

8.5 Appendix: General Description of the MICE Procedure 178

8.6 Appendix: Software Libraries Used in This Chapter 179

9 Outliers (Anomalies) 181

9.1 Introduction 181

9.2 Outliers Definition, Classification, and Approaches to Detection 182

9.3 Temporal Structure 183

9.4 Global Versus Local Outliers, Point Anomalies, and Micro-Clusters 184

9.5 Outlier Detection Problem Setup 184

9.6 Comparative Evaluation of Outlier Detection Algorithms 185

9.7 Approaches to Outlier Explanation 189

9.7.1 Micenkova et al. 189

9.7.2 Duan et al. 191

9.7.3 Angiulli et al. 192

9.8 Case Study: Outlier Detection on Fed Communications Index 194

9.9 Summary 201

9.10 Appendix 202

9.10.1 Model-Based Techniques 202

9.10.2 Distance-Based Techniques 202

9.10.3 Density-Based Techniques 203

9.10.4 Heuristics-Based Approaches 203

10 Automotive Fundamental Data 205

10.1 Introduction 205

10.2 Data 206

10.3 Approach 1: Indirect Approach 211

10.3.1 The Steps Followed 212

10.3.2 Stage 1 213

10.4 Approach 2: Direct Approach 223

10.4.1 The Data 223

10.4.2 Factor Generation 224

10.4.3 Factor Performance 225

10.4.4 Detailed Factor Results 229

10.5 Gaussian Processes Example 238

10.6 Summary 239

10.7 Appendix 240

10.7.1 List of Companies 240

10.7.2 Description of Financial Statement Items 241

10.7.3 Ratios Used 242

10.7.4 IHS Markit Data Features 243

10.7.5 Reporting Delays by Country 244

11 Surveys and Crowdsourced Data 245

11.1 Introduction 245

11.2 Survey Data as Alternative Data 245

11.3 The Data 247

11.4 The Product 247

11.5 Case Studies 249

11.5.1 Case Study: Company Event Study (Pooled Survey) 249

11.5.2 Case Study: Oil and Gas Production (Q&A Survey) 252

11.6 Some Technical Considerations on Surveys 254

11.7 Crowdsourcing Analyst Estimates Survey 255

11.8 Alpha Capture Data 256

11.9 Summary 256

11.10 Appendix 256

12 Purchasing Managers’ Index 259

12.1 Introduction 259

12.2 PMI Performance 261

12.3 Nowcasting GDP Growth 262

12.4 Impacts on Financial Markets 263

12.5 Summary 266

13 Satellite Imagery and Aerial Photography 267

13.1 Introduction 267

13.2 Forecasting US Export Growth 269

13.3 Car Counts and Earnings Per Share for Retailers 271

13.4 Measuring Chinese PMI Manufacturing with Satellite Data 277

13.5 Summary 280

14 Location Data 283

14.1 Introduction 283

14.2 Shipping Data to Track Crude Oil Supplies 283

14.3 Mobile Phone Location Data to Understand Retail Activity 287

14.3.1 Trading REIT ETF Using Mobile Phone Location Data 288

14.3.2 Estimating Earnings per Share with Mobile Phone Location Data 291

14.4 Taxi Ride Data and New York Fed Meetings 295

14.5 Corporate Jet Location Data and M&A 296

14.6 Summary 298

15 Text, Web, Social Media, and News 299

15.1 Introduction 299

15.2 Collecting Web Data 299

15.3 Social Media 300

15.3.1 Hedonometer Index 302

15.3.2 Using Twitter Data to Help Forecast US Change in Nonfarm Payrolls 305

15.3.3 Twitter Data to Forecast Stock Market Reaction to FOMC 308

15.3.4 Liquidity and Sentiment from Social Media 309

15.4 News 309

15.4.1 Machine-Readable News to Trade FX and Understand FX Volatility 310

15.4.2 Federal Reserve Communications and US Treasury Yields 316

15.5 Other Web Sources 320

15.5.1 Measuring Consumer Price Inflation 321

15.6 Summary 322

16 Investor Attention 323

16.1 Introduction 323

16.2 Readership of Payrolls to Measure Investor Attention 323

16.3 Google Trends Data to Measure Market Themes 325

16.4 Investopedia Search Data to Measure Investor Anxiety 328

16.5 Using Wikipedia to Understand Price Action in Cryptocurrencies 330

16.6 Online Attention for Countries to Inform EMFX Trading 330

16.7 Summary 333

17 Consumer Transactions 335

17.1 Introduction 335

17.2 Credit and Debit Card Transaction Data 336

17.3 Consumer Receipts 337

17.4 Summary 340

18 Government, Industrial, and Corporate Data 341

18.1 Introduction 341

18.2 Using Innovation Measures to Trade Equities 342

18.3 Quantifying Currency Crisis Risk 344

18.4 Modeling Central Bank Intervention in Currency Markets 346

18.5 Summary 348

19 Market Data 351

19.1 Introduction 351

19.2 Relationship between Institutional FX Flow Data and FX Spot 351

19.3 Understanding Liquidity Using High-Frequency FX Data 355

19.4 Summary 357

20 Alternative Data in Private Markets 359

20.1 Introduction 359

20.2 Defining Private Equity and Venture Capital Firms 360

20.3 Private Equity Datasets 362

20.4 Understanding the Performance of Private Firms 363

20.5 Summary 364

Conclusions 365

Some Last Words 365

References 367

About the Authors 373

Index 375

The Book of Alternative Data

    A Hardback by Alexander Denev, Saeed Amen

      Publisher: John Wiley & Sons Inc
      Publication Date: 27/08/2020
      ISBN13: 9781119601791
      ISBN10: 1119601797

