Description
Book Synopsis
A series of essays introducing the applications of machine learning and statistics in natural language processing, speech recognition and web search for non-technical readers.
Trade Review
"This volume originates from a series of blog articles by the author, who works as senior staff research scientist for Google China. The blog articles have been rewritten to make them more accessible to uninitiated readers. As a result, the book contains 29 chapters which may be read independently. The aim is to provide evidence for the beauty of mathematics and the wealth of its applications to the layman . . . The volume may be quite valuable for readers who want to get some insight into how enterprises like Google achieve their performance, and how much mathematics is at work in the background of many commonplace services . . ."
~Dieter Riebesehl (Lüneburg), zbMath
Table of Contents
Words, languages vs. numbers, information. Natural language processing: from rules to statistics. Statistical language models. Chinese, Japanese, and Korean word segmentation. Hidden Markov models. Measurement and usage of information. Fred Jelinek and modern natural language processing. Beauty of simplicity: Boolean algebra and search engines. Graph theory and web crawlers. PageRank–Google’s democratic ranking algorithm. Determining the relevance of webpages and queries. Finite state machines and dynamic programming: Core technologies of Google local search. Cosine similarity and news classification. Matrix calculation and clustering of text documents. Information fingerprints and their applications. Mathematical principles of cryptography. All that is gold does not glitter: search engine anti-SPAM. The importance of mathematical models. Don’t put all your eggs in one basket: maximum entropy modeling. The principle of (Chinese pinyin) input method editor. Bloom filter. Bayesian networks: Extensions of hidden Markov models. Conditional random field, syntactic parsing, and other applications. Viterbi and his algorithm. God algorithm: Expectation-maximization algorithms. Logistic regression and web search advertisement. Divide and conquer and Google cloud computing fundamentals. Google Brain and neural networks. The power of big data.