Book

Author

Zhai, ChengXiang, author

Title Text data management and analysis : a practical introduction to information retrieval and text mining / ChengXiang Zhai, Sean Massung

Edition First edition

Published [New York, New York] : Association for Computing Machinery ; [San Rafael, California] : Morgan & Claypool Publishers, [2016]

©2016

Copies

Location Call no. Vol. Availability

WATERFT 006.312 Zha/Tdm AVAILABLE

Description xx, 510 pages : illustrations (some color) ; 24 cm

Series ACM books, 2374-6777 ; #12

Contents Part I. Overview and background -- 1. Introduction -- 1.1 Functions of text information systems -- 1.2 Conceptual framework for text information systems -- 1.3 Organization of the book -- 1.4 How to use this book -- Bibliographic notes and further reading -- 2. Background -- 2.1 Basics of probability and statistics -- 2.2 Information theory -- 2.3 Machine learning -- Bibliographic notes and further reading -- Exercises -- 3. Text data understanding -- 3.1 History and state of the art in NLP -- 3.2 NLP and text information systems -- 3.3 Text representation -- 3.4 Statistical language models -- Bibliographic notes and further reading -- Exercises -- 4. MeTA: a unified toolkit for text data management and analysis -- 4.1 Design philosophy -- 4.2 Setting up MeTA -- 4.3 Architecture -- 4.4 Tokenization with MeTA -- 4.5 Related toolkits -- Exercises --

Appendix A. Bayesian statistics -- Binomial estimation and the beta distribution -- Pseudo counts, smoothing, and setting hyperparameters -- Generalizing to a multinomial distribution -- The Dirichlet distribution -- Bayesian estimate of multinomial parameters -- Conclusion -- Appendix B. Expectation-maximization -- A simple mixture unigram language model -- Maximum likelihood estimation -- Incomplete vs. complete data -- A lower bound of likelihood -- The general procedure of EM -- Appendix C. KL-divergence and Dirichlet prior smoothing -- Using KL-divergence for retrieval -- Using Dirichlet prior smoothing -- Computing the query model p(w | [theta]q) -- References -- Index -- Authors' biographies

Part II. Text data access -- 5. Overview of text data access -- 5.1 Access mode: pull vs. push -- 5.2 Multimode interactive access -- 5.3 Text retrieval -- 5.4 Text retrieval vs. database retrieval -- 5.5 Document selection vs. document ranking -- Bibliographic notes and further reading -- Exercises -- 6. Retrieval models -- 6.1 Overview -- 6.2 Common form of a retrieval function -- 6.3 Vector space retrieval models -- 6.4 Probabilistic retrieval models -- Bibliographic notes and further reading -- Exercises -- 7. Feedback -- 7.1 Feedback in the vector space model -- 7.2 Feedback in language models -- Bibliographic notes and further reading -- Exercises -- 8. Search engine implementation -- 8.1 Tokenizer -- 8.2 Indexer -- 8.3 Scorer -- 8.4 Feedback implementation -- 8.5 Compression -- 8.6 Caching -- Bibliographic notes and further reading -- Exercises -- 9. Search engine evaluation -- 9.1 Introduction -- 9.2 Evaluation of set retrieval -- 9.3 Evaluation of a ranked list -- 9.4 Evaluation with multi-level judgements -- 9.5 Practical issues in evaluation -- Bibliographic notes and further reading -- Exercises -- 10. Web search -- 10.1 Web crawling -- 10.2 Web indexing -- 10.3 Link analysis -- 10.4 Learning to rank -- 10.5 The future of web search -- Bibliographic notes and further reading -- Exercises -- 11. Recommender systems -- 11.1 Content-based recommendation -- 11.2 Collaborative filtering -- 11.3 Evaluation of recommender systems -- Bibliographic notes and further reading -- Exercises --

Part III. Text data analysis -- 12. Overview of text data analysis -- 12.1 Motivation: applications of text data analysis -- 12.2 Text vs. non-text data: humans as subjective sensors -- 12.3 Landscape of text mining tasks -- 13. Word association mining -- 13.1 General idea of word association mining -- 13.2 Discovery of paradigmatic relations -- 13.3 Discovery of syntagmatic relations -- 13.4 Evaluation of word association mining -- Bibliographic notes and further reading -- Exercises -- 14. Text clustering -- 14.1 Overview of clustering techniques -- 14.2 Document clustering -- 14.3 Term clustering -- 14.4 Evaluation of text clustering -- Bibliographic notes and further reading -- Exercises -- 15. Text categorization -- 15.1 Introduction -- 15.2 Overview of text categorization methods -- 15.3 Text categorization problem -- 15.4 Features for text categorization -- 15.5 Classification algorithms -- 15.6 Evaluation of text categorization -- Bibliographic notes and further reading -- Exercises -- 16. Text summarization -- 16.1 Overview of text summarization techniques -- 16.2 Extractive text summarization -- 16.3 Abstractive text summarization -- 16.4 Evaluation of text summarization -- 16.5 Applications of text summarization -- Bibliographic notes and further reading -- Exercises -- 17. Topic analysis -- 17.1 Topics as terms -- 17.2 Topics as word distributions -- 17.3 Mining one topic from text -- 17.4 Probabilistic latent semantic analysis -- 17.5 Extension of PLSA and latent Dirichlet allocation -- 17.6 Evaluating topic analysis -- 17.7 Summary of topic models -- Bibliographic notes and further reading -- Exercises -- 18. Opinion mining and sentiment analysis -- 18.1 Sentiment classification -- 18.2 Ordinal regression -- 18.3 Latent aspect rating analysis -- 18.4 Evaluation of opinion mining and sentiment analysis -- Bibliographic notes and further reading -- Exercises -- 19. Joint analysis of text and structured data -- 19.1 Introduction -- 19.2 Contextual text mining -- 19.3 Contextual probabilistic latent semantic analysis -- 19.4 Topic analysis with social networks as context -- 19.5 Topic analysis with time series context -- 19.6 Summary -- Bibliographic notes and further reading -- Exercises --

Part IV. Unified text data management analysis system -- 20. Toward a unified system for text management and analysis -- 20.1 Text analysis operators -- 20.2 System architecture -- 20.3 MeTA as a unified system --

Summary The growth of "big data" created unprecedented opportunities to leverage computational and statistical approaches to turn raw data into actionable knowledge that can support various application tasks. This is especially true for the optimization of decision making in virtually all application domains such as health and medicine, security and safety, learning and education, scientific discovery, and business intelligence. Just as a microscope enables us to see things in the "micro world" and a telescope allows us to see things far away, one can imagine a "big data scope" would enable us to extend our perception ability to "see" useful hidden information and knowledge buried in the data, which can help make predictions and improve the optimality of a chosen decision. This book covers general computational techniques for managing and analyzing large amounts of text data that can help users manage and make use of text data in all kinds of applications

Bibliography Includes bibliographical references (pages 477-488) and index

Notes Description based on online resource; title from PDF title page (ACM, viewed July 26, 2016)

Subject Computational linguistics -- Statistical methods.

Data mining.

Natural language processing (Computer science)

Author Massung, Sean, author

ISBN 1970001178

1970001186

9781970001174

9781970001181
