E-book
Author Savoy, Jacques, 1958- author.

Title Machine learning methods for stylometry : authorship attribution and author profiling / Jacques Savoy
Published Cham, Switzerland : Springer, [2020]

Description 1 online resource (294 pages)
Contents Intro -- Preface -- Book Structure -- Hands-On Exercises and Examples -- Acknowledgements -- Contents -- Acronyms -- List of Symbols -- Part I Fundamental Concepts and Models -- 1 Introduction to Stylistic Models and Applications -- 1.1 Overview and Definitions -- 1.2 Style and Its Explaining Factors -- 1.3 Authorship Attribution -- 1.4 Author Profiling -- 1.5 Forensic Issues -- 1.6 Author Clustering -- 1.7 Other Related Problems -- 2 Basic Lexical Concepts and Measurements -- 2.1 Stylometric Model -- 2.2 Our Running Example: The Federalist Papers -- 2.3 The Zipf's Law --
2.4 Vocabulary Richness Measures -- 2.5 Overall Stylistic Measures -- 2.6 And the Letters? -- 3 Distance-Based Approaches -- 3.1 Burrows' Delta -- 3.2 Kullback-Leibler Divergence Method -- 3.3 Labbé's Intertextual Distance -- 3.4 Other Distance Functions -- 3.5 Principal Component Analysis (PCA) -- Part II Advanced Models and Evaluation -- 4 Evaluation Methodology and Test Corpora -- 4.1 Preliminary Remarks -- 4.2 Text Quality and Preprocessing -- 4.3 Performance Measures -- 4.4 Precision, Recall, and F1 Measurements -- 4.5 Confidence Interval -- 4.6 Statistical Assessment --
4.7 Training and Test Sample -- 4.8 Classical Problems -- 4.9 CLEF PAN Test Collections -- 4.10 Evaluation Examples -- 5 Features Identification and Selection -- 5.1 Word-Based Stylistic Features -- 5.2 Other Stylistic Feature Extraction Strategies -- 5.3 Frequency-Based Feature Selection -- 5.4 Filter-Based Feature Selection -- 5.5 Wrapper Feature Selection -- 5.6 Characteristic Vocabulary -- 6 Machine Learning Models -- 6.1 k-Nearest Neighbors (k-NN) -- 6.2 Naïve Bayes -- 6.3 Support Vector Machines (SVMs) -- 6.4 Logistic Regression -- 6.5 Examples with R -- 6.5.1 K-Nearest Neighbors (k-NN) --
6.5.2 Naïve Bayes -- 6.5.3 Support Vector Machines (SVMs) -- 6.5.4 Logistic Regression -- 7 Advanced Models for Stylometric Applications -- 7.1 Zeta Method -- 7.2 Compression Methods -- 7.3 Latent Dirichlet Allocation (LDA) -- 7.4 Verification Problem -- 7.5 Collaborative Authorship -- 7.6 Neural Network and Authorship Attribution -- 7.7 Distributed Language Representation -- 7.8 Deep Learning and Long Short-Term Memory (LSTM) -- 7.9 Adversarial Stylometry and Obfuscation -- Part III Cases Studies -- 8 Elena Ferrante: A Case Study in Authorship Attribution -- 8.1 Corpus and Objectives --
8.2 Stylistic Mapping of the Contemporary Italian Literature -- 8.3 Delta Model -- 8.4 Labbé's Intertextual Distance -- 8.5 Zeta Test -- 8.6 Qualitative Analysis -- 8.7 Conclusion -- 9 Author Profiling of Tweets -- 9.1 Corpus and Research Questions -- 9.2 Bots versus Humans -- 9.3 Man vs. Woman -- 9.4 Conclusion -- 10 Applications to Political Speeches -- 10.1 Corpus Selection and Description -- 10.2 Overall Measurements -- 10.3 Stylistic Similarities Between Presidencies -- 10.4 Characteristics Words and Sentences -- 10.5 Rhetoric and Style Analysis by Wordlists -- 10.6 Conclusion
Summary This book presents methods and approaches used to identify the true author of a disputed document or text excerpt. It provides a broad introduction to text categorization problems grounded in stylistic features, such as authorship attribution, inferring the author's psychological traits, and detecting fake news. In particular, it presents machine learning models in detail as tools for verifying hypotheses and revealing significant patterns hidden in datasets. Stylometry is a multidisciplinary field combining linguistics with statistics and computer science. The content is divided into three parts. The first, comprising the first three chapters, offers a general introduction to stylometry, its potential applications, and its limitations. It also introduces the running example used to illustrate the concepts discussed throughout the rest of the book. The four chapters of the second part focus on computer science, mainly on explaining machine learning models for solving stylometric problems. They describe several general strategies for identifying, extracting, selecting, and representing stylistic markers. Because deep learning is an active field of research, this part also covers neural network models and word embeddings applied to stylometry, along with a general introduction to the deep learning approach to stylometric questions.
The third part illustrates how these approaches apply to real cases: an authorship attribution problem seeking to discover the hidden hand behind the nom de plume Elena Ferrante, an Italian writer known worldwide for her My Brilliant Friend saga; author profiling to determine whether a set of tweets was generated by a bot or a human being and, in the latter case, whether by a man or a woman; and an exploration of stylistic variation over time in US political speeches covering a period of ca. 230 years.
Bibliography Includes bibliographical references and index
Notes Online resource; title from digital title page (viewed on December 01, 2020)
Subject Natural language processing (Computer science)
Computational linguistics.
Anonyms and pseudonyms -- Data processing
Machine learning.
Natural Language Processing
Machine Learning
Form Electronic book
ISBN 3030533603
9783030533601