E-book
Author Tanaka-Ishii, Kumiko.

Title Statistical universals of language : mathematical chance vs. human choice / Kumiko Tanaka-Ishii
Published Cham : Springer, 2021

Description 1 online resource (226 pages)
Series Mathematics in Mind
Contents Intro -- Contents -- Part I Language as a Complex System -- 1 Introduction -- 1.1 Aims -- 1.2 Structure of This Book -- 1.3 Position of This Book -- 1.3.1 Statistical Universals as Computational Properties of Natural Language -- 1.3.2 A Holistic Approach to Language via Complex Systems Theory -- 1.4 Prospectus -- 2 Universals -- 2.1 Language Universals -- 2.2 Layers of Universals -- 2.3 Universal, Stylized Hypothesis, and Law -- 3 Language as a Complex System -- 3.1 Sequence and Corpus -- 3.1.1 Definition of Corpus -- 3.1.2 On Meaning -- 3.1.3 On Infinity -- 3.1.4 On Randomness
3.2 Power Functions -- 3.3 Scale-Free Property: Statistical Self-Similarity -- 3.4 Complex Systems -- 3.5 Two Basic Random Processes -- Part II Property of Population -- 4 Relation Between Rank and Frequency -- 4.1 Zipf's Law -- 4.2 Scale-Free Property and Hapax Legomena -- 4.3 Monkey Text -- 4.4 Power Law of n-grams -- 4.5 Relative Rank-Frequency Distribution -- 5 Bias in Rank-Frequency Relation -- 5.1 Literary Texts -- 5.2 Speech, Music, Programs, and More -- 5.3 Deviations from Power Law -- 5.3.1 Scale -- 5.3.2 Speaker Maturity -- 5.3.3 Characters vs. Words -- 5.4 Nature of Deviations
6 Related Statistical Universals -- 6.1 Density Function -- 6.2 Vocabulary Growth -- Part III Property of Sequences -- 7 Returns -- 7.1 Word Returns -- 7.2 Distribution of Return Interval Lengths -- 7.3 Exceedance Probability -- 7.4 Bias Underlying Return Intervals -- 7.5 Rare Words as a Set -- 7.6 Behavior of Rare Words -- 8 Long-Range Correlation -- 8.1 Long-Range Correlation Analysis -- 8.2 Mutual Information -- 8.3 Autocorrelation Function -- 8.4 Correlation of Word Intervals -- 8.5 Nonstationarity of Language -- 8.6 Weak Long-Range Correlation -- 9 Fluctuation -- 9.1 Fluctuation Analysis
9.2 Taylor Analysis -- 9.3 Differences Between the Two Fluctuation Analyses -- 9.4 Dimensions of Linguistic Fluctuation -- 9.5 Relations Among Methods -- 10 Complexity -- 10.1 Complexity of Sequence -- 10.2 Entropy Rate -- 10.3 Hilberg's Ansatz -- 10.4 Computing Entropy Rate of Human Language -- 10.5 Reconsidering the Question of Entropy Rate -- Part IV Relation to Linguistic Elements and Structure -- 11 Articulation of Elements -- 11.1 Harris's Hypothesis -- 11.2 Information-Theoretic Reformulation -- 11.3 Accuracy of Articulation by Harris's Scheme -- 12 Word Meaning and Value
12.1 Meaning as Use and Distributional Semantics -- 12.2 Weber-Fechner Law -- 12.3 Word Frequency and Familiarity -- 12.4 Vector Representation of Words -- 12.5 Compositionality of Meaning -- 12.6 Statistical Universals and Meaning -- 13 Size and Frequency -- 13.1 Zipf Abbreviation of Words -- 13.2 Compound Length and Frequency -- 14 Grammatical Structure and Long Memory -- 14.1 Simple Grammatical Framework -- 14.2 Phrase Structure Grammar -- 14.3 Long-Range Dependence in Sentences -- 14.4 Grammatical Structure and Long-Range Correlation -- 14.5 Nature of Long Memory Underlying Language
Summary This volume explores the universal mathematical properties underlying big language data and possible reasons why such properties exist, revealing how we may be unconsciously mathematical in our language use. These properties are statistical and thus different from linguistic universals that contribute to describing the variation of human languages, and they can only be identified over a large accumulation of usages. The book provides an overview of state-of-the-art findings on these statistical universals and reconsiders the nature of language accordingly, with Zipf's law as a well-known example. The main focus of the book further lies in explaining the property of long memory, which was discovered and studied more recently by borrowing concepts from complex systems theory. The statistical universals not only possibly lie as the precursor of language system formation, but they also highlight the qualities of language that remain weak points in today's machine learning. In summary, this book provides an overview of language's global properties. It will be of interest to anyone engaged in fields related to language and computing or statistical analysis methods, with an emphasis on researchers and students in computational linguistics and natural language processing. While the book does apply mathematical concepts, all possible effort has been made to speak to a non-mathematical audience as well by communicating mathematical content intuitively, with concise examples taken from real texts.
Notes Part V Mathematical Models
Bibliography Includes bibliographical references and index
Notes Print version record
Subject Mathematical linguistics.
Computational linguistics.
computational linguistics.
Lingüística matemática
Computational linguistics
Mathematical linguistics
Lingüística matemàtica.
Lingüística computacional.
Genre/Form Llibres electrònics.
Form Electronic book
ISBN 9783030593773
3030593770