E-book
Author Zhu, Mu, author.

Title Essential statistics for data science : a concise crash course / Mu Zhu
Published Oxford, United Kingdom ; New York, NY : Oxford University Press, [2023]
©2023

Description 1 online resource (177 pages)
Contents Cover -- titlepage -- copyright -- dedication -- Contents -- Prologue -- Part I. Talking Probability -- 1 Eminence of Models -- Appendix 1.A For brave eyes only -- 2 Building Vocabulary -- 2.1 Probability -- 2.1.1 Basic rules -- 2.2 Conditional probability -- 2.2.1 Independence -- 2.2.2 Law of total probability -- 2.2.3 Bayes law -- 2.3 Random variables -- 2.3.1 Summation and integration -- 2.3.2 Expectations and variances -- 2.3.3 Two simple distributions -- 2.4 The bell curve -- 3 Gaining Fluency -- 3.1 Multiple random quantities -- 3.1.1 Higher-dimensional problems -- 3.2 Two 'hard' problems -- 3.2.1 Functions of random variables -- 3.2.2 Compound distributions -- Appendix 3.A Sums of independent random variables -- 3.A.1 Convolutions -- 3.A.2 Moment-generating functions -- 3.A.3 Formulae for expectations and variances -- Part II. Doing Statistics -- 4 Overview of Statistics -- 4.1 Frequentist approach -- 4.1.1 Functions of random variables -- 4.2 Bayesian approach -- 4.2.1 Compound distributions -- 4.3 Two more distributions -- 4.3.1 Poisson distribution -- 4.3.2 Gamma distribution -- Appendix 4.A Expectation and variance of the Poisson -- Appendix 4.B Waiting time in Poisson process -- 5 Frequentist Approach -- 5.1 Maximum likelihood estimation -- 5.1.1 Random variables that are i.i.d. -- 5.1.2 Problems with covariates -- 5.2 Statistical properties of estimators -- 5.3 Some advanced techniques -- 5.3.1 EM algorithm -- 5.3.2 Latent variables -- Appendix 5.A Finite mixture models -- 6 Bayesian Approach -- 6.1 Basics -- 6.2 Empirical Bayes -- 6.3 Hierarchical Bayes -- Appendix 6.A General sampling algorithms -- 6.A.1 Metropolis algorithm -- 6.A.2 Some theory -- 6.A.3 Metropolis-Hastings algorithm -- Part III. Facing Uncertainty -- 7 Interval Estimation -- 7.1 Uncertainty quantification -- 7.1.1 Bayesian version -- 7.1.2 Frequentist version -- 7.2 Main difficulty -- 7.3 Two useful methods -- 7.3.1 Likelihood ratio -- 7.3.2 Bootstrap -- 8 Tests of Significance -- 8.1 Basics -- 8.1.1 Relation to interval estimation -- 8.1.2 The p-value -- 8.2 Some challenges -- 8.2.1 Multiple testing -- 8.2.2 Six degrees of separation -- Appendix 8.A Intuition of Benjamini-Hochberg -- Part IV. APPENDIX -- Appendix: Some Further Topics -- A.1 Graphical models -- A.2 Regression models -- A.3 Data collection -- Epilogue -- Bibliography -- Index
Summary "Essential Statistics for Data Science is a very short crash course for students entering a serious graduate program in data science without knowing enough statistics. However, it is not the type of introductory course that simply teaches students how to plug numbers into a formula and perform a t-test. While the course does start from the basics of probability and random variables, it moves along rapidly and ambitiously takes students in a matter of weeks to a number of relatively advanced topics in both frequentist and Bayesian inference as well as uncertainty assessment—such as the EM algorithm, the Gibbs sampler, and the bootstrap. The “main plot” unfolds in three parts. Part I, Talking Probability: The statistical approach to analysing data begins with a probability model to describe the data generating process; that's why, to study statistics, one must first learn to speak the language of probability. Part II, Doing Statistics: Before a model becomes truly useful, one must learn something about the unknown quantities in it—e.g., its parameters—from the data it is presumed to have generated, whether one cares about the parameters themselves or not; that's what much of statistical inference is about. Part III, Facing Uncertainty: Although one usually does not care much about parameters that don't have intrinsic scientific meaning, for those that do, it is important to explicitly describe how much uncertainty we have about them and take that into account when making decisions"--Publisher's description
Bibliography Includes bibliographical references and index
Notes Description based on online resource; title from home page (Oxford Academic, viewed on March 15, 2024)
Subject Big data -- Statistical methods
Data mining -- Statistical methods.
Mathematics.
Form Electronic book
ISBN 019269359X
9780191959844
0191959847
9780192693594