Book
Author Pearson, Ronald K., 1952-

Title Exploring data in engineering, the sciences and medicine / Ronald K. Pearson
Published New York ; Oxford : Oxford University Press, 2011

Copies

Location Call no. Vol. Availability
 W'PONDS  620.00285 Pea/Edi  AVAILABLE
 MELB  620.00285 Pea/Edi  AVAILABLE
Description xv, 770 pages : illustrations ; 24 cm
Contents Machine generated contents note: 1. The Art of Analyzing Data -- 1.1. What this book is about -- 1.1.1. Useful data characterizations -- 1.1.2. Ohm's law -- 1.2. How much can we learn from data? -- 1.2.1. Can one hear the shape of a drum? -- 1.2.2. The role of assumptions -- 1.3. Numerical mathematics versus data analysis -- 1.3.1. Numbers, arithmetic, and roundoff errors -- 1.3.2. Computing mathematical functions -- 1.3.3. Data analysis and uncertainty -- 1.4. Dealing with uncertain data -- 1.4.1. Additive uncertainty models -- 1.4.2. The minimum uncertainty model -- 1.4.3. The random variable model -- 1.4.4. Other uncertainty models -- 1.5. What is a good data model? -- 1.5.1. Empirical versus fundamental models -- 1.5.2. The principle of zig-zag-and-swirl -- 1.5.3. Ockham's razor and overfitting -- 1.5.4. Wei's elephant and Einstein's advice -- 1.6. Exploratory versus confirmatory analysis -- 1.6.1. Confirmatory data analysis -- 1.6.2. Exploratory data analysis --
Contents note continued: 1.6.3. A cautionary example: the killer potato -- 1.7. The four R's of exploratory data analysis -- 1.7.1. The first R: revelation -- 1.7.2. The second R: residuals -- 1.7.3. The third R: reexpression -- 1.7.4. The fourth R: resistance -- 1.8. Working with real datasets -- 1.8.1. A missing data example: the asteroid belt -- 1.8.2. A large dataset: chronic fatigue syndrome -- 1.9. Software considerations -- 1.10. Organization of the rest of this book -- 2. Data: Types, Uncertainty and Quality -- 2.1. The structure of datasets -- 2.2. Data types -- 2.3. Metadata -- 2.4. What can we measure and how well? -- 2.4.1. What can we measure? -- 2.4.2. Accuracy and precision -- 2.4.3. The limits of measurement quality -- 2.4.4. More typical measurements -- 2.5. Variations: normal and anomalous -- 2.5.1. Normal variations and "noise" -- 2.5.2. Outliers: gross errors and legitimate surprises -- 2.5.3. Inliers: a subtle data anomaly -- 2.6. Missing data --
Contents note continued: 2.6.1. The problem of coding missing data -- 2.6.2. Disguised missing data -- 2.6.3. Causes of missing data -- 2.6.4. Ignorable versus nonignorable missing data -- 2.7. Other data anomalies -- 2.7.1. Coarse quantization -- 2.7.2. Noninformative variables -- 2.7.3. File merge and manipulation errors -- 2.7.4. Duplicate records -- 2.7.5. Categorical data errors -- 2.8. A few concluding observations -- 3. Characterizing Categorical Variables -- 3.1. Three categorical data examples -- 3.1.1. The UCI mushroom dataset -- 3.1.2. Who wrote the Federalist Papers? -- 3.1.3. Horse-kick deaths in the Prussian army -- 3.2. Discrete random variables -- 3.2.1. The discrete random variable model -- 3.2.2. Events and probabilities -- 3.3. Three important distributions -- 3.3.1. Urn models and the binomial distribution -- 3.3.2. The hypergeometric distribution -- 3.3.3. The discrete uniform distribution -- 3.4. Entropy -- 3.5. Interestingness and heterogeneity --
Contents note continued: 3.5.1. Four heterogeneity measures -- 3.5.2. Application to the UCI mushroom dataset -- 3.6. Count distributions -- 3.6.1. The Poisson distribution -- 3.6.2. The negative binomial distribution -- 3.6.3. Zero-inflated count models -- 3.7. The Zipf distribution -- 3.7.1. Definition and properties -- 3.7.2. Examples and consequences -- 3.8. Exercises -- 4. Uncertainty in Real Variables -- 4.1. Continuous random variables -- 4.1.1. Distributions and densities -- 4.1.2. Location parameters: mean, median, and mode -- 4.1.3. Expected values and moments -- 4.2. How are data values distributed? -- 4.2.1. The normal (Gaussian) distribution -- 4.2.2. Clancey's survey of data distributions -- 4.3. Moment characterizations -- 4.3.1. The Markov and Chebyshev inequalities -- 4.3.2. Skewness and kurtosis -- 4.3.3. The method of moments -- 4.3.4. Karl Pearson's 1895 system of distributions -- 4.3.5. Johnson's system of distributions -- 4.4. Limitations of moment characterizations --
Contents note continued: 4.4.1. Exact characterizations -- 4.4.2. Approximate characterizations -- 4.5. Some important distributions -- 4.5.1. The beta distribution -- 4.5.2. The Cauchy distribution -- 4.5.3. The exponential distribution -- 4.5.4. The gamma distribution -- 4.5.5. The Laplace distribution -- 4.5.6. The logistic distribution -- 4.5.7. The lognormal distribution -- 4.5.8. The Pareto distribution -- 4.5.9. The Rayleigh distribution -- 4.5.10. The Weibull distribution -- 4.6. Exercises -- 5. Fitting Straight Lines -- 5.1. Why do we fit straight lines? -- 5.1.1. Linear constitutive relations -- 5.1.2. Taylor series expansions -- 5.1.3. Allometry -- 5.1.4. Behavior and functional equations -- 5.2. Do we fit y on x or x on y? -- 5.3. Three approaches to fitting lines -- 5.3.1. Optimization-based problem formulations -- 5.3.2. The ordinary least squares (OLS) fit -- 5.3.3. The least absolute deviations (LAD) fit -- 5.3.4. The total least squares (TLS) fit -- 5.4. The method of maximum likelihood --
Contents note continued: 5.4.1. The basic concept -- 5.4.2. Three specific maximum likelihood solutions -- 5.5. Two brief case studies -- 5.5.1. Case study 1: L1 vs. L2 vs. L∞ -- 5.5.2. Case study 2: OLS vs. TLS -- 5.6. The unknown-but-bounded formulation -- 5.7. Which method do we use? -- 5.8. Exercises -- 6. A Brief Introduction to Estimation Theory -- 6.1. Characterizing estimators -- 6.1.1. Location estimators -- 6.1.2. Estimator bias -- 6.1.3. Variance and consistency -- 6.1.4. Other characterizations -- 6.2. An example: variance estimation -- 6.2.1. The standard estimators x̄N and σ² -- 6.2.2. Exact distribution for the Gaussian case -- 6.2.3. What about non-Gaussian cases? -- 6.3. The CLT and asymptotic normality -- 6.3.1. Distributions of sums and averages -- 6.3.2. The Central Limit Theorem -- 6.3.3. Asymptotic normality and relative efficiency -- 6.4. Cases where the CLT does not apply -- 6.4.1. Stable random variables -- 6.4.2. Weighted averages --
Contents note continued: 6.4.3. Webster's ambient noise statistics -- 6.5. The information inequality -- 6.6. Order statistics and L-estimators -- 6.6.1. Characterizing the Cauchy distribution -- 6.6.2. Distributions of order statistics -- 6.6.3. Uniform order statistics -- 6.6.4. A maximum likelihood estimation problem -- 6.6.5. L-estimators and their properties -- 6.6.6. L-estimators for Cauchy parameters -- 6.6.7. Gastwirth's location estimator -- 6.6.8. Asymptotic normality of L-estimators -- 6.6.9. Gini's mean difference -- 6.6.10. Uniform maximum likelihood estimators -- 6.7. Exercises -- 7. Outliers: Distributional Monsters (?) That Lurk in Data -- 7.1. Outliers and their consequences -- 7.1.1. The outlier sensitivity of moments -- 7.1.2. Failure of the 3σ-edit rule -- 7.1.3. The contaminated normal outlier model -- 7.2. Four ways of dealing with outliers -- 7.2.1. Detect and omit -- 7.2.2. Detect and replace -- 7.2.3. Detect and scrutinize --
Contents note continued: 7.2.4. Use outlier-resistant analytical procedures -- 7.3. Robust estimators -- 7.3.1. The breakdown point: a measure of resistance -- 7.3.2. The influence function: a measure of smoothness -- 7.3.3. Efficiency robustness: a measure of breadth -- 7.3.4. A comparison of the mean and the median -- 7.4. Robust alternatives to x̄N and σ -- 7.4.1. The Princeton robustness study -- 7.4.2. The MADM scale estimate -- 7.4.3. Robustness of the MADM scale estimate -- 7.5. Outlier detection -- 7.5.1. The Hampel identifier -- 7.5.2. Masking and swamping breakdown points -- 7.5.3. Practical details in outlier detection -- 7.6. The problem of asymmetry -- 7.6.1. Robust asymmetry measures -- 7.6.2. Location-free scale estimates -- 7.7. Other practical considerations -- 7.7.1. Light-tailed and bimodal distributions -- 7.7.2. Discrete distributions (quantization) -- 7.7.3. Discontinuity: a cautionary tale -- 7.8. General recommendations -- 7.8.1. How robust is enough? --
Contents note continued: 7.8.2. Overall recommendations -- 7.9. Exercises -- 8. Characterizing a Dataset -- 8.1. Surveying and appreciating a dataset -- 8.2. Three useful visualization tools -- 8.2.1. The normal Q-Q plot -- 8.2.2. The Poissonness plot -- 8.2.3. Nonparametric density estimators -- 8.3. Quantile-quantile plots -- 8.3.1. The basic idea -- 8.3.2. The general construction -- 8.3.3. Normal Q-Q plots -- 8.3.4. Data comparison plots -- 8.4. Plots for discrete distributions -- 8.4.1. Poissonness plots -- 8.4.2. Negative binomialness plots -- 8.5. Histograms: crude density estimates -- 8.5.1. The basic histogram -- 8.5.2. Histogram bias -- 8.5.3. Histogram variance -- 8.6. Kernel density estimators -- 8.6.1. The basic kernel estimator -- 8.6.2. Bias in kernel estimates -- 8.6.3. Variance of kernel estimates -- 8.6.4. A comparison of four examples -- 8.7. Scatterplot smoothers -- 8.7.1. Supsmu: an adaptive smoother -- 8.7.2. The lowess smoother -- 8.8. The preliminary data survey -- 8.9. Exercises --
Contents note continued: 9. Confidence Intervals and Hypothesis Testing -- 9.1. Confidence intervals -- 9.1.1. Application: systematic errors -- 9.1.2. The case of unknown variance -- 9.2. Extensions of the Poissonness plot -- 9.3. Formal hypothesis tests -- 9.4. Comparing means -- 9.4.1. The classical t-test -- 9.4.2. Limitations of the t-test -- 9.4.3. The Wilcoxon rank-sum test -- 9.4.4. The Yuen-Welch and Welch rank-based tests -- 9.5. The χ² distribution and χ² tests -- 9.5.1. The χ² distribution -- 9.5.2. The χ² test -- 9.5.3. An application: uniformity testing -- 9.6. The F-test -- 9.7. Binomial random variables -- 9.8. Testing multiple hypotheses -- 9.8.1. The multiple comparison problem -- 9.8.2. The Bonferroni correction -- 9.8.3. The Holm stepdown procedure -- 9.8.4. The Benjamini-Hochberg procedure -- 9.9. Exercises -- 10. Relations among Variables -- 10.1. What is the relationship between popular and electoral votes? -- 10.1.1. Analysis of the World Almanac data --
Contents note continued: 10.1.2. Electoral versus popular vote margins -- 10.1.3. Association measures -- 10.1.4. Data limitations and anomalies -- 10.2. Joint and conditional distributions -- 10.2.1. Discrete events: the multinomial distribution -- 10.2.2. Multivariate distributions and densities -- 10.2.3. Statistical independence -- 10.2.4. Conditional probabilities -- 10.2.5. Conditional distributions and expectations -- 10.3. The multivariate Gaussian distribution -- 10.3.1. Vector formulation -- 10.3.2. Mahalanobis distances -- 10.3.3. The bivariate case -- 10.3.4. Quadrant probabilities -- 10.4. The product-moment correlation coefficient -- 10.4.1. Definition and estimation -- 10.4.2. Exact distribution for the Gaussian case -- 10.4.3. Fisher's transformation to normality -- 10.4.4. Testing for independence -- 10.4.5. The influence of outliers -- 10.4.6. The influence of transformations -- 10.5. The Spearman rank correlation coefficient -- 10.5.1. Definition, estimation, and properties --
Contents note continued: 10.5.2. The influence of outliers -- 10.5.3. An application of rank correlations -- 10.6. Mixture distributions -- 10.6.1. Discrete mixtures -- 10.6.2. Example: Gaussian mixtures -- 10.6.3. Continuous mixtures -- 10.6.4. Example 1: overdispersion -- 10.6.5. Example 2: ratios of random variables -- 10.6.6. Example 3: heavy-tailed distributions -- 10.7. Non-Gaussian multivariate distributions -- 10.7.1. Bivariate exponential distributions -- 10.7.2. Two surprising "near-Gaussian" examples -- 10.7.3. Elliptically distributed random variables -- 10.7.4. Copulas: building from marginals -- 10.7.5. Kendall's τ -- 10.8. Relations among other variable types -- 10.8.1. Discrete, ordinal, and nominal variables -- 10.8.2. Mixed data types -- 10.8.3. The special case of binary variables -- 10.9. Exercises -- 11. Regression Models I: Real Data -- 11.1. Building regression models -- 11.1.1. Linear versus nonlinear regression -- 11.1.2. An example: the Riedel equation --
Contents note continued: 11.1.3. A second example: dimensional analysis -- 11.2. Ordinary least squares (OLS) -- 11.3. Two simple OLS extensions -- 11.3.1. Weighted least squares -- 11.3.2. Restricted least squares -- 11.4. M-estimators and robust regression -- 11.4.1. Basic notions of M-estimators -- 11.4.2. Mechanics of M-estimators -- 11.4.3. Two illustrative examples -- 11.5. Other robust alternatives to OLS -- 11.6. Exercises -- 12. Reexpression: Data Transformations -- 12.1. Three uses for transformations -- 12.1.1. Changing visual emphasis -- 12.1.2. Linearizing nonlinear models -- 12.1.3. Changing data distributions -- 12.2. Four transformation horror stories -- 12.2.1. Making unrelated variables appear related -- 12.2.2. Transformations need not preserve curvature -- 12.2.3. Transformations need not preserve modality -- 12.2.4. "Everything looks linear on a log-log plot" -- 12.3. Three popular transformations -- 12.3.1. Box-Cox transformations -- 12.3.2. Aranda-Ordaz transformations --
Contents note continued: 12.3.3. The angular transformation -- 12.4. Characteristics of good transformations -- 12.5. Generating nonuniform random numbers -- 12.5.1. Exponentially distributed random samples -- 12.5.2. Cauchy distributed random samples -- 12.5.3. Logistic distributed random samples -- 12.5.4. Pareto distributed random samples -- 12.5.5. Weibull distributed random samples -- 12.6. More general transformations -- 12.6.1. Transforming densities -- 12.6.2. Transformed exponential random variables -- 12.6.3. The χ₁² density -- 12.7. Reciprocal transformations -- 12.7.1. The Gaussian distribution -- 12.7.2. The Laplace distribution -- 12.7.3. The Cauchy distribution -- 12.7.4. The beta and Pareto distributions -- 12.7.5. The lognormal distribution -- 12.8. Exercises -- 13. Regression Models II: Mixed Data Types -- 13.1. Models with mixed data types -- 13.2. The influences of data type -- 13.2.1. Binary association: the odds ratio -- 13.2.2. Do big animals have big brains? --
Contents note continued: 13.3. ANOVA models -- 13.3.1. Analysis of variance (ANOVA) -- 13.3.2. Extensions and practical issues -- 13.3.3. Application: bitter pit in apples -- 13.4. Generalized linear models -- 13.5. Logistic regression models -- 13.5.1. Logistic regression -- 13.5.2. Ungrouped data -- 13.5.3. Application 1: bitter pit revisited -- 13.5.4. Application 2: missing data -- 13.5.5. Practical issues: EPV and separation -- 13.6. Poisson regression models -- 13.6.1. Over- and underdispersion -- 13.6.2. Poisson regression -- 13.6.3. Application: NPG in the Pima Indians dataset -- 13.7. Exercises -- 14. Characterizing Analysis Results -- 14.1. Analyzing modified datasets -- 14.1.1. Variation-based analysis procedures -- 14.1.2. The notion of exchangeability -- 14.2. Computational negative controls -- 14.2.1. Empirical probabilities and z-scores -- 14.2.2. An application: assessing correlations -- 14.3. Deletion diagnostics -- 14.3.1. The deletion diagnostic framework --
Contents note continued: 14.3.2. Simulation example: correlation analysis -- 14.3.3. Application to the brain/body dataset -- 14.4. Bootstrap resampling methods -- 14.4.1. The basic bootstrap formulation -- 14.4.2. Application to the brain/body dataset -- 14.5. Subsampling methods -- 14.5.1. Subsampling versus the bootstrap -- 14.5.2. Application 1: the simulation dataset -- 14.5.3. Application 2: the brain/body dataset -- 14.6. Applicability of these methods -- 14.7. Exercises -- 15. Regression Models III: Diagnostics and Refinements -- 15.1. The model-building process -- 15.1.1. What is a good data model? -- 15.1.2. The model development cycle -- 15.2. Three modeling examples -- 15.2.1. The brain/body dataset -- 15.2.2. Predicting triceps skinfold thickness -- 15.2.3. Bitter pit and mineral content -- 15.3. Assessing goodness-of-fit -- 15.3.1. The classical R² measure and F-statistics -- 15.3.2. Robust goodness-of-fit measures -- 15.4. Initial variable selection --
Contents note continued: 15.4.1. Statistical significance versus purposeful selection -- 15.4.2. Considering variable transformations -- 15.5. Deciding which variables to keep -- 15.6. The problem of collinearity -- 15.6.1. Collinearity and OLS parameter estimates -- 15.6.2. Dealing with collinearity -- 15.7. Finding influential data observations -- 15.7.1. Examining model residuals -- 15.7.2. Leverage and the hat matrix -- 15.7.3. OLS regression diagnostics -- 15.8. Cross-validation -- 15.8.1. Leave-one-out cross-validation -- 15.8.2. K-fold cross-validation -- 15.8.3. Cross-validation for variable selection -- 15.9. Iterative refinement strategies -- 15.9.1. All subsets regression -- 15.9.2. Stepwise regression -- 15.9.3. Forward selection for the TSF model -- 15.9.4. An alternative TSF model -- 15.9.5. Forward selection for the bitter pit model -- 15.10. Exercises -- 16. Dealing with Missing Data -- 16.1. The missing data problem -- 16.1.1. General missing data strategies --
Contents note continued: 16.1.2. Missingness: MCAR, MAR, and MNAR -- 16.2. The univariate case -- 16.2.1. Univariate issues and strategies -- 16.2.2. Four simulation examples -- 16.2.3. Location estimates -- 16.2.4. Scale estimates -- 16.3. Four multivariate examples -- 16.4. Case deletion strategies -- 16.4.1. Complete case versus available case analysis -- 16.4.2. Omitting variables and related ideas -- 16.4.3. Results for the simulation examples -- 16.5. Simple imputation strategies -- 16.5.1. Mean imputation -- 16.5.2. Hot-deck imputation -- 16.5.3. Regression-based imputation -- 16.5.4. Hot-deck/regression composite method -- 16.6. Multiple imputation -- 16.7. The EM algorithm -- 16.7.1. General description -- 16.7.2. A simple univariate example -- 16.7.3. A nonignorable univariate example -- 16.7.4. Specialization to bivariate Gaussian data -- 16.7.5. Application to the simulation examples -- 16.7.6. The Healy-Westmacott regression procedure -- 16.8. Results for the Pima Indians dataset --
Contents note continued: 16.9. General conclusions -- 16.10. Exercises
Notes Formerly CIP.
Bibliography Includes bibliographical references and index
Subject Engineering -- Data processing.
Engineering -- Statistical methods.
Mathematical statistics -- Textbooks.
Genre/Form Textbooks.
LC no. 2010031400
ISBN 0195089650 (hbk.)
9780195089653 (hbk.)