TY - JOUR
T1 - Violating the normality assumption may be the lesser of two evils
JF - bioRxiv
DO - 10.1101/498931
SP - 498931
AU - Knief, Ulrich
AU - Forstmeier, Wolfgang
Y1 - 2020/01/01
UR - http://biorxiv.org/content/early/2020/05/05/498931.abstract
N2 - When data are not normally distributed (e.g. skewed, zero-inflated, binomial, or count data) researchers are often uncertain whether it may be legitimate to use tests that assume Gaussian errors (e.g. regression, t-test, ANOVA, Gaussian mixed models), or whether one has to either model a more specific error structure or use randomization techniques.Here we use Monte Carlo simulations to explore the pros and cons of fitting Gaussian models to non-normal data in terms of risk of type I error, power and utility for parameter estimation.We find that Gaussian models are remarkably robust to non-normality over a wide range of conditions, meaning that P-values remain fairly reliable except for data with influential outliers judged at strict alpha levels. Gaussian models also perform well in terms of power and they can be useful for parameter estimation but usually not for extrapolation. Transformation of data before analysis is often advisable and visual inspection for outliers and heteroscedasticity is important for assessment. In strong contrast, some non-Gaussian models and randomization techniques bear a range of risks that are often insufficiently known. High rates of false-positive conclusions can arise for instance when overdispersion in count data is not controlled appropriately or when randomization procedures ignore existing non-independencies in the data.Overall, we argue that violating the normality assumption bears risks that are limited and manageable, while several more sophisticated approaches are relatively error prone and difficult to check during peer review. Hence, as long as scientists and reviewers are not fully aware of the risks, science might benefit from preferentially trusting Gaussian mixed models in which random effects account for non-independencies in the data in a transparent way.Tweetable abstract Gaussian models are remarkably robust to even dramatic violations of the normality assumption.Competing Interest StatementThe authors have declared no competing interest.
ER -