How To Test If Data Is Normally Distributed

You can test the hypothesis that your data were sampled from a Normal (Gaussian) distribution visually (with QQ-plots and histograms) or statistically (with tests such as D'Agostino-Pearson and Kolmogorov-Smirnov). However, it's rare to need to test if your data are normal. Most likely you're fitting some type of statistical model to your data, such as ANOVA, linear regression, or nonlinear regression. In these cases, the assumption is that the residuals, the deviations between the model predictions and the observed data, are sampled from a normal distribution. The residuals need to be approximately normally distributed to get valid statistical inference such as confidence intervals, coefficient estimates, and p values.

This means that the data don't necessarily need to be normally distributed, only the residuals do.

In this article, we will take a deeper dive into the subject of normality testing, including:

Statistical test for normality with common statistical models
How to determine if data is normally distributed using visual and statistical tests
Normally distributed data examples
What to do if the residuals are not normal

How to test for normality with common statistical models

Linear and nonlinear regression

With simple linear regression, the residuals are the vertical distances from the observed data to the line. In this case, the tests for normality should be performed on the residuals, not the raw data.

The same idea applies to nonlinear regression, where the model fits a curve instead of a straight line. The p-values and confidence intervals are based on the assumption that the residuals are normally distributed.
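As a minimal sketch of this idea outside Prism, the Python snippet below (NumPy and SciPy are assumptions here, not tools the article uses) fits a straight line to made-up data and runs a Shapiro-Wilk test on the residuals rather than on the raw y values. The data, seed, and variable names are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: a straight-line trend plus Gaussian noise.
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

# Fit the line, then compute residuals (observed minus predicted).
fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

# Test the residuals, not y itself.
stat, p_value = stats.shapiro(residuals)
print(f"Shapiro-Wilk on residuals: W={stat:.3f}, p={p_value:.3f}")
```

For a nonlinear model the same pattern would apply: compute observed minus predicted from the fitted curve and test that array.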

Discover the easiest way to test your data using linear regression with a free 30 day trial of Prism.

Note the language. The shorthand (used above) is to test the assumption that the residuals are normally distributed. What this actually means is testing the assumption that the residuals are sampled from a normal distribution, or are sampled from a population that follows a normal distribution.

T tests (paired and unpaired)

With t tests and ANOVA models, it appears a little different, but it's really the same process of testing the model residuals.

With paired t tests, which are used when two measurements are taken on the same data point (for example, before and after measurements for each test subject), the model assumption is that the differences between the two measurements are normally distributed. So in that case, simply test the differences for normality. A common mistake is to test each group separately for normality.
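A short sketch of the paired case, assuming Python with NumPy and SciPy (not part of the original article): the test is run on the per-subject differences, not on the "before" and "after" columns. All values are simulated and the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical before/after measurements on the same 30 subjects.
before = rng.normal(loc=120.0, scale=10.0, size=30)
after = before - 5.0 + rng.normal(scale=4.0, size=30)

# The paired t test models the differences, so test those for normality.
differences = after - before
stat, p_value = stats.shapiro(differences)
print(f"Shapiro-Wilk on differences: W={stat:.3f}, p={p_value:.3f}")
```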

With unpaired t tests, which compare the means of two independent groups (such as male vs female heights), both columns of data are assumed to be normal. Test each column individually or, if you assume equal variance, test them jointly by pooling the residuals (each column value minus its respective estimated mean) rather than the raw data.
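The two options for the unpaired case can be sketched as follows, again assuming Python with NumPy and SciPy. The heights are simulated, and the pooled approach is only reasonable under the equal-variance assumption stated above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical heights (cm) for two independent groups.
group_a = rng.normal(loc=176.0, scale=7.0, size=25)
group_b = rng.normal(loc=163.0, scale=7.0, size=25)

# Option 1: test each column on its own.
p_a = stats.shapiro(group_a).pvalue
p_b = stats.shapiro(group_b).pvalue

# Option 2 (assumes equal variance): subtract each group's own mean,
# pool the residuals, and test them together.
pooled_residuals = np.concatenate(
    [group_a - group_a.mean(), group_b - group_b.mean()]
)
p_pooled = stats.shapiro(pooled_residuals).pvalue

print(f"group A p={p_a:.3f}, group B p={p_b:.3f}, pooled residuals p={p_pooled:.3f}")
```

Testing the pooled raw data instead would mix two different means and could look bimodal even when each group is normal.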

Do the residuals from your t test deviate a little from normality? Note that t tests are robust to non-normal data with large sample sizes, meaning that as long as you have enough data, only substantial violations of normality need to be addressed.

Perform a t test in Prism today.

ANOVA with fixed effects

In two-way ANOVA with fixed effects, where there are two experimental factors such as fertilizer type and soil type, the assumption is that data within each factor combination are normally distributed. It's easiest to test this by looking at all of the residuals at once. In this case, the residuals are the difference of each observation from the group mean of its respective factor combination.

A common mistake is to test for normality across only one factor. Using the fertilizer and soil type example, the assumption is that each group (fertilizer A with soil type 1, fertilizer A with soil type 2, …) is normally distributed. It's not the same thing to test if fertilizer A data are normally distributed, and in fact, if the soil type is a significant factor, then they wouldn't be.

As long as you're assuming equal variance among the different treatment groups, you can test for normality across all residuals at once. This is useful when you have only a few observations in any given factorial combination.
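To make the "all residuals at once" approach concrete, here is a minimal sketch assuming Python with NumPy and SciPy. The cell means, the five observations per cell, and the names are all hypothetical; each residual is an observation minus the mean of its own fertilizer and soil combination.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical true means for each (fertilizer, soil type) combination.
cell_means = {("A", 1): 10.0, ("A", 2): 14.0, ("B", 1): 12.0, ("B", 2): 19.0}

residual_parts = []
for cell, mean in cell_means.items():
    # Only 5 observations per cell: too few to test each cell separately.
    observations = rng.normal(loc=mean, scale=2.0, size=5)
    # Residual = observation minus the observed mean of its own cell.
    residual_parts.append(observations - observations.mean())

# Pool residuals across all cells (relies on the equal-variance assumption).
residuals = np.concatenate(residual_parts)
stat, p_value = stats.shapiro(residuals)
print(f"{residuals.size} pooled residuals: W={stat:.3f}, p={p_value:.3f}")
```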

Test the normality of your data before conducting an ANOVA in Prism.

How to test for normality

There are both visual and formal statistical tests that can help you check if your model residuals meet the assumption of normality. In Prism, most models (ANOVA, Linear Regression, etc.) include tests and plots for evaluating normality, and you can also test a column of data directly.

Visually

Q-Q Plot

The most common graphical tool for assessing normality is the Q-Q plot. In these plots, the observed data are plotted against the expected quantiles of a normal distribution. It takes practice to read these plots. In theory, sampled data from a normal distribution would fall along the dotted line. In reality, even data sampled from a normal distribution, such as the example QQ plot below, can exhibit some deviation from the line.
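The coordinates behind a Q-Q plot can be computed without drawing anything. The sketch below assumes Python with NumPy and SciPy and uses `scipy.stats.probplot`, which returns the theoretical normal quantiles, the sorted observations, and a least-squares reference line. The sample is simulated and the names are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
sample = rng.normal(loc=50.0, scale=5.0, size=40)  # hypothetical data

# probplot returns the Q-Q coordinates and a fitted reference line.
(theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(
    sample, dist="norm"
)

# Points close to slope * theoretical_q + intercept suggest normality;
# some scatter is expected even for truly normal data.
deviation = ordered_values - (slope * theoretical_q + intercept)
print(f"correlation r = {r:.3f}")
print(f"largest gap from the line = {np.abs(deviation).max():.2f}")
```

Passing `theoretical_q` and `ordered_values` to any plotting library gives the plot itself.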

Frequency distribution

You may also visually check normality by plotting a frequency distribution, also called a histogram, of the data and visually comparing it to a normal distribution (overlaid in red). In a frequency distribution, each data point is put into a discrete bin, for example (-10, -5], (-5, 0], (0, 5], etc. The plot shows the proportion of data points in each bin.

While this is a useful tool to visually summarize your data, a major drawback is that the bin size can greatly affect how the data look. The following histogram is the same data as above but using smaller bin sizes.
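The effect of bin size is easy to see numerically. This sketch, assuming Python with NumPy, bins the same simulated sample twice with a hypothetical choice of 6 and 30 bins and reports the proportion of points in the fullest bin. Note that NumPy closes bins on the left, unlike the (a, b] bins described above, which makes no practical difference for continuous data.

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(loc=0.0, scale=5.0, size=100)  # hypothetical data

histograms = {}
for n_bins in (6, 30):
    counts, edges = np.histogram(data, bins=n_bins)
    proportions = counts / counts.sum()  # proportion of points per bin
    histograms[n_bins] = (proportions, edges)
    width = edges[1] - edges[0]
    print(f"{n_bins:2d} bins (width {width:.2f}): "
          f"fullest bin holds {proportions.max():.0%} of the data")
```

With few wide bins the shape tends to look smooth, while many narrow bins tend to look ragged for the same 100 values.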

With statistical tests

There are many statistical tests to evaluate normality, although we don't recommend relying on them blindly. Prism offers four normality test options: D'Agostino-Pearson, Anderson-Darling, Shapiro-Wilk and Kolmogorov-Smirnov. Each of the tests produces a p-value that tests the null hypothesis that the values (the sample) were sampled from a Normal (Gaussian) distribution (or population):

If the p-value is not significant, the normality test was "passed". While it's true we can never say for sure that the data came from a normal distribution, there is no evidence to suggest otherwise.
If the p-value is significant, the normality test was "failed". There is evidence that the data may not be normally distributed after all.
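The pass/fail reading above can be sketched in Python, assuming NumPy and SciPy; this is not how Prism computes its results. The helper `normality_report` and the 0.05 threshold are hypothetical choices. Anderson-Darling is left out of this sketch, and the Kolmogorov-Smirnov line is the plain test with parameters estimated from the sample, which is known to give p-values that are too large without a Lilliefors-type correction.

```python
import numpy as np
from scipy import stats

def normality_report(values, alpha=0.05):
    """Return {test name: (p-value, passed)} for a 1-D sample."""
    values = np.asarray(values, dtype=float)
    p_values = {
        "D'Agostino-Pearson": stats.normaltest(values).pvalue,
        "Shapiro-Wilk": stats.shapiro(values).pvalue,
        # Plain KS against a normal with the sample's own mean and SD.
        # Estimating those from the same data makes this p-value conservative.
        "Kolmogorov-Smirnov": stats.kstest(
            values, "norm", args=(values.mean(), values.std(ddof=1))
        ).pvalue,
    }
    # "Passed" means the null hypothesis of normality was not rejected.
    return {name: (p, bool(p >= alpha)) for name, p in p_values.items()}

rng = np.random.default_rng(6)
normal_sample = rng.normal(size=100)       # hypothetical normal data
skewed_sample = rng.exponential(size=200)  # hypothetical skewed data

for label, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    for name, (p, passed) in normality_report(sample).items():
        verdict = "passed" if passed else "failed"
        print(f"{label:7s} {name:20s} p={p:.4f} {verdict}")
```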

If that does not fit with your intuition, remember that the null hypothesis for these tests is that your sample came from a normally distributed population of data. So, as with any significant test result, you are rejecting the null hypothesis, which here is the idea that the data were normally distributed. See our guide for more specific information and background on interpreting normality test p-values.

Which is better: visual or statistical tests?

We recommend both. It's always a good idea to plot your data, because, while helpful, statistical tests have limitations. This is particularly true with medium to large sample sizes (over 70 observations), because in these cases, the normality tests can detect very slight deviations from normality. Therefore, if your data "fail" a normality test, a visual check might tell you that even if the data are statistically not normal, they are practically normal.
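A small simulation can illustrate why large samples "fail" so easily. This sketch assumes Python with NumPy and SciPy, and the population (a t distribution with 10 degrees of freedom, which is nearly normal with slightly heavier tails) and the two sample sizes are hypothetical choices, not values from the article.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Same nearly normal population, two very different sample sizes.
p_by_n = {}
for n in (50, 20000):
    sample = rng.standard_t(df=10, size=n)
    p_by_n[n] = stats.normaltest(sample).pvalue  # D'Agostino-Pearson
    print(f"n={n:5d}: p={p_by_n[n]:.4g}")
```

With the large sample the test should reject normality decisively, even though a histogram or Q-Q plot of the same data would look close to normal.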

Get started in Prism with your free 30 day trial today.

What if my residuals aren't normally distributed?

If there is evidence your data are significantly different from the expected normal distribution, what can you do?

Some models are robust to deviations from normality

Depending on the model you are using, it may still provide accurate results despite some degree of non-normality. One-way ANOVA, for example, is often robust even if the data are not very close to normal.

Transformations

In some situations, you can transform your data and retest for normality. For example, log transformations are common, because lognormal distributions are common (especially in biology).
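As a sketch of the transform-and-retest step, assuming Python with NumPy and SciPy, the snippet below draws hypothetical lognormal data, tests it, then tests its logarithm. A log transformation only applies to strictly positive values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)

# Hypothetical lognormal measurements (strictly positive, right-skewed).
raw = rng.lognormal(mean=1.0, sigma=0.8, size=100)
logged = np.log(raw)

p_raw = stats.shapiro(raw).pvalue
p_logged = stats.shapiro(logged).pvalue
print(f"raw data:        p={p_raw:.4g}")
print(f"log-transformed: p={p_logged:.4g}")
```

When the model is fitted to transformed data, the residuals of that fit are what should be retested, and results are then interpreted on the log scale.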

Non-Parametric Tests

If your data truly are not normal, many analyses have non-parametric alternatives, such as the one-way ANOVA analog, Kruskal-Wallis, and the two-sample t test analog, Mann-Whitney. These methods don't rely on an assumption of normality. The downside is that they generally also have less power, so it's harder to detect statistical differences. Here are some recommendations to determine when to use nonparametric tests.
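Both alternatives named above are available in SciPy, so a minimal sketch in Python (an assumption, since the article itself uses Prism) looks like this. The three skewed groups and their shifts are simulated and hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)

# Hypothetical skewed (exponential) data with shifted locations.
control = rng.exponential(scale=1.0, size=30)
treated = rng.exponential(scale=1.0, size=30) + 3.0
third = rng.exponential(scale=1.0, size=30) + 1.5

# Two groups: Mann-Whitney, the analog of the unpaired t test.
u_stat, p_mw = stats.mannwhitneyu(control, treated, alternative="two-sided")

# Three or more groups: Kruskal-Wallis, the analog of one-way ANOVA.
h_stat, p_kw = stats.kruskal(control, treated, third)

print(f"Mann-Whitney U={u_stat:.1f}, p={p_mw:.4g}")
print(f"Kruskal-Wallis H={h_stat:.2f}, p={p_kw:.4g}")
```

Both tests work on ranks rather than raw values, which is why they do not need the normality assumption.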