Interpretation of p-values

Many different types of hypothesis test are commonly used in advanced statistics, but all share common features.

A p-value is a statistic that is evaluated from a random sample, so it has a distribution in the same way that a sample mean has a distribution. This distribution also has features that are common to all hypothesis tests. Understanding the distribution of p-values is the key to understanding how they are interpreted.

Distribution of p-values

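In any hypothesis test, all p-values between 0 and 1 are equally likely when H0 holds, whereas p-values near 0 are more likely than p-values near 1 when HA holds.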

The diagram below shows typical distributions that might be obtained.

To illustrate these properties, we use a test for whether a population mean is zero.

Null hypothesis   H0: µ = 0
Alternative hypothesis   HA: µ ≠ 0
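
As a rough illustration of how a p-value for this test might be computed from one sample, the sketch below uses a two-sided z-test. It assumes that the population standard deviation is known, which the text does not state, so the interactive display may well use a different test (a t-test, for example). The function name z_test_p_value, the sample size of 10 and the standard deviation of 3.0 are illustrative choices only.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def z_test_p_value(sample, sigma=3.0, mu0=0.0):
    """Two-sided p-value for H0: mu = mu0.

    Assumes a normal population whose standard deviation sigma is known
    (an assumption made for this sketch, not taken from the text).
    """
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    # Probability of a sample mean at least this far from mu0, in either direction
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# One random sample of 10 values from a population in which H0 is true
rng = random.Random(1)
sample = [rng.gauss(0.0, 3.0) for _ in range(10)]
p_value = z_test_p_value(sample)
print(f"sample mean = {fmean(sample):.3f}, p-value = {p_value:.3f}")
```

Because the p-value is computed from a random sample, rerunning the sketch with a different seed gives a different p-value, which is the sense in which the p-value has a distribution.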

In the diagram below, you will take random samples from a normal population for which H0 is true and, separately, from populations for which HA is true.

When H0 holds

Initially the population mean is zero, so H0 holds. A single sample from this population is shown on the left and the p-value for testing whether the population mean is zero is shown as a cross on the jittered dot plot on the bottom right.

Click the Take sample button a few times to take other samples from this population and add their p-values to the display on the bottom right. After taking 50 or more samples, you should observe that the p-values are spread evenly between 0 and 1. This supports our assertion that the p-values have a rectangular (uniform) distribution between 0 and 1 when H0 holds.
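
The same experiment can be approximated in code. The sketch below repeatedly samples from a population with mean zero and counts how many p-values fall into each fifth of the interval from 0 to 1. It assumes a z-test with a known standard deviation; the sample size (10), standard deviation (3.0), number of samples (2000) and the helper names are hypothetical choices rather than settings taken from the display.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def z_test_p_value(sample, sigma=3.0, mu0=0.0):
    """Two-sided p-value for H0: mu = mu0, assuming sigma is known."""
    z = (fmean(sample) - mu0) / (sigma / sqrt(len(sample)))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate_p_values(true_mean, n_samples=2000, n=10, sigma=3.0, seed=0):
    """Take n_samples random samples of size n and return their p-values."""
    rng = random.Random(seed)
    return [
        z_test_p_value([rng.gauss(true_mean, sigma) for _ in range(n)], sigma)
        for _ in range(n_samples)
    ]


# H0 holds: the true population mean is zero
p_values = simulate_p_values(true_mean=0.0)

# Count the p-values in the intervals [0, 0.2), [0.2, 0.4), ..., [0.8, 1.0]
counts = [0] * 5
for p in p_values:
    counts[min(int(p * 5), 4)] += 1
print(counts)
```

If the p-values are rectangular under H0, the five counts should be roughly equal, each close to one fifth of the number of samples.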

When HA holds

Now use the slider to change the true population mean to 2.0. We are still testing whether the mean is zero, so HA now holds. Take 40 or 50 samples and observe that the p-values are usually closer to 0 than to 1.

Click on some of the larger p-values on the jittered dot plot to display the samples that gave rise to them. The sample means vary and, by chance, some samples have means that are near 0.0, even when the population mean is 2.0; these samples result in larger p-values.

Repeat this exercise with different population means (try at least 1.0, 2.0, 3.0 and -2.0). The further the population mean is from 0.0, the value specified by H0, the more tightly the p-values cluster near 0.
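
A simulation gives a similar picture. The sketch below summarises the p-values for each population mean by their median, again assuming a z-test with a known standard deviation. The sample size (10) and standard deviation (3.0) are hypothetical values chosen so that the differences between the means are visible; they are not taken from the display.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, median


def z_test_p_value(sample, sigma=3.0, mu0=0.0):
    """Two-sided p-value for H0: mu = mu0, assuming sigma is known."""
    z = (fmean(sample) - mu0) / (sigma / sqrt(len(sample)))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate_p_values(true_mean, n_samples=2000, n=10, sigma=3.0, seed=0):
    """Take n_samples random samples of size n and return their p-values."""
    rng = random.Random(seed)
    return [
        z_test_p_value([rng.gauss(true_mean, sigma) for _ in range(n)], sigma)
        for _ in range(n_samples)
    ]


# Median p-value for each true population mean; H0 always states mu = 0
medians = {}
for true_mean in [0.0, 1.0, 2.0, 3.0, -2.0]:
    medians[true_mean] = median(simulate_p_values(true_mean))
    print(f"population mean {true_mean:5.1f}: median p-value = {medians[true_mean]:.4f}")
```

The median p-value should shrink as the population mean moves away from zero in either direction, so a mean of -2.0 behaves much like a mean of 2.0.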

Although it is possible to obtain a low p-value when H0 holds and a high p-value when HA holds, low p-values are more likely under HA than under H0.