Applying the general properties of p-values to different tests

The properties of p-values (and hence their interpretation) have been demonstrated in the context of a hypothesis test about whether a population mean was zero.

P-values for all hypothesis tests have the same properties. As a result, we can interpret any p-value if we know the null and alternative hypotheses that it tests, even if we do not know the formulae that underlies it. (In practice, a statistical computer program is generally used to perform hypothesis tests, so knowledge of formulae is of little importance.)

In particular, for any test where the null hypothesis restricts a parameter to a single value,


p-value Interpretation
over 0.1 no evidence that the null hypothesis does not hold
between 0.05 and 0.1 very weak evidence that the null hypothesis does not hold
between 0.01 and 0.05   moderately strong evidence that the null hypothesis does not hold
under 0.01 strong evidence that the null hypothesis does not hold

Another type of test

The normal distribution is often used as a hypothetical population from which a set of data are assumed to be sampled. But are the data consistent with an underlying normal population, or does the population distribution have a different shape?

One popular test for assessing whether a random sample come from a normal population is the Shapiro-Wilkes W test. The theory behind the test is advanced and the formula for the p-value cannot be readily evaluated by hand. However most statistical programs will perform the test.

A random sample of 40 values from a normal population is displayed in a jittered dot plot on the left of the diagram. The p-value for the Shapiro-Wilkes W test is shown under the dot plot and also graphically on the right.

Click Take sample a few times to take more samples and build the distribution of the p-values for the test. You should observe that the p-values have a rectangular distribution between 0 and 1 when the null hypothesis is true (i.e. if the samples are from a normal distribution).

Drag the slider on the top left of the diagram to change the shape of the population distribution. Repeat the exercise above and observe that when the null hypothesis does not hold, the p-values tend to be closer to 0.

Click on crosses on the display of p-values in the bottom right to display the sample that produced that p-value. P-values near zero usually correspond to samples that have very long tails to one or both sides, or have very short tails to one or both sides.

Returns from Mutual Funds

As a numerical example, the table below gives the annual returns for 159 mutual funds specialising in technology stocks in the USA, as reported in 2014. (The returns were sorted by column in order of their 3-month returns at the time.)

Percentage return
10.32
10.05
9.79
9.78
6.13
6.04
5.94
5.77
9.12
4.39
4.36
4.33
4.32
4.28
4.23
4.24
4.16
1.33
2.35
3.58
3.53
3.56
3.50
1.08
3.49
3.30
2.59
3.19
3.14
3.70
3.56
3.49
-2.23
3.34
3.24
-2.30
2.56
-0.55
-0.66
2.33
-0.90
-2.29
-2.48
-2.70
-2.44
-2.52
-2.58
-2.57
0.31
-2.65
-2.66
0.31
0.25
0.19
-0.07
-1.84
-1.84
-1.92
-1.97
-1.01
-2.16
-1.01
-2.16
-1.03
-0.89
-1.12
-1.34
-1.33
-1.08
2.14
2.10
-2.00
-2.16
-2.34
-2.20
-2.25
-2.28
-2.34
-1.56
-2.40
-4.35
-1.78
-1.70
-4.44
-1.86
-1.52
-1.51
-1.80
-1.77
-1.87
-1.88
-1.62
-4.63
-2.09
-1.85
-1.86
-2.43
-2.51
-1.53
-2.54
-2.09
-1.85
-1.86
-2.43
-2.44
-2.51
-1.53
-1.54
-2.54
-1.51
-1.85
-1.95
-1.95
-2.77
-1.96
-1.81
-1.82
-2.89
-3.05
-3.05
-3.11
-3.16
-2.21
-3.27
-3.31
-7.03
-3.97
-4.05
-3.86
-4.44
-4.42
-4.56
-4.78
-4.78
-6.82
-4.84
-4.91
-5.17
-5.41
-5.40
-5.49
-5.60
-5.71
-5.73
-7.85
-7.93
-8.03
-8.16
-6.92
-7.04
-9.13
-9.18
-9.35
-9.43
-11.61
0.00
0.00
-17.50
-17.60
 

A histogram of the data is shown below.

The best-fitting normal distribution (with mean and standard deviation equal to those of the data) has been superimposed on the histogram. There are two outliers with very negative returns (between -17% and -18%) and there seem more funds with between -4% and 0% return than would be expected from a normal distribution. Could this just be the result of randomness in samples from a normal distribution?

Applying the Shapiro-Wilkes test to the data using the statistical program Minitab gives a p-value of "under 0.005". We conclude that there is strong evidence that the distribution is not normal. After deleting the two low 'outliers' — two DoubleLine Equities Technology funds, the p-value for the test is still reported as "under 0.01" to there is still strong evidence that the distribution is not normal.

You should be able to interpret p-values that computer software provides for a wide variety of hypothesis tests using the properties that we have described in this section.