Unknown standard deviation

In the examples on the previous page, the population standard deviation, σ, was a known value. Unfortunately this is rarely the case in practice, so the previous test cannot be used.

Saturated fat content of cooking oil

Both cholesterol and saturated fats are often avoided by people who are trying to lose weight or reduce their blood cholesterol level. Cooking oil made from soybeans has little cholesterol and has been claimed to have only 15% saturated fat.

A clinician believes that the saturated fat content is greater than 15% and randomly samples 13 bottles of soybean cooking oil for testing.

Percentage saturated fat in soybean cooking oil
15.2
12.4
15.4
13.5
15.9
17.1
16.9
14.3
19.1
18.2
15.5
16.3
20.0

The hypotheses of interest are similar to those in the initial pages of this section:

H0 :   μ = 15%
HA :   μ > 15%

However, we no longer know the population standard deviation, σ. The only information we have about σ comes from our sample.
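As a minimal sketch of what the sample tells us, the snippet below summarises the 13 measurements from the table above using Python's standard library. The variable names (fat, x_bar, s) are illustrative choices, not part of the original page.

```python
import statistics

# Percentage saturated fat in the 13 sampled bottles (values from the table above).
fat = [15.2, 12.4, 15.4, 13.5, 15.9, 17.1, 16.9,
       14.3, 19.1, 18.2, 15.5, 16.3, 20.0]

n = len(fat)
x_bar = statistics.mean(fat)   # sample mean
s = statistics.stdev(fat)      # sample standard deviation (divisor n - 1)

print(f"n = {n}, sample mean = {x_bar:.2f}, sample sd = {s:.2f}")
```

The sample mean is about 16.14% and the sample standard deviation about 2.15%; the latter is our only estimate of σ.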

Test statistic and its distribution

When the population standard deviation, σ, was a known value, we used the test statistic

z = (x̄ − μ0) / (σ / √n)

which has a standard normal distribution when H0 is true.

When σ is unknown, we use a closely related test statistic that is also a 'statistical distance' between the sample mean and μ0,

t = (x̄ − μ0) / (s / √n)

where s is the sample standard deviation. This test statistic has greater spread than the standard normal distribution, due to the extra variability that results from estimating σ by s, especially when the sample size n is small.
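To make the t statistic concrete, here is a short sketch that evaluates it for the cooking oil data with μ0 = 15. The names mu_0, standard_error and t_stat are illustrative, and the calculation assumes the usual form of the statistic: the difference between the sample mean and μ0, divided by s / √n.

```python
import math
import statistics

# Percentage saturated fat in the 13 sampled bottles of soybean cooking oil.
fat = [15.2, 12.4, 15.4, 13.5, 15.9, 17.1, 16.9,
       14.3, 19.1, 18.2, 15.5, 16.3, 20.0]

mu_0 = 15.0                           # value of the mean under H0
n = len(fat)
x_bar = statistics.mean(fat)          # sample mean
s = statistics.stdev(fat)             # sample standard deviation (divisor n - 1)

standard_error = s / math.sqrt(n)     # estimated standard error of the mean
t_stat = (x_bar - mu_0) / standard_error

print(f"t = {t_stat:.3f} with {n - 1} degrees of freedom")
```

For these data the statistic is roughly 1.91, so the sample mean lies a little under two estimated standard errors above 15%.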

The diagram below generates random samples from a normal distribution. Click Take sample a few times to see the variability in the samples.

Click Accumulate then take about 50 random samples. Observe that the stacked dot plot of the t statistic conforms reasonably with a standard normal distribution.

Now use the pop-up menu to reduce the sample size to 5 and take a further 50-100 samples. You will probably notice that there are more 'extreme' t-values (less than -3 or more than +3) than would be expected from a standard normal distribution.

Reduce the sample size to 3 and repeat. It should now be clearer that the distribution of the t-statistic has greater spread than a standard normal distribution. Click on the crosses for the most extreme t-values and observe that they correspond to samples in which the 3 data values happen to be close together, resulting in a small sample standard deviation, s.
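The experiment described above can also be approximated in code. The sketch below is not the original diagram; it assumes samples from a normal distribution with mean 0 and standard deviation 1, and the function names, seed and number of repetitions are arbitrary choices. It reports the proportion of simulated t statistics beyond ±3 for several sample sizes.

```python
import math
import random

def t_statistic(sample, mu_0):
    """One-sample t statistic: (mean - mu_0) / (s / sqrt(n))."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu_0) / math.sqrt(variance / n)

def extreme_fraction(n, reps=20000, cutoff=3.0, seed=1):
    """Proportion of simulated t statistics with |t| > cutoff when H0 is true."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample, 0.0)) > cutoff:
            count += 1
    return count / reps

# Probability that a standard normal variable falls outside (-3, 3).
normal_tail = math.erfc(3.0 / math.sqrt(2.0))

fractions = {n: extreme_fraction(n) for n in (3, 5, 30)}
for n, frac in fractions.items():
    print(f"n = {n:2d}: proportion with |t| > 3 is {frac:.4f}")
print(f"standard normal: {normal_tail:.4f}")
```

The proportion of extreme values should be largest for n = 3 and shrink towards the standard normal figure as n grows, mirroring what the stacked dot plots show.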

The t distribution

We have seen that the t statistic does not have a standard normal distribution, but it does have another standard distribution called a t distribution with (n - 1) degrees of freedom. On the next page, we will use this distribution to obtain the p-value for hypothesis tests.

The diagram below shows the shape of the t distribution for various values of the degrees of freedom.

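Drag the slider to see how the shape of the t distribution depends on the degrees of freedom. Note that the distribution is always symmetric about zero and has longer tails than the standard normal distribution when the degrees of freedom are small.
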
A standard normal distribution can be used as an approximation to a t distribution if the degrees of freedom are large (say 30 or more) but the t distribution must be used for smaller degrees of freedom.
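As a rough numerical companion to the diagram, the sketch below compares the density of the t distribution with the standard normal density at a point in the tail (x = 3). It uses the standard formula for the t density; the function names t_pdf and normal_pdf and the chosen degrees of freedom are illustrative assumptions.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom at x."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

x = 3.0
for df in (2, 5, 30, 200):
    print(f"df = {df:3d}: t density at {x} is {t_pdf(x, df):.5f}")
print(f"standard normal density at {x} is {normal_pdf(x):.5f}")
```

The tail density falls towards the normal value as the degrees of freedom increase, which is why the normal approximation becomes reasonable for large samples.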