Unknown variance, \(\sigma^2\)
In most practical situations where normal distributions are used as models for data, the normal variance, \(\sigma^2\), is an unknown parameter.
Saturated fat content of cooking oil
Both cholesterol and saturated fats are often avoided by people who are trying to lose weight or reduce their blood cholesterol level. Cooking oil made from soybeans has little cholesterol and has been claimed to have only 15% saturated fat.
A clinician believes that the saturated fat content is greater than 15% and randomly samples 13 bottles of soybean cooking oil for testing.
Saturated fat content (%) of the 13 sampled bottles:
15.2, 12.4, 15.4, 13.5, 15.9, 17.1, 16.9, 14.3, 19.1, 18.2, 15.5, 16.3, 20.0
We might model the data as a random sample from a normal distribution, and the hypotheses of interest relate to the mean, \(\mu\), of this distribution,
\(H_0: \mu = 15\%\)
\(H_A: \mu \gt 15\%\)
However, we no longer know the variance, \(\sigma^2\), of the distribution. The only information we have about \(\sigma^2\) comes from our sample.
When \(\sigma^2\) was a known value, we previously used a test statistic of the form
\[ Z \;\;=\;\; \frac{\overline{X} - \mu_0}{\sigma/\sqrt{n}} \]where \(\mu_0 = 15\) and \(n=13\) in the context of this example. However, \(Z\) cannot be used as a test statistic when \(\sigma^2\) is unknown, since its value cannot be found from the data.
Test statistic
A test statistic with similar characteristics can be defined if \(\sigma\) is replaced by the sample standard deviation, \(S\).
\[ T \;\;=\;\; \frac{\overline{X} - \mu_0}{S/\sqrt{n}} \]However, although \(Z\) has a standard normal distribution, \(Z \sim \NormalDistn(0,1)\), when the null hypothesis is true, the test statistic \(T\) has a different distribution.
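Unlike \(Z\), the statistic \(T\) needs nothing beyond the data and \(\mu_0\). The short Python sketch below computes it for the 13 cooking-oil values; it assumes only the standard library, and the helper name `t_statistic` is hypothetical. Note that `statistics.stdev` uses the divisor \(n-1\), matching the sample standard deviation \(S\).

```python
import math
import statistics


def t_statistic(sample, mu0):
    """Hypothetical helper: T = (xbar - mu0) / (S / sqrt(n))."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    return (xbar - mu0) / (s / math.sqrt(n))


# Saturated fat content (%) of the 13 sampled bottles
fat = [15.2, 12.4, 15.4, 13.5, 15.9, 17.1, 16.9,
       14.3, 19.1, 18.2, 15.5, 16.3, 20.0]

print(round(t_statistic(fat, 15), 3))
```

The printed value, 1.906, matches the test statistic used in the cooking oil example.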
Distribution of test statistic T
If \(\overline{X}\) and \(S^2\) are the mean and variance of a random sample of size \(n\) from a \(\NormalDistn(\mu_0, \sigma^2)\) distribution,
\[ T \;\;=\;\; \frac{\overline{X} - \mu_0}{\diagfrac{S}{\sqrt{n}}} \;\;\sim\;\; \TDistn(n-1 \text{ df}) \]The proof is almost identical to the earlier proof that \(\large\frac{\overline{X} - \mu}{\diagfrac{S}{\sqrt{n}}}\) was a pivot for \(\mu\).
\[ T \;\;=\;\; \frac{\overline{X} - \mu_0}{S/\sqrt{n}} \;\;=\;\; \frac{\large\frac{\overline{X} - \mu_0}{\sigma/\sqrt{n}}}{\sqrt{S^2/\sigma^2}} \;\;=\;\; \frac{\large\frac{\overline{X} - \mu_0}{\sigma/\sqrt{n}}}{\sqrt{\large\frac{(n-1)S^2/\sigma^2}{n-1}}} \]When \(H_0\) is true, the numerator has a standard normal distribution, and \(\diagfrac{(n-1)S^2}{\sigma^2}\) has a chi-squared distribution with \(n-1\) degrees of freedom, independent of \(\overline{X}\), so
\[ T \;\sim\; \frac{\NormalDistn(0,\;1)}{\sqrt{\large \frac {\ChiSqrDistn(n-1\text{ df})}{n-1}}} \;\;=\;\; \TDistn(n-1\text{ df}) \]
T-test for \(\mu\)
The test statistic \(T\) is used in a similar way to the \(Z\) statistic on the previous page, but the p-value is obtained as a tail probability from a t distribution instead of a standard normal one. The method is illustrated in an example.
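Before the example, a quick simulation shows why the tail area must come from the t distribution rather than the standard normal. This sketch assumes NumPy and SciPy are available; the seed, the value \(\sigma = 2\), the cut-off 2.0 and the number of replicates are arbitrary illustrative choices, not values from the text. It generates many samples of size 13 with \(H_0\) true, computes \(T\) for each, and compares the proportion with \(T \ge 2\) against the upper tail areas of \(\TDistn(12 \text{ df})\) and \(\NormalDistn(0,1)\).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)   # arbitrary seed, for a repeatable run
n, mu0, sigma = 13, 15.0, 2.0       # sigma = 2.0 is an arbitrary illustrative value
reps = 100_000

# Each row is one sample of size n generated with H0 true (mean mu0)
samples = rng.normal(loc=mu0, scale=sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)     # ddof=1 gives the divisor n - 1, as in S
t_values = (xbar - mu0) / (s / np.sqrt(n))

cut = 2.0
sim_tail = np.mean(t_values >= cut)   # simulated P(T >= cut)
t_tail = stats.t.sf(cut, df=n - 1)    # upper tail of t with n - 1 df
z_tail = stats.norm.sf(cut)           # upper tail of N(0, 1)
print(f"simulated {sim_tail:.4f}  t(12 df) {t_tail:.4f}  N(0,1) {z_tail:.4f}")
```

The simulated proportion should sit close to the t tail and clearly above the normal tail, so treating \(T\) as standard normal would understate the p-value.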
Saturated fat content of cooking oil
For the cooking oil data, the sample mean and standard deviation of the \(n=13\) values were \(\overline{x} = 16.138\) and \(s = 2.154\), so the test statistic for the t-test about whether \(\mu = 15\) is
\[ t \;\;=\;\; \frac{\overline{x} - 15}{s/\sqrt{n}} \;\;=\;\; 1.906 \]Since the alternative hypothesis is for \(\mu \gt 15\), the p-value for the test is the upper tail area of the \(\TDistn(12 \text{ df})\) distribution,
p-value = \(P(T \ge 1.906) \;\;=\;\; 0.040\)
Since this is below 0.05, we conclude that there is moderately strong evidence that the mean saturated fat content of the oils is higher than the claimed 15%.
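The calculation can be reproduced in a few lines of Python. This is a sketch that assumes SciPy is installed: `stats.t.sf` gives the upper tail area of the t distribution, and `stats.ttest_1samp` with `alternative="greater"` (available in SciPy 1.6 and later) runs the same one-sided test in a single call as a cross-check.

```python
import math
import statistics

from scipy import stats

# Saturated fat content (%) of the 13 sampled bottles
fat = [15.2, 12.4, 15.4, 13.5, 15.9, 17.1, 16.9,
       14.3, 19.1, 18.2, 15.5, 16.3, 20.0]
mu0 = 15
n = len(fat)

t = (statistics.mean(fat) - mu0) / (statistics.stdev(fat) / math.sqrt(n))
p_value = stats.t.sf(t, df=n - 1)   # P(T >= t) for HA: mu > 15, with n - 1 = 12 df
print(f"t = {t:.3f}, p-value = {p_value:.3f}")

# Cross-check: SciPy's one-sample t-test with a one-sided alternative
result = stats.ttest_1samp(fat, popmean=mu0, alternative="greater")
print(f"t = {result.statistic:.3f}, p-value = {result.pvalue:.3f}")
```

Both lines should report t = 1.906 and a p-value of 0.040, agreeing with the hand calculation.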
The diagram below illustrates the calculations.