Pivot for a normal sample's variance

Data sets are often modelled as random samples from normal distributions. We will now assume that \(\{X_1, X_2, \dots, X_n\}\) is a random sample from a \(\NormalDistn(\mu,\;\sigma^2)\) distribution in which both \(\mu\) and \(\sigma^2\) are unknown parameters.

To find a confidence interval for \(\sigma^2\), we will use a pivot: a function of the data and \(\sigma^2\) whose distribution is completely known and involves no unknown parameters. A suitable pivot is

\[ g(\sigma^2, X_1, \dots, X_n) \;\;=\;\; \frac{n-1}{\sigma^2} S^2 \;\;\sim\;\;\ChiSqrDistn(n-1\;\text{df}) \]

where \(S^2\) is the sample variance.

This pivot can be used to find a 95% confidence interval for \(\sigma^2\).
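The claim that the pivot's distribution does not depend on \(\mu\) or \(\sigma^2\) can be checked informally by simulation. The Python sketch below is an illustration only: the function name `simulate_pivot`, the parameter values, the seed and the number of replications are arbitrary choices and are not part of the original material. It uses samples of size 48 and the percentiles 29.96 and 67.82 of the \(\ChiSqrDistn(47\;\text{df})\) distribution that are quoted in the example at the end of this page.

```python
import random


def simulate_pivot(mu, sigma, n, reps, seed):
    """Simulate values of the pivot (n - 1) * S^2 / sigma^2.

    Hypothetical helper for illustration: draws `reps` normal samples of
    size `n` and returns the pivot value computed from each sample.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        # Sample variance with divisor n - 1
        s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
        values.append((n - 1) * s2 / sigma ** 2)
    return values


n = 48
# Two arbitrary (assumed) parameter settings; the pivot should behave the same.
for mu, sigma in [(0.0, 1.0), (50.0, 12.0)]:
    pivots = simulate_pivot(mu, sigma, n, reps=2000, seed=1)
    mean_pivot = sum(pivots) / len(pivots)
    # Proportion between the 2.5th and 97.5th percentiles quoted in the example
    inside = sum(29.96 < g < 67.82 for g in pivots) / len(pivots)
    print(f"mu={mu}, sigma={sigma}: mean pivot={mean_pivot:.2f}, "
          f"proportion inside={inside:.3f}")
```

For both parameter settings, the mean of the simulated pivot values should be close to 47 (the mean of a Chi-squared distribution equals its degrees of freedom) and roughly 95% of the values should lie between the two percentiles.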

Confidence interval for a normal distribution's variance

If \(s^2\) is the variance of a random sample from a \(\NormalDistn(\mu,\;\sigma^2)\) distribution, a 95% confidence interval for \(\sigma^2\) is

\[ \frac{(n-1)s^2}{\chi_{n-1,\;0.975}^2} \;\;\lt\;\; \sigma^2 \;\;\lt\;\; \frac{(n-1)s^2}{\chi_{n-1,\;0.025}^2} \]

where \(\chi_{n-1,\;0.025}^2\) and \(\chi_{n-1,\;0.975}^2\) are the 2½th and 97½th percentiles of the Chi-squared distribution with \((n-1)\) degrees of freedom.

The method for finding the confidence interval follows the general pivotal method described earlier. Since

\[ P\left(\chi_{n-1,\;0.025}^2 \;\lt\; \frac{n-1}{\sigma^2} S^2 \;\lt\; \chi_{n-1,\;0.975}^2\right) \;\;=\;\; 0.95 \]

a 95% confidence interval is found by rearranging the two inequalities in

\[ \chi_{n-1,\;0.025}^2 \;\lt\; \frac{n-1}{\sigma^2} s^2 \;\lt\; \chi_{n-1,\;0.975}^2 \]

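Since \(\sigma^2\) and the Chi-squared quantiles are all positive, multiplying or dividing by them does not change the direction of an inequality. Rearranging the left-hand inequality gives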

\[ \sigma^2 \;\;\lt\;\; \frac{(n-1)s^2}{\chi_{n-1,\;0.025}^2} \]

and rearranging the right-hand one gives

\[ \sigma^2 \;\;\gt\;\; \frac{(n-1)s^2}{\chi_{n-1,\;0.975}^2} \]

The same argument gives interval estimates with other confidence levels: replace \(\chi_{n-1,\;0.025}^2\) and \(\chi_{n-1,\;0.975}^2\) with the corresponding quantiles of the Chi-squared distribution. For example, a 90% confidence interval uses \(\chi_{n-1,\;0.05}^2\) and \(\chi_{n-1,\;0.95}^2\).
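Written out in full, and using \(\alpha\) for one minus the confidence level (a symbol introduced here only for illustration), the \(100(1-\alpha)\%\) confidence interval would be

\[ \frac{(n-1)s^2}{\chi_{n-1,\;1-\alpha/2}^2} \;\;\lt\;\; \sigma^2 \;\;\lt\;\; \frac{(n-1)s^2}{\chi_{n-1,\;\alpha/2}^2} \]

Setting \(\alpha = 0.05\) recovers the 95% interval above, and \(\alpha = 0.10\) gives the 90% interval.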

Quantiles of Chi-squared distributions

There are no explicit formulae for the cumulative probabilities or quantiles of the family of Chi-squared distributions, but computer software can evaluate them. The example on the previous page showed how to find a cumulative probability using Excel. Quantiles are found in a similar way: type into a spreadsheet cell

=CHISQ.INV(\(\langle p \rangle\), \(\langle df \rangle\))

when looking for the \(\langle p \rangle\)th quantile of the Chi-squared distribution with \(\langle df \rangle\) degrees of freedom.
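Outside Excel, most statistical software can evaluate the same quantiles; in Python, for instance, SciPy provides `scipy.stats.chi2.ppf`. The sketch below is a self-contained illustration that uses only the Python standard library, on the assumption that inverting the cumulative distribution function numerically by bisection is accurate enough for this purpose. The function names `chi2_cdf` and `chi2_quantile` are hypothetical and do not belong to any library.

```python
import math


def chi2_cdf(x, df):
    """Cumulative probability P(X <= x) for X ~ Chi-squared(df).

    Uses the power series for the regularised lower incomplete gamma
    function P(a, z) with a = df / 2 and z = x / 2.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a            # first term of the series
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)   # next term: z^n / (a (a+1) ... (a+n))
        total += term
    # Prefactor z^a * exp(-z) / Gamma(a), computed on the log scale
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    return min(1.0, total * math.exp(log_prefactor))


def chi2_quantile(p, df):
    """Quantile with cumulative probability p (0 < p < 1), by bisection."""
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < p:   # widen the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(100):          # each step halves the bracketing interval
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_quantile(0.025, 47), 2))
print(round(chi2_quantile(0.975, 47), 2))
```

For 47 degrees of freedom this should reproduce, to two decimal places, the percentiles 29.96 and 67.82 used in the example below.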

Example

In an experiment that investigated the grazing behaviour of dairy cows, four cows were studied while they grazed on 48 different plots of grass. The grass intake was estimated in each plot by sampling before and after the experiment, and the number of bites made by each cow was recorded. The table below gives the grass intake per bite in each of the plots.

1.09 1.41 1.20 1.04 1.07 1.39 1.06 1.14
0.88 0.92 1.07 1.07 1.18 0.57 0.01 0.31
1.14 1.18 0.58 0.74 0.14 0.48 0.91 0.37
2.19 1.17 2.34 1.69 1.97 1.04 1.76 1.26
1.62 0.81 1.81 2.06 2.27 1.24 0.02 1.46
2.29 2.28 1.40 0.60 1.41 0.49 1.06 1.58

There are only 48 observations, so it is impossible to be sure of the shape of the underlying population distribution. However, a histogram of the data is reasonably symmetrical, so a normal distribution is a plausible model.

Assuming that the data come from a \(\NormalDistn(\mu,\;\sigma^2)\) distribution, find a 95% confidence interval for the standard deviation, \(\sigma\).

The sample variance is \(s^2 = 0.3606\) and this has \(n - 1 = 47\) degrees of freedom.

The 2½th and 97½th percentiles of the \(\ChiSqrDistn(47\;\text{df})\) distribution can be found from Excel to be 29.96 and 67.82. A 95% confidence interval for the normal variance \(\sigma^2\) is therefore

\[ \frac{47 \times 0.3606}{67.82} \;\;\lt\;\; \sigma^2 \;\;\lt\;\; \frac{47 \times 0.3606}{29.96} \]

\[ 0.2499 \;\;\lt\;\; \sigma^2 \;\;\lt\;\; 0.5658 \]

A 95% CI for the standard deviation is found by taking the square root of each limit, which preserves the inequalities because all three quantities are positive

\[ \sqrt{0.2499} \;\;\lt\;\; \sigma \;\;\lt\;\; \sqrt{0.5658} \]

\[ 0.500 \;\;\lt\;\; \sigma \;\;\lt\;\; 0.752 \]
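The arithmetic in this example can be reproduced with a short Python sketch. The data values and the two Chi-squared percentiles are copied from the text above; the variable names are arbitrary choices made for illustration.

```python
import math
import statistics

# Grass intake per bite in the 48 plots, copied from the table above
intake = [
    1.09, 1.41, 1.20, 1.04, 1.07, 1.39, 1.06, 1.14,
    0.88, 0.92, 1.07, 1.07, 1.18, 0.57, 0.01, 0.31,
    1.14, 1.18, 0.58, 0.74, 0.14, 0.48, 0.91, 0.37,
    2.19, 1.17, 2.34, 1.69, 1.97, 1.04, 1.76, 1.26,
    1.62, 0.81, 1.81, 2.06, 2.27, 1.24, 0.02, 1.46,
    2.29, 2.28, 1.40, 0.60, 1.41, 0.49, 1.06, 1.58,
]

n = len(intake)
s2 = statistics.variance(intake)   # sample variance, divisor n - 1

# 2.5th and 97.5th percentiles of Chi-squared(47 df), as quoted in the text
# (rounded to two decimal places)
chi2_lower, chi2_upper = 29.96, 67.82

# The larger percentile gives the lower limit and vice versa
var_ci = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)
sd_ci = (math.sqrt(var_ci[0]), math.sqrt(var_ci[1]))

print(f"n = {n}, s^2 = {s2:.4f}")
print(f"95% CI for the variance: {var_ci[0]:.4f} to {var_ci[1]:.4f}")
print(f"95% CI for the standard deviation: {sd_ci[0]:.3f} to {sd_ci[1]:.3f}")
```

Because the percentiles are entered to only two decimal places, the final digit of the variance limits may differ slightly from the values shown above; the limits for \(\sigma\) should agree to three decimal places.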