Sample variances
One application of the F distribution arises when comparing two random samples from normal distributions. If the sample variance of a random sample of \(n_1\) values from a \(\NormalDistn(\mu_1,\;\sigma_1^2)\) distribution is \(S_1^2\) and the corresponding sample variance of an independent random sample of \(n_2\) values from a \(\NormalDistn(\mu_2,\;\sigma_2^2)\) distribution is \(S_2^2\), then we know that
\[ \begin{align} \frac {n_1-1}{\sigma_1^2} S_1^2 \;&\sim\; \ChiSqrDistn(n_1 - 1\;\text{df}) \\[0.3em] \frac {n_2-1}{\sigma_2^2} S_2^2 \;&\sim\; \ChiSqrDistn(n_2 - 1\;\text{df}) \end{align} \]
This can also be expressed as
\[ \begin{align} \frac {S_1^2}{\sigma_1^2} \;&\sim\; \frac{\ChiSqrDistn(n_1 - 1\;\text{df})}{n_1 - 1}\\[0.5em] \frac{S_2^2}{\sigma_2^2} \;&\sim\; \frac{\ChiSqrDistn(n_2 - 1\;\text{df})}{n_2 - 1} \end{align} \]
Since the two samples are independent, the ratio of the sample variances is therefore
\[ \frac{S_1^2}{S_2^2} \;\;\sim\;\; \frac{\sigma_1^2}{\sigma_2^2} \times \frac{\ChiSqrDistn(n_1 - 1\;\text{df})\;/\;(n_1 - 1)}{\ChiSqrDistn(n_2 - 1\;\text{df})\;/\;(n_2 - 1)} \]
Pivot for ratio of variances
From the definition of the F distribution,
\[ \frac{S_1^2}{S_2^2} \;\;\sim\;\; \frac{\sigma_1^2}{\sigma_2^2} \times \FDistn(n_1 - 1,\;n_2 - 1\;\text{df}) \]
This gives us a pivot for the ratio of the variances in the two groups.
Ratio of two sample variances
\[ \frac {S_1^2\;/\;S_2^2} {\sigma_1^2\;/\;\sigma_2^2} \;\;\sim\;\; \FDistn(n_1 - 1,\;n_2 - 1\;\text{df}) \]
is a pivot for the ratio \(\diagfrac{\sigma_1^2}{\sigma_2^2}\).
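The pivot's distribution can be checked informally by simulation. The following is a minimal sketch, assuming NumPy and SciPy are available; the sample sizes, means and standard deviations are hypothetical values chosen only for illustration. It compares simulated quantiles of the pivot with those of the \(\FDistn(n_1 - 1,\;n_2 - 1\;\text{df})\) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Hypothetical settings, chosen only for illustration
n1, n2 = 12, 8
mu1, sigma1 = 5.0, 3.0
mu2, sigma2 = -2.0, 1.5
n_sim = 20_000

# Draw n_sim pairs of independent normal samples (one pair per row)
x1 = rng.normal(mu1, sigma1, size=(n_sim, n1))
x2 = rng.normal(mu2, sigma2, size=(n_sim, n2))

# Sample variances with divisor n - 1
s1_sq = x1.var(axis=1, ddof=1)
s2_sq = x2.var(axis=1, ddof=1)

# Pivot: ratio of sample variances divided by ratio of population variances
pivot = (s1_sq / s2_sq) / (sigma1**2 / sigma2**2)

# Simulated quantiles versus quantiles of F(n1 - 1, n2 - 1)
probs = [0.025, 0.5, 0.975]
simulated = np.quantile(pivot, probs)
theoretical = stats.f.ppf(probs, n1 - 1, n2 - 1)

# Proportion of simulated pivots between the 2.5th and 97.5th F percentiles
coverage = np.mean((pivot > theoretical[0]) & (pivot < theoretical[2]))

print("simulated quantiles:  ", simulated)
print("theoretical quantiles:", theoretical)
print("proportion inside central 95% range:", coverage)
```

If the result above holds, the two sets of quantiles should be close and the proportion should be near 0.95, whatever values are used for the means and variances.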
Confidence interval
This pivot can be used to find a 95% confidence interval for \(\diagfrac{\sigma_1^2}{\sigma_2^2}\). We start from
\[ P\left(F_{0.025} \;\lt\; \frac {S_1^2\;/\;S_2^2} {\sigma_1^2\;/\;\sigma_2^2} \;\lt\; F_{0.975}\right) \;\;=\;\; 0.95 \]
where \(F_{0.025}\) and \(F_{0.975}\) are the 2.5th and 97.5th percentiles of the \(\FDistn(n_1 - 1,\;n_2 - 1\;\text{df})\) distribution. A 95% confidence interval can be found by rearranging the inequality and replacing the sample variances by their observed values,
\[ \frac {s_1^2\;/\;s_2^2}{F_{0.975}} \;\;\lt\;\; \frac{\sigma_1^2}{\sigma_2^2} \;\;\lt\;\; \frac {s_1^2\;/\;s_2^2}{F_{0.025}} \]
Interval estimates with different confidence levels can be found by replacing \(F_{0.025}\) and \(F_{0.975}\) with other quantiles of the \(\FDistn(n_1 - 1,\;n_2 - 1\;\text{df})\) distribution.
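The interval formula can be wrapped in a small function that accepts any confidence level. This is only a sketch, assuming SciPy is available; the function name `variance_ratio_ci` and the summary statistics in the usage lines are hypothetical, introduced here for illustration.

```python
from scipy import stats

def variance_ratio_ci(s1_sq, s2_sq, n1, n2, level=0.95):
    """Confidence interval for sigma1^2 / sigma2^2.

    Assumes two independent random samples from normal distributions,
    with sample variances s1_sq and s2_sq and sample sizes n1 and n2.
    """
    alpha = 1 - level
    df1, df2 = n1 - 1, n2 - 1
    # Lower and upper quantiles of F(n1 - 1, n2 - 1)
    f_lower = stats.f.ppf(alpha / 2, df1, df2)
    f_upper = stats.f.ppf(1 - alpha / 2, df1, df2)
    ratio = s1_sq / s2_sq
    # Dividing by the larger quantile gives the lower limit, and vice versa
    return ratio / f_upper, ratio / f_lower

# Hypothetical summary statistics, for illustration only
lo95, hi95 = variance_ratio_ci(10.0, 4.0, n1=15, n2=20)
lo90, hi90 = variance_ratio_ci(10.0, 4.0, n1=15, n2=20, level=0.90)
print("95% interval:", lo95, hi95)
print("90% interval:", lo90, hi90)
```

Lowering the confidence level moves both F quantiles towards the centre of the distribution, so the 90% interval lies inside the 95% interval.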
Example
Remote sensing from satellites is often used to determine land use. Near-infrared intensities were recorded by the Satellite Landsat Multispectral Scanner from 118 areas that were known to contain forest, and from another 40 areas that were known to be urban. These measurements are shown in a jittered dot plot.
[Figure: jittered dot plot of near-infrared intensities for forested and urban areas]
The sample mean and variance of the values from forested areas are \(\overline{x}_1= 92.93\) and \(s_1^2 = 48.06\), and the corresponding values for the urban areas are \(\overline{x}_2= 82.08\) and \(s_2^2 = 24.79\).
We will model the data as random samples from \(\NormalDistn(\mu_1, \sigma_1^2)\) and \(\NormalDistn(\mu_2, \sigma_2^2)\) distributions. The mean near-infrared intensities are clearly different in forested and urban areas. Assess whether it is reasonable to assume that the variances are the same for the two types of land use.
A 95% confidence interval for the ratio of the population variances is
\[ \frac {s_1^2\;/\;s_2^2}{F_{0.975}} \;\;\lt\;\; \frac{\sigma_1^2}{\sigma_2^2} \;\;\lt\;\; \frac {s_1^2\;/\;s_2^2}{F_{0.025}} \]
\[ \frac {48.06 / 24.79}{1.737} \;\;\lt\;\; \frac{\sigma_1^2}{\sigma_2^2} \;\;\lt\;\; \frac {48.06 / 24.79}{0.616} \]
\[ 1.116 \;\;\lt\;\; \frac{\sigma_1^2}{\sigma_2^2} \;\;\lt\;\; 3.15 \]
Here 1.737 and 0.616 are the 97.5th and 2.5th percentiles of the \(\FDistn(117,\;39\;\text{df})\) distribution. Since the 95% confidence interval for the ratio of variances does not include the value 1.0, there is some evidence that the variance of the near-infrared intensities is higher in forested areas than in urban areas.
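The arithmetic in this example can be reproduced numerically. The sketch below assumes SciPy is available and uses only the sample sizes and sample variances quoted above; the variable names are our own.

```python
from scipy import stats

# Summary statistics from the land-use example
n1, n2 = 118, 40               # forested areas, urban areas
s1_sq, s2_sq = 48.06, 24.79    # sample variances

# 2.5th and 97.5th percentiles of F(n1 - 1, n2 - 1) = F(117, 39)
f_025 = stats.f.ppf(0.025, n1 - 1, n2 - 1)
f_975 = stats.f.ppf(0.975, n1 - 1, n2 - 1)

# 95% confidence interval for sigma1^2 / sigma2^2
ratio = s1_sq / s2_sq
lower, upper = ratio / f_975, ratio / f_025

print("F quantiles:", f_025, f_975)
print("95% CI for variance ratio:", lower, upper)
print("interval excludes 1:", lower > 1 or upper < 1)
```

The printed quantiles and limits should agree with the values in the example up to rounding.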
(We will present a more direct test for equal variances in a later chapter.)