One application of the F distribution arises when comparing the variances of two independent random samples from normal distributions. If \(S_1^2\) and \(S_2^2\) are the sample variances of independent random samples of \(n_1\) and \(n_2\) values from normal distributions with variances \(\sigma_1^2\) and \(\sigma_2^2\), then

\[ \begin{align} \frac {n_1-1}{\sigma_1^2} S_1^2 \;&\sim\; \ChiSqrDistn(n_1 - 1\;\text{df}) \\[0.3em] \frac {n_2-1}{\sigma_2^2} S_2^2 \;&\sim\; \ChiSqrDistn(n_2 - 1\;\text{df}) \end{align} \]

This can also be expressed as

\[ \begin{align} \frac {S_1^2}{\sigma_1^2} \;&\sim\; \frac{\ChiSqrDistn(n_1 - 1\;\text{df})}{n_1 - 1}\\[0.5em] \frac{S_2^2}{\sigma_2^2} \;&\sim\; \frac{\ChiSqrDistn(n_2 - 1\;\text{df})}{n_2 - 1} \end{align} \]

The ratio of two independent chi-squared variables, each divided by its degrees of freedom, has an F distribution. Dividing the first of these results by the second therefore gives a pivot for the ratio of the two group variances.

Ratio of two sample variances

\[ \frac {S_1^2\;/\;S_2^2} {\sigma_1^2\;/\;\sigma_2^2} \;\;\sim\;\; \FDistn(n_1 - 1,\;n_2 - 1\;\text{df}) \]

is a pivot for the ratio \(\diagfrac{\sigma_1^2}{\sigma_2^2}\).
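The pivot property can be checked by simulation. Whatever the true values of \(\sigma_1\) and \(\sigma_2\), the simulated ratio should behave like an \(\FDistn(n_1 - 1,\;n_2 - 1\;\text{df})\) variable, whose mean is \(\frac{n_2-1}{n_2-3}\) when \(n_2 > 3\). The sketch below is only an illustration. The sample sizes (10 and 12), the pairs of standard deviations, the random seed and the helper name `pivot_draws` are all assumptions chosen for the example.

```python
import random
import statistics


def pivot_draws(n1, n2, sigma1, sigma2, reps, rng):
    """Hypothetical helper: simulate (S1^2 / S2^2) / (sigma1^2 / sigma2^2)
    for independent normal samples of sizes n1 and n2."""
    draws = []
    for _ in range(reps):
        x1 = [rng.gauss(0.0, sigma1) for _ in range(n1)]
        x2 = [rng.gauss(0.0, sigma2) for _ in range(n2)]
        # statistics.variance uses the n - 1 divisor, matching S^2
        sample_ratio = statistics.variance(x1) / statistics.variance(x2)
        draws.append(sample_ratio / (sigma1 ** 2 / sigma2 ** 2))
    return draws


rng = random.Random(7)          # fixed seed so the run is repeatable
n1, n2 = 10, 12                 # illustrative sample sizes (assumed)
d2 = n2 - 1
theory_mean = d2 / (d2 - 2)     # mean of an F(d1, d2) variable when d2 > 2

means = {}
# Two very different (assumed) pairs of true standard deviations
for sigma1, sigma2 in [(1.0, 5.0), (3.0, 0.5)]:
    draws = pivot_draws(n1, n2, sigma1, sigma2, reps=20000, rng=rng)
    means[(sigma1, sigma2)] = statistics.fmean(draws)
    print(f"sigma1={sigma1}, sigma2={sigma2}: "
          f"simulated mean {means[(sigma1, sigma2)]:.3f}, F mean {theory_mean:.3f}")
```

Both settings should give simulated means close to the same F mean. This is what makes the ratio a pivot: its distribution does not depend on the unknown variances.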

Since

\[ P\left(F_{0.025} \;\lt\; \frac {S_1^2\;/\;S_2^2} {\sigma_1^2\;/\;\sigma_2^2} \;\lt\; F_{0.975}\right) \;\;=\;\; 0.95 \]

where \(F_{0.025}\) and \(F_{0.975}\) are the 2½th and 97½th percentiles of the \(\FDistn(n_1 - 1,\;n_2 - 1\;\text{df})\) distribution, a 95% confidence interval can be found by rearranging the inequality,

\[ \frac {s_1^2\;/\;s_2^2}{F_{0.975}} \;\;\lt\;\; \frac{\sigma_1^2}{\sigma_2^2} \;\;\lt\;\; \frac {s_1^2\;/\;s_2^2}{F_{0.025}} \]

Interval estimates with different confidence levels can be found by replacing \(F_{0.025}\) and \(F_{0.975}\) with other quantiles of the \(\FDistn(n_1 - 1,\;n_2 - 1\;\text{df})\) distribution.
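In practice the F quantiles would normally come from a statistical library, for example SciPy's `scipy.stats.f.ppf`. The sketch below avoids that dependency so that it runs with the Python standard library alone. It uses the standard relationship between the F distribution and the regularized incomplete beta function. It evaluates that function with the well-known continued-fraction method (as in Numerical Recipes) and inverts the CDF by bisection. Every function name here (`f_cdf`, `f_quantile`, `variance_ratio_ci` and so on) is hypothetical, introduced only for this illustration. The `level` argument plays the role of the other confidence levels mentioned above.

```python
import math

# Sketch only: hypothetical helper names; a library routine such as
# scipy.stats.f.ppf would normally supply the F quantiles.


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction directly where it converges quickly,
    # otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2):
    """P(F <= f) for an F(d1, d2 df) random variable."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def f_quantile(p, d1, d2, tol=1e-10):
    """p-th quantile of F(d1, d2 df) for 0 < p < 1, by bisection on f_cdf."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < p:   # widen the bracket until it contains the quantile
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def variance_ratio_ci(s1_sq, n1, s2_sq, n2, level=0.95):
    """Interval for sigma1^2 / sigma2^2 obtained by inverting the F pivot."""
    alpha = 1.0 - level
    d1, d2 = n1 - 1, n2 - 1
    f_lower = f_quantile(alpha / 2.0, d1, d2)         # F_0.025 when level = 0.95
    f_upper = f_quantile(1.0 - alpha / 2.0, d1, d2)   # F_0.975 when level = 0.95
    ratio = s1_sq / s2_sq
    return ratio / f_upper, ratio / f_lower


print(f_quantile(0.975, 2, 2))
print(variance_ratio_ci(1.0, 3, 1.0, 3, level=0.95))
```

The first printed value should be very close to 39. The \(\FDistn(2,\;2\;\text{df})\) CDF is \(f/(1+f)\), so its 97½th percentile is exactly 39 and its 2½th percentile is \(1/39\). The second line should therefore be close to \((1/39,\;39)\).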

Question

Remote sensing from satellites is often used to determine land use.

Near-infrared intensities were recorded by the Satellite Landsat Multispectral Scanner from 118 areas that were known to contain forest, and from another 40 areas that were known to be urban. These measurements are shown in the jittered dot plot on the right.

The sample mean and variance of the values from forested areas are \(\overline{x}_1= 92.93\) and \(s_1^2 = 48.06\), and the corresponding values for the urban areas are \(\overline{x}_2= 82.08\) and \(s_2^2 = 24.79\).

We will model the data as random samples from \(\NormalDistn(\mu_1, \sigma_1^2)\) and \(\NormalDistn(\mu_2, \sigma_2^2)\) distributions. The mean near-infrared intensities are clearly different in forested and urban areas. Assess whether it is reasonable to assume that the variances are the same for the two types of land use.

(Solved in full version)