We now consider data sets that arise as random samples from two or more groups.
A normal model
We often assume that all groups have normal distributions, with a \(\NormalDistn(\mu_i,\;\sigma_i^2)\) distribution in the \(i\)'th of the \(g\) groups. It is also common to assume that the variance is the same in all groups, so
\[ Y_{ij} \;\;\sim\;\; \NormalDistn(\mu_i, \sigma^2) \qquad \text{for }i = 1,\dots,g \text{ and }j = 1,\dots,n_i \]
where \(Y_{ij}\) is the \(j\)'th value in group \(i\). We write \(n_i\) for the number of values in the \(i\)'th group, so the total number of values is \(n = \sum_{i=1}^g n_i\).
The maximum likelihood estimates of the group means \(\{\mu_i\}\) are
\[ \hat{\mu}_{i} \;\;=\;\; \overline{Y}_i \]
but how should we estimate the common group variance, \(\sigma^2\)?
Definition
The pooled estimate of the common group variance is
\[ S_{\text{pooled}}^2 \;=\; \frac{\sum_{i=1}^g (n_i - 1)S_i^2}{\sum_{i=1}^g (n_i - 1)} \]
where
\[ S_i^2 \;=\; \frac{\sum_{j=1}^{n_i} (Y_{ij} - \overline{Y}_i)^2} {n_i - 1} \]
is the sample variance in group \(i\).
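The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `pooled_variance` and the data values are hypothetical, and each group is assumed to contain at least two values so that its sample variance exists.

```python
from statistics import variance


def pooled_variance(groups):
    """Pooled estimate of a common variance from a list of samples.

    `groups` is a list of lists; each inner list holds one group's values
    and must contain at least two of them.
    """
    # Numerator: each group's sample variance weighted by (n_i - 1).
    numerator = sum((len(g) - 1) * variance(g) for g in groups)
    # Denominator: sum of (n_i - 1) over the groups, which equals n - g.
    denominator = sum(len(g) - 1 for g in groups)
    return numerator / denominator


# Hypothetical data: three groups of unequal size.
groups = [[4.1, 5.0, 5.9], [6.2, 7.1, 6.8, 7.5], [3.3, 4.0]]
s2_pooled = pooled_variance(groups)
print(s2_pooled)
```

Because the weights are \(n_i - 1\), larger groups contribute more to the estimate, and when all groups have the same size the pooled estimate is simply the average of the group sample variances.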
We now give its distribution.
Distribution of pooled variance
The pooled estimator \(S_{\text{pooled}}^2\) is an unbiased estimator of \(\sigma^2\) and
\[ \frac{n-g}{\sigma^2}S_{\text{pooled}}^2 \;\;\sim\;\; \ChiSqrDistn(n-g\;\text{df}) \]
(Proved in full version)
Since the quantity on the left is a pivot for \(\sigma^2\), it can be used to find a confidence interval for \(\sigma^2\), in much the same way as a confidence interval is found from a single random sample.
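As an illustration of how the pivot might be used, write \(q_p\) for the \(p\) quantile of the \(\ChiSqrDistn(n-g\;\text{df})\) distribution (notation introduced here, not in the text above). Putting probability \(\alpha/2\) in each tail, which is one common choice rather than the only one, and rearranging the inequalities for \(\sigma^2\) gives the \(100(1-\alpha)\%\) interval
\[ \frac{(n-g)S_{\text{pooled}}^2}{q_{1-\alpha/2}} \;\;<\;\; \sigma^2 \;\;<\;\; \frac{(n-g)S_{\text{pooled}}^2}{q_{\alpha/2}} \]
The sketch below computes this interval in Python, assuming SciPy is available for the chi-squared quantiles. The function name `pooled_variance_ci` and the data are hypothetical.

```python
from statistics import variance

from scipy.stats import chi2


def pooled_variance_ci(groups, level=0.95):
    """Equal-tailed confidence interval for a common group variance.

    Returns (pooled estimate, lower limit, upper limit). Assumes normal
    groups with equal variance and at least two values in each group.
    """
    df = sum(len(g) - 1 for g in groups)  # n - g degrees of freedom
    s2 = sum((len(g) - 1) * variance(g) for g in groups) / df
    alpha = 1 - level
    # Dividing by the upper quantile gives the lower limit, and vice versa.
    lower = df * s2 / chi2.ppf(1 - alpha / 2, df)
    upper = df * s2 / chi2.ppf(alpha / 2, df)
    return s2, lower, upper


# Hypothetical data: three groups of unequal size.
groups = [[4.1, 5.0, 5.9], [6.2, 7.1, 6.8, 7.5], [3.3, 4.0]]
s2, lower, upper = pooled_variance_ci(groups)
print(f"pooled variance {s2:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```

The interval is not symmetric about the pooled estimate because the chi-squared distribution is skewed, and it can be very wide when \(n - g\) is small.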