We now consider independent random samples from two normal populations with a common variance,
\[ \begin{align} X_{1,i} \;\;&\sim\;\; \NormalDistn(\mu_1,\;\sigma^2) \qquad \text{for } i=1,\dots,n_1 \\ X_{2,i} \;\;&\sim\;\; \NormalDistn(\mu_2,\;\sigma^2) \qquad \text{for } i=1,\dots,n_2 \end{align}\]The difference between the sample means is normally distributed,
\[ \overline{X}_1 - \overline{X}_2 \;\;\sim\;\; \NormalDistn\left(\mu_1 - \mu_2,\;\sigma^2\left(\frac 1{n_1} + \frac 1{n_2}\right)\right) \]The best estimate of the common variance, \(\sigma^2\), is
\[ S_{\text{pooled}}^2 \;\;=\;\; \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} \]and we showed earlier that its distribution is proportional to a chi-squared distribution,
\[ \frac{n_1 + n_2 - 2}{\sigma^2}S_{\text{pooled}}^2 \;\;\sim\;\; \ChiSqrDistn(n_1 + n_2 - 2 \text{ df}) \]Our best estimate of the standard error of \( \overline{X}_1 - \overline{X}_2\) is therefore
\[ \se(\overline{X}_1 - \overline{X}_2) \;=\; \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} \left(\frac 1{n_1} + \frac 1{n_2}\right) } \]
Testing for equal means
We now consider a hypothesis test for whether the two means are equal,
H0 : \(\mu_1 = \mu_2\)
HA : \(\mu_1 \ne \mu_2\)
(or the corresponding one-tailed alternative). The following function of the data can be used as a test statistic — its distribution is fully known when the null hypothesis holds.
Test statistic
If \(\overline{X}_1\) and \(S_1^2\) are the mean and variance of a sample of \(n_1\) values from a \(\NormalDistn(\mu_1, \sigma^2)\) distribution and \(\overline{X}_2\) and \(S_2^2\) are the mean and variance of an independent sample of \(n_2\) values from a \(\NormalDistn(\mu_2, \sigma^2)\) distribution,
\[ T \;\;=\;\; \frac{\overline{X}_1 - \overline{X}_2}{\se(\overline{X}_1 - \overline{X}_2)} \;\;\sim\;\; \TDistn(n_1 + n_2 - 2 \text{ df}) \]provided \(\mu_1 = \mu_2\).
(Proved in full version)
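The calculation of the test statistic can be sketched in a few lines of Python. This is only an illustration under the assumptions stated above (independent normal samples with a common variance); the function names `pooled_variance`, `standard_error` and `t_statistic` are hypothetical and chosen for this sketch, and only the standard library is used.

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 divisor


def pooled_variance(x1, x2):
    """Pooled estimate of the common variance sigma^2."""
    n1, n2 = len(x1), len(x2)
    return ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)


def standard_error(x1, x2):
    """Estimated standard error of the difference between the sample means."""
    return math.sqrt(pooled_variance(x1, x2) * (1 / len(x1) + 1 / len(x2)))


def t_statistic(x1, x2):
    """Return (T, degrees of freedom) for testing mu_1 = mu_2."""
    df = len(x1) + len(x2) - 2
    return (mean(x1) - mean(x2)) / standard_error(x1, x2), df


# Small made-up samples, purely to show the call
t_value, df = t_statistic([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(t_value, df)
```

The value returned is compared with a t distribution with \(n_1 + n_2 - 2\) degrees of freedom.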
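The p-value for the two-tailed test is the probability that a value from this t distribution is at least as far from zero as the value of \(T\) calculated from the actual data. For a one-tailed alternative, only the tail in the direction of the alternative hypothesis is used.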
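To make the tail-area idea concrete, the following sketch evaluates t distribution tail probabilities by numerically integrating the t density with Simpson's rule. It is a standard-library illustration only, not the method a statistics package would use; the names `t_density`, `t_upper_tail` and `p_value`, and the number of integration steps, are assumptions made for this example.

```python
import math


def t_density(t, df):
    """Density of the t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_density(0.0, df) + t_density(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    upper = 0.5 - total * h / 3        # P(T > |t|), by symmetry about zero
    return upper if t >= 0 else 1.0 - upper


def p_value(t, df, alternative="two-sided"):
    """p-value for an observed t; alternative is about mu_1 - mu_2."""
    if alternative == "two-sided":
        return 2 * t_upper_tail(abs(t), df)
    if alternative == "greater":       # HA: mu_1 > mu_2
        return t_upper_tail(t, df)
    if alternative == "less":          # HA: mu_1 < mu_2
        return t_upper_tail(-t, df)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# With 1 df the t distribution is the Cauchy distribution, so P(T > 1) = 0.25
print(p_value(1.0, 1, "greater"))
```

In practice a library routine for the t distribution, such as `scipy.stats.t.sf` if SciPy is available, would normally replace the hand-written integration.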
Example
A botanist is interested in comparing the growth response of dwarf pea stems to two different levels of the hormone indoleacetic acid (IAA). Using 16-day-old pea plants, the botanist obtains 5 millimetre sections and floats them on solutions with different hormone concentrations to observe the effect of the hormone on the growth of the pea stem. Let \(X\) and \(Y\) denote, respectively, the independent growths that can be attributed to the hormone during the first 26 hours after sectioning for IAA concentrations of \(0.5 \times 10^{-4}\) and \(10^{-4}\).
The botanist measured the growths of pea stem segments in millimetres for \(n_X = 11\) observations of \(X\):
0.8 | 1.8 | 1.0 | 0.1 | 0.9 | 1.7 | 1.0 | 1.4 | 0.9 | 1.2 | 0.5 |
and \(n_Y = 13\) observations of \(Y\):
1.0 | 0.8 | 1.6 | 2.6 | 1.3 | 1.1 | 2.4 | 1.8 | 2.5 | 1.4 | 1.9 | 2.0 | 1.2 |
Test whether the larger hormone concentration results in greater growth of the pea plants.
(Solved in full version)
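As a rough check on the arithmetic for this example, the sketch below evaluates the pooled test statistic from the data listed above. It assumes the hypotheses are read as H0 : \(\mu_X = \mu_Y\) against the one-tailed alternative HA : \(\mu_X < \mu_Y\) (the larger concentration, \(Y\), gives greater growth) and that both populations are normal with a common variance. The variable names are chosen for this illustration only.

```python
import math
from statistics import mean, variance

# Growth (mm) at the lower concentration (X) and the higher concentration (Y)
x = [0.8, 1.8, 1.0, 0.1, 0.9, 1.7, 1.0, 1.4, 0.9, 1.2, 0.5]
y = [1.0, 0.8, 1.6, 2.6, 1.3, 1.1, 2.4, 1.8, 2.5, 1.4, 1.9, 2.0, 1.2]

n_x, n_y = len(x), len(y)
df = n_x + n_y - 2                                  # 11 + 13 - 2 = 22

# Pooled variance and the standard error of the difference in means
s2_pooled = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / df
se = math.sqrt(s2_pooled * (1 / n_x + 1 / n_y))

t_stat = (mean(x) - mean(y)) / se                   # roughly -2.81
print(f"mean X = {mean(x):.3f}, mean Y = {mean(y):.3f}")
print(f"T = {t_stat:.2f} on {df} df")
```

Under this reading of the hypotheses, the p-value is the lower tail probability of a t distribution with 22 degrees of freedom below the computed value of \(T\).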