Difference between the sample means has a distribution
The best estimate of the difference between two population means, based on a single
random sample from each population, is the difference between the sample means,
$\bar{x}_2 - \bar{x}_1$.
As with other parameter estimates based on sample data, this estimate is a random quantity
that varies from sample to sample.
The accuracy of the estimate depends on its variability, and this variability must be
taken into account when assessing the value of $\bar{x}_2 - \bar{x}_1$
from a single data set.
The diagram below shows random samples of 20 values from each of two populations. The population mean for Group B is 10 higher than that for Group A.
Click Take sample a few times to observe the variability of the difference between the sample means. The sample mean for Group B is usually between 5 and 15 higher than that for Group A.
Click the checkbox Accumulate, then take several more samples. The
jittered dot plot on the right shows the distribution of $\bar{x}_2 - \bar{x}_1$
in the samples.
(Click on crosses in the jittered dot plot to display the random samples that gave rise to them.)
The differences, $\bar{x}_2 - \bar{x}_1$,
have a normal distribution that is centred on the difference between
the population means (10 in the example above). The following three pages will explain how
the distribution's standard deviation is obtained.
Click the checkbox below to superimpose the correct normal distribution of $\bar{x}_2 - \bar{x}_1$
on the jittered dot plot above.
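For readers without access to the interactive diagram, the sampling experiment can be approximated with a short simulation. This is only a sketch: the population means (50 and 60) and the common standard deviation (8) are assumed values chosen so that the population means differ by 10 with samples of 20, and the function name `simulate_differences` is hypothetical.

```python
import random
import statistics


def simulate_differences(n_samples=2000, n=20, mean_a=50.0, mean_b=60.0,
                         sigma=8.0, seed=1):
    """Return simulated values of (sample mean of B) - (sample mean of A).

    mean_a, mean_b and sigma are illustrative assumptions; only the
    difference of 10 and the sample size of 20 come from the text.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_samples):
        sample_a = [rng.gauss(mean_a, sigma) for _ in range(n)]
        sample_b = [rng.gauss(mean_b, sigma) for _ in range(n)]
        diffs.append(statistics.fmean(sample_b) - statistics.fmean(sample_a))
    return diffs


diffs = simulate_differences()
# The centre should be close to the difference between the population means.
print("mean of simulated differences:", round(statistics.fmean(diffs), 2))
print("sd of simulated differences:  ", round(statistics.stdev(diffs), 2))
```

Under these assumptions the simulated differences should cluster around 10, with most values falling between about 5 and 15.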
Distribution of difference between sample means
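On the previous page, we saw that the difference between the means of independent samples from two groups has a distribution that is approximately normal, with mean and standard deviation given by the formulae
$$\mu_{\bar{x}_2 - \bar{x}_1} = \mu_2 - \mu_1, \qquad \sigma_{\bar{x}_2 - \bar{x}_1} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$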
If standard deviations are known...
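From this normal distribution, we can state that
$$\text{Prob}\left( \bar{x}_2 - \bar{x}_1 \text{ is within } \pm 1.96 \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \text{ of } \mu_2 - \mu_1 \right) = 0.95$$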
If we knew the values of the two parameters σ1 and σ2, we could therefore obtain a 95% confidence interval for µ2 − µ1 as
$$\bar{x}_2 - \bar{x}_1 \pm 1.96 \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
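As a rough illustration of the known-σ case, the interval can be computed as in the sketch below. The function name `ci_known_sigma` and all numerical inputs are hypothetical, chosen only to show the arithmetic; the constant 1.96 is obtained from the standard normal distribution rather than typed in.

```python
import math
from statistics import NormalDist


def ci_known_sigma(xbar1, xbar2, sigma1, sigma2, n1, n2, level=0.95):
    """Confidence interval for mu2 - mu1 when sigma1 and sigma2 are known."""
    # Two-sided normal quantile; about 1.96 when level is 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Standard deviation of the difference between the sample means.
    sd_diff = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    diff = xbar2 - xbar1
    return diff - z * sd_diff, diff + z * sd_diff


# Made-up sample means and population standard deviations.
low, high = ci_known_sigma(xbar1=51.2, xbar2=60.3,
                           sigma1=8.0, sigma2=8.0, n1=20, n2=20)
print(f"95% CI for mu2 - mu1: ({low:.2f}, {high:.2f})")
```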
Confidence interval for difference
Unfortunately, neither σ1 nor σ2 is known in most practical applications, so we must replace them by their sample equivalents, s1 and s2, in the confidence interval. As a result, the constant '1.96' must also be replaced by a slightly larger value from t-tables,
$$\bar{x}_2 - \bar{x}_1 \pm t_\nu \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where the degrees of freedom for the t-value are
ν = min (n1−1, n2−1)
Confidence intervals for the difference between two group means have the same properties as the confidence intervals that we investigated in earlier sections. A confidence interval that is obtained using the above formula varies from sample to sample and the confidence interval will include the true difference, µ2 − µ1, in approximately 95% of such repeat samples.
(Interval estimates obtained in this way actually have a confidence level that is slightly higher than 95% — they are conservative estimates. Some authors prefer a different formula for the degrees of freedom that gives a slightly lower t-value, but the difference is usually negligible.)
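The text does not name the alternative formula. One widely used candidate is the Welch-Satterthwaite approximation, and the sketch below compares it with the conservative rule on the assumption that this is the kind of formula meant. The function names and the sample standard deviations are hypothetical.

```python
def df_conservative(n1, n2):
    """Degrees of freedom used in the text: min(n1 - 1, n2 - 1)."""
    return min(n1 - 1, n2 - 1)


def df_welch(s1, s2, n1, n2):
    """Welch-Satterthwaite approximate degrees of freedom.

    Assumed here to be the 'different formula' the text alludes to.
    """
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Made-up sample standard deviations for two groups of 20 values.
print("conservative df:", df_conservative(20, 20))
print("Welch df:       ", round(df_welch(7.5, 9.0, 20, 20), 1))
```

The Welch value is never smaller than min(n1−1, n2−1), so it leads to a t-value that is no larger and an interval that is no wider.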
The exercises below give practice with the calculations for a 95% confidence interval for the difference between two group means.
Use the green templates on the right of the diagram to help with your calculations.
- For each group separately, estimate the standard deviation of the group mean.
- Combine the two standard deviations to find the standard deviation of the difference between the means.
- Type the degrees of freedom to obtain the appropriate t-value.
- Use these values to complete the confidence interval, then click Check.
Repeat with different data sets until you feel confident about the method.
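The steps above can be sketched in Python as follows. The data are made up for illustration and `ci_difference` is a hypothetical helper. Because the standard library has no t-distribution quantile function, the t-value is passed in after being looked up in t-tables (2.776 for 4 degrees of freedom at the 95% level); a statistics package such as SciPy could supply it instead.

```python
import math
import statistics


def ci_difference(sample1, sample2, t_value):
    """Confidence interval for mu2 - mu1 using sample standard deviations."""
    n1, n2 = len(sample1), len(sample2)
    # Step 1: estimated standard deviation of each group mean.
    se1 = statistics.stdev(sample1) / math.sqrt(n1)
    se2 = statistics.stdev(sample2) / math.sqrt(n2)
    # Step 2: combine them into the standard deviation of the difference.
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    # Step 4: difference between the sample means, plus or minus t * se.
    diff = statistics.fmean(sample2) - statistics.fmean(sample1)
    return diff - t_value * se_diff, diff + t_value * se_diff


# Hypothetical data, five values per group.
group_a = [12, 15, 11, 14, 13]
group_b = [22, 25, 19, 24, 20]

# Step 3: degrees of freedom, then the matching t-value from t-tables.
df = min(len(group_a) - 1, len(group_b) - 1)  # 4 for these data
t_value = 2.776  # 97.5th percentile of t with 4 degrees of freedom

low, high = ci_difference(group_a, group_b, t_value)
print(f"df = {df}, 95% CI for mu2 - mu1: ({low:.2f}, {high:.2f})")
```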
Demonstration of properties
The next diagram demonstrates the properties of 95% confidence intervals for a difference between group means.
Group B has a population mean that is 10 greater than the mean of group A. Click Accumulate then take 100 or more samples from the two populations.
Observe that approximately 95% of the resulting confidence intervals for µ2 − µ1 include the true value (10).
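The same demonstration can be imitated with a simulation. In this sketch the population means (50 and 60) and the common standard deviation (8) are assumptions, since only the difference of 10 is given above. The samples are taken to have 20 values each, as on the earlier page, so ν = 19 and the t-value 2.093 is taken from t-tables.

```python
import math
import random
import statistics

T_19 = 2.093  # 97.5th percentile of t with 19 degrees of freedom (t-tables)


def coverage(n_reps=2000, n=20, mean_a=50.0, mean_b=60.0, sigma=8.0, seed=1):
    """Proportion of simulated 95% intervals that contain mean_b - mean_a.

    T_19 matches n = 20 only; change both together.
    """
    rng = random.Random(seed)
    true_diff = mean_b - mean_a
    hits = 0
    for _ in range(n_reps):
        a = [rng.gauss(mean_a, sigma) for _ in range(n)]
        b = [rng.gauss(mean_b, sigma) for _ in range(n)]
        # Estimated standard deviation of the difference between the means.
        se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
        diff = statistics.fmean(b) - statistics.fmean(a)
        if diff - T_19 * se <= true_diff <= diff + T_19 * se:
            hits += 1
    return hits / n_reps


print("proportion of intervals covering the true difference:", coverage())
```

The proportion should come out close to 0.95, and typically a little above it because the min(n1−1, n2−1) rule is conservative.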