Difference between the sample means has a distribution

The best estimate of the difference between two population means, µ2 − µ1, based on a single random sample from each population, is the difference between the sample means, x̄2 − x̄1. As with other parameter estimates based on sample data, this estimate is a random quantity that varies from sample to sample.

The accuracy of the estimate depends on its variability, and this variability must be taken into account when assessing the value of x̄2 − x̄1 from a single data set.

The diagram below shows random samples of 20 values from each of two populations. The population mean for Group B is 10 higher than that for Group A.

Click Take sample a few times to observe the variability of the difference between the sample means. The sample mean for Group B is usually between 5 and 15 higher than that for Group A.

Click the checkbox Accumulate, then take several more samples. The jittered dot plot on the right shows the distribution of x̄2 − x̄1 in the samples.

(Click on crosses in the jittered dot plot to display the random samples that gave rise to them.)

The differences, x̄2 − x̄1, have a normal distribution that is centred on the difference between the population means (10 in the example above). The following three pages will explain how the distribution's standard deviation is obtained.

Click the checkbox below to superimpose the correct normal distribution of x̄2 − x̄1 on the jittered dot plot above.
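For readers without access to the interactive diagram, the same experiment can be sketched in a few lines of Python. This is only an illustration: the population means (50 and 60), the common standard deviation (10) and the function name simulate_differences are assumptions made for the sketch, chosen so that the true difference is 10 as in the example above.

```python
import random
import statistics

def simulate_differences(n_samples=2000, n=20, mean_a=50.0, mean_b=60.0,
                         sd=10.0, seed=1):
    """Repeatedly draw a sample of n values from each of two normal
    populations and return the differences between the sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(n_samples):
        sample_a = [rng.gauss(mean_a, sd) for _ in range(n)]
        sample_b = [rng.gauss(mean_b, sd) for _ in range(n)]
        diffs.append(statistics.fmean(sample_b) - statistics.fmean(sample_a))
    return diffs

diffs = simulate_differences()
# The differences should be centred close to the true difference (10 here).
print("mean of differences:", round(statistics.fmean(diffs), 2))
print("sd of differences:  ", round(statistics.stdev(diffs), 2))
```

The mean of the simulated differences should be close to 10. Their spread depends on the assumed population standard deviation and the sample sizes.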

Distribution of difference between sample means

On the previous page, we saw that the difference between the means of independent samples from two groups, x̄2 − x̄1, has a distribution that is approximately normal with mean and standard deviation given by the formulae

mean   =   µ2 − µ1

standard deviation   =   √(σ1²/n1 + σ2²/n2)

If standard deviations are known...

From this normal distribution, we can state that

Prob ( x̄2 − x̄1  is within   ± 1.96 √(σ1²/n1 + σ2²/n2)   of   µ2 − µ1 )   =   0.95

If we knew the values of the two parameters σ1 and σ2, we could therefore obtain a 95% confidence interval for µ2 − µ1 as

x̄2 − x̄1   ±   1.96 √(σ1²/n1 + σ2²/n2)
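The calculation for known standard deviations can be sketched as follows. The function name ci_known_sigma and the numbers in the example (sample means of 51.2 and 60.5, both standard deviations equal to 10, samples of 20) are made up for illustration and do not come from the diagrams.

```python
import math

def ci_known_sigma(mean1, mean2, sigma1, sigma2, n1, n2, z=1.96):
    """95% confidence interval for mu2 - mu1 when sigma1 and sigma2 are known."""
    # Standard deviation of the difference between the two sample means
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = mean2 - mean1
    return diff - z * se, diff + z * se

# Hypothetical summary values, for illustration only
low, high = ci_known_sigma(mean1=51.2, mean2=60.5,
                           sigma1=10, sigma2=10, n1=20, n2=20)
print(f"95% CI for mu2 - mu1: {low:.2f} to {high:.2f}")
```

With these assumed values the interval runs from roughly 3.1 to 15.5.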

Confidence interval for difference

Unfortunately, neither σ1 nor σ2 is known in most practical applications, so we must replace them by their sample equivalents, s1 and s2, in the confidence interval. As a result, the constant '1.96' must also be replaced by a slightly larger value from t-tables,

x̄2 − x̄1   ±   tν √(s1²/n1 + s2²/n2)

where the degrees of freedom for the t-value are

ν   =   min (n1−1,  n2−1)
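A minimal sketch of this calculation from raw data is given below. The t-values are the upper 2.5% points copied from standard t-tables for 1 to 30 degrees of freedom. The function name ci_difference and the two small data sets are hypothetical and serve only to show the steps.

```python
import math
import statistics

# Upper 2.5% points of the t distribution for 1..30 degrees of freedom,
# as printed in standard t-tables (index 0 is unused).
T_975 = [None, 12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306,
         2.262, 2.228, 2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110,
         2.101, 2.093, 2.086, 2.080, 2.074, 2.069, 2.064, 2.060, 2.056,
         2.052, 2.048, 2.045, 2.042]

def ci_difference(sample1, sample2):
    """95% confidence interval for mu2 - mu1 using sample standard deviations
    and the conservative degrees of freedom min(n1 - 1, n2 - 1)."""
    n1, n2 = len(sample1), len(sample2)
    df = min(n1 - 1, n2 - 1)
    if df > 30:
        raise ValueError("this sketch only tabulates t-values up to 30 df")
    t = T_975[df]
    # statistics.variance uses the n - 1 divisor, giving s1^2 and s2^2
    se = math.sqrt(statistics.variance(sample1) / n1
                   + statistics.variance(sample2) / n2)
    diff = statistics.fmean(sample2) - statistics.fmean(sample1)
    return diff - t * se, diff + t * se, df

# Made-up measurements for two groups, for illustration only
group_a = [48, 52, 55, 47, 50, 53]
group_b = [58, 63, 61, 66, 59, 62, 64]
low, high, df = ci_difference(group_a, group_b)
print(f"df = {df}, 95% CI for mu2 - mu1: {low:.2f} to {high:.2f}")
```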

Confidence intervals for the difference between two group means have the same properties as the confidence intervals that we investigated in earlier sections. A confidence interval that is obtained using the above formula varies from sample to sample and the confidence interval will include the true difference, µ2 − µ1, in approximately 95% of such repeat samples.

(Interval estimates obtained in this way actually have a confidence level that is slightly higher than 95% — they are conservative estimates. Some authors prefer a different formula for the degrees of freedom that gives a slightly lower t-value, but the difference is usually negligible.)
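The text does not say which alternative formula it has in mind. One widely used choice is the Welch-Satterthwaite approximation, and the sketch below assumes that this is the one meant. The function name welch_df and the example values are illustrative only.

```python
def welch_df(s1, s2, n1, n2):
    """Welch-Satterthwaite approximation to the degrees of freedom
    (assumed here to be the 'different formula' some authors prefer)."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Hypothetical example: equal sample standard deviations and sample sizes
n1 = n2 = 20
conservative = min(n1 - 1, n2 - 1)
approximate = welch_df(s1=10, s2=10, n1=n1, n2=n2)
print("conservative df:", conservative)
print("Welch-Satterthwaite df:", round(approximate, 1))
```

In this example the approximation gives 38 degrees of freedom instead of 19, so the t-value drops from 2.093 to roughly 2.02, which changes the width of the interval only slightly.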

The exercises below give practice with the calculations for a 95% confidence interval for the difference between two group means.

Use the green templates on the right of the diagram to help with your calculations.

Repeat with different data sets until you feel confident about the method.

Demonstration of properties

The next diagram demonstrates the properties of 95% confidence intervals for a difference between group means.

Group B has a population mean that is 10 greater than the mean of group A. Click Accumulate then take 100 or more samples from the two populations.

Observe that approximately 95% of the resulting confidence intervals for µ2 − µ1 include the true value (10).
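The demonstration can also be imitated with a short simulation. As before, this is only a sketch: normal populations with means 50 and 60, a common standard deviation of 10 and the function name coverage are assumptions, and the t-value 2.093 is the t-table entry for ν = min(20 − 1, 20 − 1) = 19 degrees of freedom.

```python
import math
import random
import statistics

N = 20               # sample size in each group
T_975_DF19 = 2.093   # t-table value for min(N - 1, N - 1) = 19 df

def coverage(n_reps=2000, mean_a=50.0, mean_b=60.0, sd=10.0, seed=2):
    """Proportion of simulated 95% confidence intervals for mu2 - mu1
    that include the true difference."""
    rng = random.Random(seed)
    true_diff = mean_b - mean_a
    hits = 0
    for _ in range(n_reps):
        a = [rng.gauss(mean_a, sd) for _ in range(N)]
        b = [rng.gauss(mean_b, sd) for _ in range(N)]
        se = math.sqrt(statistics.variance(a) / N + statistics.variance(b) / N)
        diff = statistics.fmean(b) - statistics.fmean(a)
        # The interval diff +/- t * se covers true_diff when this holds
        if abs(diff - true_diff) <= T_975_DF19 * se:
            hits += 1
    return hits / n_reps

result = coverage()
print("proportion of intervals covering the true difference:", result)
```

The proportion should be close to 0.95, and often a little above it because the interval is conservative.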