Comparing several groups
The methods for obtaining confidence intervals and hypothesis tests for two groups do not extend directly to comparisons of the means of three or more groups, so a new approach is needed.
Hypotheses for testing
For the remainder of this section, we assume a normal model with equal standard deviations.
Group i: Y ~ normal(µi, σ)
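To make the model concrete, the following minimal sketch draws data from it. The group means, the common standard deviation, the group size and the function name `simulate_groups` are all hypothetical choices made for illustration; only the structure (a separate mean per group, one shared σ) comes from the model above.

```python
import random

def simulate_groups(means, sigma, n_per_group, seed=0):
    """Draw n_per_group values from normal(mean, sigma) for each group mean.

    Every group shares the same sigma, as the model assumes.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [[rng.gauss(mu, sigma) for _ in range(n_per_group)] for mu in means]

# Hypothetical settings: 4 groups of 10 values with a common sigma of 2.
groups = simulate_groups(means=[10.0, 12.0, 11.0, 13.0], sigma=2.0, n_per_group=10)

for i, values in enumerate(groups, start=1):
    sample_mean = sum(values) / len(values)
    print(f"Group {i}: n = {len(values)}, sample mean = {sample_mean:.2f}")
```

The sample means printed by the loop will generally differ from the model means used to generate the data, which is why a formal test is needed.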
If all means are the same in the model, then there are no differences between the groups. We are therefore interested in testing the hypotheses,
H0: µi = µj for all i and j
HA: µi ≠ µj for at least some i, j
Variation between and within groups
If the model means are all equal, we would expect the sample means to be similar. However, they are unlikely to be identical. We therefore need to assess whether the variation between the group means is unusually large. To do this, we must also take account of the variation within the groups.
We will show in later pages that these two aspects of variation can be described with summary statistics and used for a hypothesis test.
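As a rough illustration of what such summaries might look like, the sketch below describes the variation between groups by the standard deviation of the sample means, and the variation within groups by a pooled standard deviation. These particular choices, the function names and the data are assumptions made for this example and are not necessarily the statistics developed on later pages.

```python
from math import sqrt
from statistics import mean, stdev, variance

def between_group_sd(groups):
    """Standard deviation of the group sample means."""
    return stdev([mean(g) for g in groups])

def pooled_within_sd(groups):
    """Pooled SD: within-group sums of squares divided by total degrees of freedom."""
    ss_within = sum(variance(g) * (len(g) - 1) for g in groups)
    df_within = sum(len(g) - 1 for g in groups)
    return sqrt(ss_within / df_within)

# Made-up data: three groups of three values each.
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [5.0, 6.0, 7.0],
]

print("Between-group SD of means:", round(between_group_sd(groups), 3))
print("Pooled within-group SD:   ", round(pooled_within_sd(groups), 3))
```

Pooling the within-group variation into a single figure matches the model's assumption that all groups share one standard deviation σ.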
Variation between groups
The jittered dot plots below show 10 numerical measurements from each of 4 groups.
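Use the slider to alter the difference between the group means. Observe that the further apart the sample means are, the stronger the evidence that the underlying group means differ.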
Variation within groups
The diagram below is similar, but the slider adjusts the spread of values within each group, leaving the group mean unaltered.
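Observe that the smaller the spread within the groups, the stronger the evidence that the underlying group means differ, even though the sample means themselves are unchanged.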
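The slider's effect on the spread can be imitated in code by scaling each value's deviation from its group mean. This is only a sketch of one way to do it; the helper name `rescale_spread` and the data are hypothetical.

```python
from statistics import mean

def rescale_spread(values, factor):
    """Scale each value's deviation from the group mean by `factor`.

    The group mean is unchanged; only the spread around it changes.
    """
    centre = mean(values)
    return [centre + factor * (v - centre) for v in values]

group = [8.0, 10.0, 12.0, 14.0]        # made-up values with mean 11
tighter = rescale_spread(group, 0.5)   # half the original spread
wider = rescale_spread(group, 2.0)     # double the original spread

print("tighter:", tighter, "mean:", mean(tighter))
print("wider:  ", wider, "mean:", mean(wider))
```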
Are the underlying means equal?
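The evidence for a difference between the group means depends on both the variation between groups and the variation within groups. It is strongest when the variation between the group means is large and the variation within the groups is small.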
Signal and noise
In the field of communications, the signal in a recorded or transmitted message (e.g. music) is defined to be the information in which we are interested. There is often other variability in the received message that contains no useful information; this variability can potentially obscure or corrupt the signal and is called noise.
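Applying this terminology to the comparison of several groups, the signal is the difference between the group means and the noise is the variation within the groups.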
The greater the noise, the harder it is to detect or estimate the signal. We will next present numerical summaries of the signal and noise in multi-group data.
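Ahead of those summaries, the following sketch shows how the same differences between sample means can stand out clearly or be swamped, depending on the noise. The crude ratio used here (standard deviation of the sample means divided by a pooled within-group standard deviation) is an assumption made for illustration, not the formal test statistic, and both data sets are made up.

```python
from math import sqrt
from statistics import mean, stdev, variance

def signal_to_noise(groups):
    """Crude ratio: spread of the sample means over the pooled within-group SD."""
    signal = stdev([mean(g) for g in groups])
    ss_within = sum(variance(g) * (len(g) - 1) for g in groups)
    df_within = sum(len(g) - 1 for g in groups)
    noise = sqrt(ss_within / df_within)
    return signal / noise

# Both data sets have sample means 10, 14 and 12; only the within-group spread differs.
low_noise = [[9.0, 10.0, 11.0], [13.0, 14.0, 15.0], [11.0, 12.0, 13.0]]
high_noise = [[4.0, 10.0, 16.0], [8.0, 14.0, 20.0], [6.0, 12.0, 18.0]]

print("low noise ratio: ", round(signal_to_noise(low_noise), 3))
print("high noise ratio:", round(signal_to_noise(high_noise), 3))
```

The ratio is larger for the low-noise data even though the sample means are identical in the two data sets, which mirrors the point that greater noise makes the signal harder to detect.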