Comparing several groups

The methods for obtaining confidence intervals and hypothesis tests for two groups do not extend to comparisons of the means of three or more groups, so a new approach is needed.

Hypotheses for testing

For the remainder of this section, we assume a normal model with equal standard deviations.

Group i:   Y   ~   normal(µi , σ)
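
As a minimal sketch of this model, the Python code below simulates values for several groups that have their own means but share a single standard deviation. The function name simulate_groups, the four group means, the value of σ and the group size of 10 are all illustrative assumptions, not values taken from the text.

```python
import random

def simulate_groups(means, sigma, n, seed=0):
    """Draw n values for each group from normal(mean_i, sigma).

    Hypothetical helper: every group uses the same sigma,
    matching the equal-standard-deviation assumption.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [[rng.gauss(mu, sigma) for _ in range(n)] for mu in means]

# Assumed example values: four groups, different means, common sigma
groups = simulate_groups(means=[10.0, 12.0, 11.0, 13.0], sigma=2.0, n=10)
for i, values in enumerate(groups, start=1):
    print(f"Group {i}: {len(values)} values, first = {values[0]:.2f}")
```

Setting all entries of means to the same number gives data for which the null hypothesis described next would be true.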

If all means are the same in the model, then there are no differences between the groups. We are therefore interested in testing the hypotheses,

H0 :   µi  =  µj        for all i and j
HA:   µi  ≠  µj        for at least one pair i, j

Variation between and within groups

If the model means are all equal, we would expect the sample means to be similar, although they are unlikely to be identical. We therefore need to assess whether the variation between the group means is unusually large. To do this, we must also take account of the variation within the groups.

We will show in later pages that these two aspects of variation can be described with summary statistics and used for a hypothesis test.
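
To make the two aspects of variation concrete, the following Python sketch computes one possible summary of each: the standard deviation of the group means (between groups) and a pooled standard deviation (within groups). These particular summaries, the function name between_within and the small data set are assumptions chosen for illustration; the summary statistics defined in later pages may differ.

```python
import statistics

def between_within(groups):
    """Return (SD of the group means, pooled within-group SD)."""
    means = [statistics.mean(g) for g in groups]
    between_sd = statistics.stdev(means)
    # Pooled variance: within-group sums of squares divided by
    # the total within-group degrees of freedom
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_within = sum(len(g) - 1 for g in groups)
    within_sd = (ss_within / df_within) ** 0.5
    return between_sd, within_sd

# Hypothetical data: three groups of three values each
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]
between_sd, within_sd = between_within(groups)
print(f"between-group SD = {between_sd:.2f}, within-group SD = {within_sd:.2f}")
```

In this made-up data set the group means (5, 8 and 2) are far apart relative to the spread inside each group, so the between-group summary is the larger of the two.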

Variation between groups

The jittered dot plots below show 10 numerical measurements from each of 4 groups.

Use the slider to alter the difference between the group means. Observe that:

Variation within groups

The diagram below is similar, but the slider adjusts the spread of values within each group, leaving the group mean unaltered.

Observe that ...

Are the underlying means equal?

The evidence for a difference between the group means depends on the variation both between and within groups. It is strongest when:

  • the between-group variation is relatively high, and
  • the within-group variation is relatively low.
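
The two conditions above can be sketched in Python by building artificial groups whose separation and spread are controlled directly, then comparing the between-group variation with the within-group variation. The functions make_groups and evidence_ratio, the fixed within-group pattern and the ratio itself are hypothetical choices made for this illustration, not the test statistic used by the text.

```python
import statistics

def make_groups(separation, spread):
    """Build 4 groups of 5 values (spread must be positive).

    separation controls how far apart the group means are;
    spread controls the variation inside each group.
    """
    pattern = [-2.0, -1.0, 0.0, 1.0, 2.0]  # fixed within-group shape
    centres = [k * separation for k in range(4)]
    return [[c + spread * d for d in pattern] for c in centres]

def evidence_ratio(groups):
    """SD of the group means divided by the pooled within-group SD."""
    means = [statistics.mean(g) for g in groups]
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_within = sum(len(g) - 1 for g in groups)
    return statistics.stdev(means) / (ss_within / df_within) ** 0.5

# Baseline, then larger separation, then smaller spread
for separation, spread in [(1.0, 1.0), (3.0, 1.0), (1.0, 0.25)]:
    ratio = evidence_ratio(make_groups(separation, spread))
    print(f"separation={separation}, spread={spread}: ratio={ratio:.2f}")
```

Under these assumptions the ratio grows both when the separation is increased and when the spread is reduced, which matches the two bullet points.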

Signal and noise

In the field of communications, the signal in a recorded or transmitted message (e.g. music) is defined to be the information in which we are interested. There is often other variability in the received message that contains no useful information; this variability can potentially obscure or corrupt the signal and is called noise.

Applying this terminology to the comparison of several groups,

Signal
The variation between group means is the signal in the data.
Noise
Variation within groups is noise.

The greater the noise, the harder it is to detect or estimate the signal. We will next present numerical summaries of the signal and noise in multi-group data.