Data sets with several groups

Problem: An apple grower wants to assess how different types of pesticide affect the yield of apples.
Data collected: Each of 4 types of pesticide is used on 10 different apple trees. The number of apples produced by each of the 40 trees is recorded.
Randomisation: The 10 trees given each pesticide were randomly chosen from the 40 trees used in the study.

Problem: A lecturer wants to know whether there are differences between the effectiveness of the tutors in a course.
Data collected: Final exam marks from all students are grouped by the six different tutors.
Randomisation: It must be assumed that the students were randomly allocated to tutors.

Problem: A manufacturer of breakfast cereals wants to introduce a new muesli and wonders which of 3 recipes to use.
Data collected: 150 people in a supermarket are each asked to taste one of the recipes and give it a score between 1 and 10.
Randomisation: Customers must be randomly given one of the 3 recipes.
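The random allocation in the apple example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `allocate`, the tree labels 1 to 40 and the seed are hypothetical choices, not part of the original study.

```python
import random

def allocate(units, n_groups, seed=None):
    """Randomly split units into n_groups groups of equal size."""
    if len(units) % n_groups != 0:
        raise ValueError("units must divide evenly among the groups")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # every ordering of the units is equally likely
    size = len(shuffled) // n_groups
    # Consecutive blocks of the shuffled list form the treatment groups.
    return [sorted(shuffled[i * size:(i + 1) * size]) for i in range(n_groups)]

trees = list(range(1, 41))  # hypothetical labels for the 40 trees
groups = allocate(trees, n_groups=4, seed=1)
for k, members in enumerate(groups, start=1):
    print(f"Pesticide {k}: trees {members}")
```

Shuffling the whole list and then cutting it into blocks gives each tree the same chance of receiving each pesticide, which is what a completely randomised experiment requires.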

Data of this form can be considered as either:

We will model the data in terms of g groups. The data often arise from completely randomised experiments with g treatments.

Model for several groups

In an earlier section, we used the following model when comparing the means of two groups.

Group 1:   Y   ~   normal(μ1, σ1)
Group 2:   Y   ~   normal(μ2, σ2)

We also presented methods for inference about the difference between the two group means.

The most obvious extension of this model to g > 2 groups would allow different means and standard deviations in all groups.

Group i:   Y   ~   normal(μi, σi)

Same standard deviation in all groups

Extending the test for equal group means from 2 to g > 2 groups requires an extra assumption in the model. We must assume that the standard deviations in all groups are the same.

Group i:   Y   ~   normal(μi, σ)

If there are g groups, the model has g + 1 unknown parameters — the g group means and the common standard deviation, σ. This model is flexible enough to be useful for many data sets.
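As a rough sketch of what the g + 1 parameters look like in practice, the Python code below estimates the g group means and a single common standard deviation. The text above does not say how σ is estimated; the pooled within-group standard deviation used here is one common choice and is an assumption, as are the function name `fit_common_sd_model` and the made-up data values.

```python
import math
from statistics import mean

def fit_common_sd_model(groups):
    """Return (group means, pooled sd) for a list of lists of values."""
    means = [mean(g) for g in groups]
    # Sum of squared deviations of each value from its own group mean.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    # Degrees of freedom: number of values minus the g estimated means.
    df = sum(len(g) for g in groups) - len(groups)
    return means, math.sqrt(ss_within / df)

# Made-up illustrative values for g = 3 groups (not from any real study).
data = [
    [12.0, 15.0, 14.0, 11.0],
    [18.0, 20.0, 17.0, 21.0],
    [9.0, 13.0, 10.0, 12.0],
]
means, pooled_sd = fit_common_sd_model(data)
print("estimated group means:", means)
print("estimated common sd:", round(pooled_sd, 3))
```

With g = 3 groups the fit returns four numbers in total: three means and one standard deviation.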

If the assumptions of a normal distribution and constant variance do not hold, a nonlinear transformation of the response may result in data for which the model is appropriate.
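To illustrate how a transformation can help, the sketch below simulates three groups whose spread increases with their mean and compares the group standard deviations before and after taking logarithms. The data are simulated, and the log transformation, the group means and the seed are assumptions chosen for the illustration; other transformations may suit other data sets.

```python
import math
import random
from statistics import stdev

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Simulated (not real) responses whose spread grows with the group mean:
# each value is exp(m + 0.3 * z), where z is standard normal.
log_means = [1.0, 2.0, 3.0]  # hypothetical group means on the log scale
groups = [[math.exp(m + 0.3 * rng.gauss(0, 1)) for _ in range(200)]
          for m in log_means]

def sd_ratio(samples):
    """Largest group standard deviation divided by the smallest."""
    sds = [stdev(s) for s in samples]
    return max(sds) / min(sds)

raw_ratio = sd_ratio(groups)
log_ratio = sd_ratio([[math.log(y) for y in g] for g in groups])
print(f"raw scale sd ratio: {raw_ratio:.2f}")
print(f"log scale sd ratio: {log_ratio:.2f}")
```

A ratio close to 1 on the transformed scale suggests that the common standard deviation assumption is more reasonable there than on the original scale.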


Illustration of the model

The diagram below shows a normal model for g = 3 groups. Initially, the diagram allows the flexibility of separately adjusting the 3 means and 3 standard deviations using the sliders.

Click the checkbox Equal st devn to restrict the model by constraining the 3 standard deviations to be the same. This reduces the number of parameters to 4 — the 3 group means and the common standard deviation. Use the sliders to see the flexibility of this model.

Rotate the display to look down on the two main axes (click the y-x button). The normal distributions in the three groups are represented by pale bands stretching two standard deviations on each side of the group mean, with a slightly darker band at 0.674 standard deviations on each side of the mean. Click Take Sample a few times to observe typical data sets that would be obtained from this model.

Observe that approximately 95% of the values are within the pale blue bands, since about 95% of values from any normal distribution are within 2 standard deviations of the mean. About 50% of the values are within the darker bands, since half of the values from any normal distribution are within 0.674 standard deviations of the mean.
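The two band widths can be checked numerically. The Python sketch below computes the normal probabilities for 2 and 0.674 standard deviations and compares them with the proportions in one simulated sample. The mean, standard deviation, sample size and seed are hypothetical values chosen for the illustration.

```python
import random
from statistics import NormalDist

def prob_within(k):
    """Probability that a normal value lies within k sd of its mean."""
    z = NormalDist()  # standard normal; the result holds for any mean and sd
    return z.cdf(k) - z.cdf(-k)

def fraction_within(values, mu, sigma, k):
    """Proportion of the values lying within k sd of mu."""
    return sum(abs(v - mu) <= k * sigma for v in values) / len(values)

rng = random.Random(42)
mu, sigma = 50.0, 8.0  # hypothetical group mean and standard deviation
sample = [rng.gauss(mu, sigma) for _ in range(10_000)]

p_pale, p_dark = prob_within(2), prob_within(0.674)
f_pale = fraction_within(sample, mu, sigma, 2)
f_dark = fraction_within(sample, mu, sigma, 0.674)
print(f"within 2 sd:     theory {p_pale:.3f}, sample {f_pale:.3f}")
print(f"within 0.674 sd: theory {p_dark:.3f}, sample {f_dark:.3f}")
```

The sample proportions vary from one simulated data set to the next, just as they do when Take Sample is clicked repeatedly.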