Data sets with several groups
Problem | Data collected | Randomisation |
---|---|---|
A manufacturer of breakfast cereals wants to introduce a new muesli and wonders which of 3 recipes to use. | 150 people in a supermarket are each asked to taste one of the recipes and give it a score between 1 and 10. | Customers must be randomly given one of the 3 recipes. |
An investor wants to know which of four types of mutual fund is likely to give the highest return. | Ten funds of each type are selected and their returns over the previous year are determined. | The selected funds should be randomly selected from a list of funds of each type. |
A lecturer wants to know whether there are differences between the effectiveness of the tutors in a course. | Final exam marks from all students are grouped by the six different tutors. | It must be assumed that the students were randomly allocated to tutors. |
Data of this form can be considered as either:
We will model the data in terms of g groups. The data often arise from completely randomised experiments with g treatments.
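As a rough illustration of the randomisation step in a completely randomised experiment, the sketch below allocates 150 tasters to 3 recipes in equal numbers, as in the muesli example. The function name `allocate`, the recipe labels and the seed are hypothetical choices made for this illustration.

```python
import random

def allocate(n_subjects, treatments, seed=None):
    """Randomly allocate subjects to treatments in (near-)equal group sizes."""
    rng = random.Random(seed)
    # Repeat the treatment labels until every subject has one, then shuffle
    # so that the order in which subjects arrive does not decide their group.
    labels = [treatments[i % len(treatments)] for i in range(n_subjects)]
    rng.shuffle(labels)
    return labels

# Hypothetical labels for the 3 muesli recipes.
allocation = allocate(150, ["recipe A", "recipe B", "recipe C"], seed=1)
counts = {t: allocation.count(t) for t in sorted(set(allocation))}
print(counts)  # 50 tasters per recipe
```

Shuffling a balanced list of labels keeps the group sizes equal; assigning each person a recipe independently at random would also be a valid randomisation but would usually give unequal groups.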
Model for several groups
In an earlier section, we used the following model when comparing the means of two groups.
Group 1: | Y ~ normal (µ₁, σ₁) |
Group 2: | Y ~ normal (µ₂, σ₂) |
We also presented methods for inference about the difference between the two group means.
The most obvious extension of this model to g > 2 groups would allow different means and standard deviations in all groups.
Group i: | Y ~ normal (µᵢ, σᵢ) |
Same standard deviation in all groups
Extending the test for equal group means from 2 to g > 2 groups requires an extra assumption in the model. We must assume that the standard deviations in all groups are the same.
Group i: | Y ~ normal (µᵢ, σ) |
If there are g groups, the model has g + 1 unknown parameters — the g group means and the common standard deviation, σ. This model is flexible enough to be useful for many data sets.
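The g + 1 parameters can be made concrete with a small simulation. The sketch below generates data from the equal-standard-deviation model and then estimates the g group means and the common σ. The pooled estimate of σ used here (within-group sum of squares divided by n − g) is a standard choice but is an assumption of this sketch, not something defined in the text above; the function names and parameter values are also hypothetical.

```python
import math
import random
import statistics

def simulate_groups(means, sigma, n_per_group, seed=None):
    """Draw n_per_group values from normal(mean, sigma) for each group mean."""
    rng = random.Random(seed)
    return [[rng.gauss(mu, sigma) for _ in range(n_per_group)] for mu in means]

def estimate_parameters(groups):
    """Return (list of group means, pooled standard deviation)."""
    group_means = [statistics.fmean(g) for g in groups]
    # Sum of squared deviations of each value from its own group mean.
    within_ss = sum(
        sum((y - m) ** 2 for y in g) for g, m in zip(groups, group_means)
    )
    # Degrees of freedom: total number of values minus number of groups.
    df = sum(len(g) for g in groups) - len(groups)
    return group_means, math.sqrt(within_ss / df)

# Hypothetical model with g = 3 groups: 3 means plus 1 common sigma.
data = simulate_groups(means=[5.0, 6.5, 6.0], sigma=1.2, n_per_group=50, seed=2)
est_means, est_sigma = estimate_parameters(data)
print([round(m, 2) for m in est_means], round(est_sigma, 2))
```

With 50 values per group, the estimates should land reasonably close to the values used in the simulation, though they will vary from sample to sample.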
If the assumptions of a normal distribution and constant standard deviation do not hold, a nonlinear transformation of the response (a logarithm, for example) may result in data for which the model is appropriate.
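One way to see how a transformation can help is with artificial data whose spread grows with the group mean. The sketch below assumes, purely for illustration, that the log of the response is normal with the same standard deviation in every group; the group values and the seed are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(0)

# Hypothetical means of log(response) in 3 groups; sd of log(response) is 0.5.
log_means = [1.0, 2.0, 3.0]
groups = [
    [math.exp(rng.gauss(m, 0.5)) for _ in range(2000)] for m in log_means
]

# Standard deviations on the original scale differ greatly between groups ...
raw_sds = [statistics.stdev(g) for g in groups]
# ... but are nearly equal after a log transformation.
log_sds = [statistics.stdev([math.log(y) for y in g]) for g in groups]

print([round(s, 2) for s in raw_sds])
print([round(s, 2) for s in log_sds])
```

In this artificial case the log transformation is the right one by construction; with real data, the choice of transformation would need to be checked against the transformed values.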
Illustration of the model
The diagram below shows a normal model for g = 3 groups. Initially, the diagram allows the flexibility of separately adjusting the 3 means and 3 standard deviations using the sliders.
Click the checkbox Equal st devn to restrict the model by constraining the 3 standard deviations to be the same. This reduces the number of parameters to 4 — the 3 group means and the common standard deviation. Use the sliders to see the flexibility of this model.
Rotate the display to look down on the two main axes (click the y-x button). The normal distributions in the three groups are represented by pale bands stretching two standard deviations on each side of the group mean, with a slightly darker band at 0.674 standard deviations on each side of the mean. Click Take Sample a few times to observe typical data sets that would be obtained from this model.
Observe that approximately 95% of the values are within the pale blue bands, since about 95% of values from any normal distribution are within 2 standard deviations of the mean. About 50% of the values are within the darker bands, since the quartiles of a normal distribution lie about 0.674 standard deviations on each side of the mean.
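The two percentages quoted for the bands can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `prob_within` is a hypothetical choice for this illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def prob_within(k):
    """Probability that a normal value lies within k sd of its mean."""
    return z.cdf(k) - z.cdf(-k)

print(prob_within(2))       # close to 0.95 (pale bands)
print(prob_within(0.674))   # close to 0.50 (darker bands)
print(z.inv_cdf(0.75))      # upper quartile, close to 0.674
```

Because the probabilities depend only on the number of standard deviations from the mean, the same values apply to every group in the model, whatever its mean and standard deviation.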