Comparing the populations
For two-group data sets, we usually want to compare the underlying populations. In particular, the main questions of interest are:
Comparing the population means
The standard deviations in the two groups may differ. However, we are usually more interested in differences between the population means. The earlier questions can be asked in terms of the difference between these means,
δ = μ2 − μ1
If the group means are equal (so µ2 - µ1 is zero), then on average the values in neither group are higher than those in the other. Indeed, if the distributions are normal and σ1 and σ2 are also equal, then a zero value for µ2 - µ1 implies that the distributions in the two groups are identical.
µ2 - µ1 describes how much higher the values in group 2 are (on average) than the values in group 1.
The best estimate of µ2 - µ1 is, naturally, the difference between the means of the two samples, x̄2 - x̄1.
Randomness of sample difference
Unfortunately, x̄2 - x̄1 cannot give a definitive answer to questions about µ2 - µ1 since it is a random summary statistic: it varies from sample to sample. The distribution of x̄2 - x̄1 must be understood before we can make any inference about µ2 - µ1.
Simulation: Manipulative skills of job applicants
To test the manipulative skill of job applicants, they are sometimes given a 'one-hole test' in which they grasp a pin, move it to a hole, insert it, and return for another pin. The test score is the number of pins inserted in a fixed time interval. A large study was undertaken comparing male college students with experienced female industrial workers. The table below describes the number of pins inserted in one minute.
Group | n | Mean (pins) | Standard deviation (pins) |
---|---|---|---|
Male college students | 750 | 35.12 | 4.31 |
Experienced female industrial workers | 412 | 37.32 | 3.83 |
We will conduct a simulated experiment based on this scenario. In the simulation, we will generate 'numbers of pins' for 40 male students from a normal distribution with µ1 = 35.12 pins and σ1 = 4.31 pins, and similar data for another 40 experienced female workers from a normal distribution with µ2 = 37.32 pins and σ2 = 3.83 pins.
Note that the female industrial workers, on average, insert µ2 - µ1 = 2.20 more pins than the male students.
When several samples are taken, the difference between the sample means is seen to be a random quantity whose distribution is centred on µ2 - µ1 = 2.20 pins.
The difference in means from a single data set, x̄2 - x̄1, is therefore an estimate of µ2 - µ1, but is unlikely to be exactly equal to it.
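The simulation described above can be reproduced roughly in code. The following is a minimal sketch, not the original simulation: it assumes NumPy is available, treats the means and standard deviations from the table as the population parameters, and draws 40 values per group. The function name `simulate_differences`, the number of repeats and the seed are illustrative choices. Each simulated difference is the female workers' sample mean minus the male students' sample mean.

```python
import numpy as np

# Population parameters, taken from the summary table of the study.
MU_MALE, SIGMA_MALE = 35.12, 4.31       # male college students
MU_FEMALE, SIGMA_FEMALE = 37.32, 3.83   # experienced female industrial workers
N_PER_GROUP = 40                        # sample size in each group


def simulate_differences(n_repeats=10_000, seed=0):
    """Simulate n_repeats pairs of samples and return, for each pair,
    (sample mean of female workers) - (sample mean of male students)."""
    rng = np.random.default_rng(seed)
    # One row per simulated data set, one column per person.
    males = rng.normal(MU_MALE, SIGMA_MALE, size=(n_repeats, N_PER_GROUP))
    females = rng.normal(MU_FEMALE, SIGMA_FEMALE, size=(n_repeats, N_PER_GROUP))
    return females.mean(axis=1) - males.mean(axis=1)


diffs = simulate_differences()
print("first five sample differences:", np.round(diffs[:5], 2))
print("mean of all sample differences:", round(float(diffs.mean()), 2))
print("st devn of sample differences: ", round(float(diffs.std(ddof=1)), 2))
```

The individual differences vary from one simulated data set to the next, while their average over many repeats should settle close to the population difference of 2.20 pins.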
Welders who are paid a salary and those on piecework
In practice, the underlying population means (and their difference) are unknown, and only a single sample from each group is available. The data set below is a typical example.
Without an understanding of the distribution of x̄2 - x̄1, it is impossible to properly interpret what the sample difference, 9.5 pieces, tells you about the difference between the underlying population means.