Differences
The key to analysing paired data is to recognise that the differences between X and Y hold all the information about whether their means are the same. Writing
D = Y - X
the hypotheses
H0 : μX = μY
HA : μX ≠ μY
can be expressed in terms of the mean difference, μD = μY - μX, as
H0 : μD = 0
HA : μD ≠ 0
This reduces the paired data set to a univariate data set of differences. The test also becomes a simpler hypothesis test about the mean of these differences.
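The reduction to a single sample of differences can be sketched in a few lines of Python. This is only an illustration: the data values are hypothetical, the function name `paired_t_statistic` is invented for this example, and the use of a one-sample t statistic is an assumption, since the text above says only that the test concerns the mean of the differences.

```python
import math
import statistics


def paired_t_statistic(x, y):
    """Return (t, df) for testing H0: mu_D = 0, where D = Y - X."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    # Reduce the paired data to a univariate sample of differences.
    d = [yi - xi for xi, yi in zip(x, y)]
    n = len(d)
    mean_d = statistics.mean(d)
    sd_d = statistics.stdev(d)  # sample standard deviation (n - 1 divisor)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Hypothetical paired measurements (not taken from the examples on this page).
x = [10, 12, 9, 11, 13]
y = [12, 13, 11, 12, 16]

t, df = paired_t_statistic(x, y)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

Only the differences enter the calculation. Once D has been formed, the original X and Y values are no longer needed.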
Music and work efficiency
The increase in efficiency for each employee (after the music system was installed) is shown in the final column below.
Efficiency rating
Employee | Before | After | Difference
---|---|---|---
Is the mean of the differences zero?
Twin studies
The final column below shows the difference in IQ for each pair (good minus poor).
IQ
Family | Poor environment | Good environment | Difference
---|---|---|---
Is the mean of the differences zero?
Garage repair estimates
The final column shows the amount that garage A overcharges, compared to garage B.
Estimate for car repair
Car | Garage A | Garage B | Difference
---|---|---|---
Is the mean of the differences zero?
Analysis of paired data
Taking differences removes much of the variability between individuals. The differences therefore provide considerably more information for assessing the null and alternative hypotheses than the two sets of raw values do.
The benefits of pairing will be explained more fully on a later page.
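A small sketch can make the reduction in variability concrete. The values below are hypothetical and were chosen so that the individuals differ widely from one another while each pair stays close together. They are not the data from any of the examples above.

```python
import statistics

# Hypothetical paired measurements: large spread between individuals,
# small spread within each pair.
x = [100, 250, 400, 550, 700]
y = [110, 255, 420, 560, 710]

# Differences D = Y - X, one per individual.
d = [yi - xi for xi, yi in zip(x, y)]

sd_x = statistics.stdev(x)
sd_y = statistics.stdev(y)
sd_d = statistics.stdev(d)

print(f"sd of X: {sd_x:.2f}")
print(f"sd of Y: {sd_y:.2f}")
print(f"sd of D: {sd_d:.2f}")
```

With these illustrative values the standard deviation of the differences is far smaller than that of either X or Y, because the individual-to-individual variation cancels when each pair is subtracted.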
Garage repair estimates
The diagram below shows the repair estimates from garages A and B. The two distributions overlap considerably due to variability in the amounts of damage to the cars, so it initially appears that there will be little evidence against equal means.
Comparing the two estimates for each individual car shows that most estimates are higher for garage A.
Joining each pair of crosses with a line, and displaying the differences in a jittered dot plot, gives much clearer evidence that the mean estimate is higher for garage A: the mean difference appears to be positive.
Note that it would be wrong to analyse this as two separate samples. The data are paired because each pair of repair estimates is for the same car.
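The cost of ignoring the pairing can be illustrated by computing both test statistics on the same numbers. The repair estimates below are hypothetical (the actual garage data are not reproduced here). The unequal-variance two-sample t statistic is used as a stand-in for "two separate samples", which is an assumption about the analysis being warned against.

```python
import math
import statistics

# Hypothetical repair estimates for five cars (illustrative values only).
garage_a = [100, 250, 400, 550, 700]
garage_b = [90, 245, 380, 540, 690]
n = len(garage_a)

# Incorrect analysis: treat the estimates as two independent samples.
se_two_sample = math.sqrt(
    statistics.variance(garage_a) / n + statistics.variance(garage_b) / n
)
t_two_sample = (
    statistics.mean(garage_a) - statistics.mean(garage_b)
) / se_two_sample

# Paired analysis: one difference per car (garage A minus garage B).
diffs = [a - b for a, b in zip(garage_a, garage_b)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

print(f"two-sample t: {t_two_sample:.3f}")
print(f"paired t:     {t_paired:.3f}")
```

Both statistics have the same numerator, the mean difference between the garages. In the two-sample version the standard error is inflated by the variation in damage between cars, so the statistic is much closer to zero than the paired one.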