Choice between paired data and two independent samples

It is sometimes possible to answer questions about the difference between two means by collecting two alternative types of data.

Two independent samples
Measurements are made from two samples of individuals from the groups whose means are to be compared. A 2-sample t-test will compare the means.
One paired sample
The 'individuals' can be re-defined as pairs of related values from the two groups, and a single sample of these pairs can be collected. A paired t-test can be performed on the differences to compare the means.

Which experimental design is better?

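If the individuals in the two groups can be paired so that the two values within each pair are relatively similar, a paired design gives more accurate results: the confidence interval for the difference between the means is usually narrower, and the test is more likely to detect a real difference.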
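The reason can be sketched with a short variance calculation. The symbols here are assumptions introduced for illustration and are not part of the original text: each group has n values with common variance σ², the two values in a pair have correlation ρ, D̄ is the mean of the n within-pair differences, and X̄_A and X̄_B are the two sample means.

```latex
\[
\text{Paired design:}\quad
\operatorname{Var}(\bar{D}) = \frac{2\sigma^{2}(1-\rho)}{n}
\qquad\qquad
\text{Independent samples:}\quad
\operatorname{Var}(\bar{X}_B - \bar{X}_A) = \frac{2\sigma^{2}}{n}
\]
```

Under these assumptions, the more similar the two values in a pair are (ρ close to 1), the smaller the variance of the paired estimate. When ρ is close to 0, pairing gains little, and the paired t-test has fewer degrees of freedom than the 2-sample test.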


Car repair costs from two garages

Consider an insurance company that is investigating whether Garage B is over-charging for car repairs. Data should be collected to compare the average estimates for repairs from Garage B and another garage, Garage A.

Simulation

We will conduct a simulation based on a pool of 20 cars. In the simulation, all repair estimates are normally distributed with standard deviation σ = $120, but with the means shown in the table below.

        Mean repair estimate, µ ($)
Car     Garage A    Garage B
1        380         580
2        760         960
3       1180        1380
4        560         760
5        260         460
6       1000        1200
7       1300        1500
8       1720        1920
9       1800        2000
10      1440        1640
11       630         830
12       580         780
13      1660        1860
14      1160        1360
15       460         660
16      1200        1400
17      1780        1980
18       320         520
19       900        1100
20      1040        1240

Note that Garage B over-charges by $200 on average for each car.

Two independent samples

We first simulate an experiment in which 10 cars are randomly selected to be sent to Garage A, and the other 10 cars are assessed by Garage B.

A 95% confidence interval for the over-charging (difference between the mean estimates from the two garages) is shown, and the p-value for a 2-tailed test for a difference is also given.

Repeat the simulation several times and observe from the p-values that:

The 2-sample test rarely gives evidence that Garage B over-charges — the p-value is usually over 0.05.
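The behaviour described above can be reproduced approximately with a short script. This is a minimal sketch, not the simulation used on the original page. It assumes a pooled-variance 2-sample t-test (the text does not say which variant is used) and replaces the p-value with the equivalent check of whether the 95% confidence interval excludes zero, using the tabulated t critical value for 18 degrees of freedom. The function names, the seed and the number of repetitions are hypothetical.

```python
import math
import random
import statistics

# Mean repair estimates ($) for the 20 cars, copied from the table above.
MEAN_A = [380, 760, 1180, 560, 260, 1000, 1300, 1720, 1800, 1440,
          630, 580, 1660, 1160, 460, 1200, 1780, 320, 900, 1040]
MEAN_B = [m + 200 for m in MEAN_A]  # Garage B's mean is $200 higher for each car
SIGMA = 120                         # standard deviation of every estimate
T_CRIT_18 = 2.101                   # upper 2.5% point of t with 10 + 10 - 2 = 18 df


def two_sample_experiment(rng):
    """Simulate one experiment; return (difference, CI lower, CI upper)."""
    cars = list(range(20))
    rng.shuffle(cars)
    to_a, to_b = cars[:10], cars[10:]  # 10 cars to Garage A, the other 10 to B
    est_a = [rng.gauss(MEAN_A[i], SIGMA) for i in to_a]
    est_b = [rng.gauss(MEAN_B[i], SIGMA) for i in to_b]
    diff = statistics.mean(est_b) - statistics.mean(est_a)
    # Assumption: pooled variance; with equal sample sizes it is the plain average.
    pooled_var = (statistics.variance(est_a) + statistics.variance(est_b)) / 2
    se = math.sqrt(pooled_var * (1 / 10 + 1 / 10))
    return diff, diff - T_CRIT_18 * se, diff + T_CRIT_18 * se


def rejection_rate(n_reps=2000, seed=1):
    """Proportion of simulated experiments whose 2-tailed p-value is below 0.05."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        _, low, high = two_sample_experiment(rng)
        if low > 0 or high < 0:  # the 95% CI excludes 0 exactly when p < 0.05
            rejections += 1
    return rejections / n_reps


print(f"Proportion of experiments with p < 0.05: {rejection_rate():.2f}")
```

The car-to-car variation in the mean estimates is much larger than the $200 difference between garages, so the standard error of the difference is large and most simulated intervals include zero.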

Click Show paired values to see the (unobserved) data that would have been obtained if all cars had been assessed by both garages.

Paired data

We next simulate an experiment in which 10 cars are randomly selected and are assessed by both Garage A and Garage B.

A 95% confidence interval of the over-charging is again shown, this time based on the differences between the estimates in the pairs. The p-value for a 2-tailed paired t-test for a difference is also given.

Repeat the simulation several times and observe from the p-values that:

The paired t-test usually finds strong evidence that Garage B over-charges.
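The paired experiment can be sketched in the same way. This is a hedged illustration rather than the page's own simulation: it draws 10 of the 20 cars, generates one estimate from each garage for every selected car, and checks whether the 95% confidence interval for the mean difference excludes zero, which is equivalent to a 2-tailed paired t-test p-value below 0.05. The tabulated t critical value for 9 degrees of freedom is used, and the function names, seed and number of repetitions are hypothetical.

```python
import math
import random
import statistics

# Garage A's mean repair estimates ($) for the 20 cars, from the table above.
MEAN_A = [380, 760, 1180, 560, 260, 1000, 1300, 1720, 1800, 1440,
          630, 580, 1660, 1160, 460, 1200, 1780, 320, 900, 1040]
OVERCHARGE = 200   # Garage B's mean is $200 higher for every car
SIGMA = 120        # standard deviation of every estimate
T_CRIT_9 = 2.262   # upper 2.5% point of t with 10 - 1 = 9 df


def paired_experiment(rng):
    """Simulate one paired experiment; return (mean difference, CI lower, CI upper)."""
    cars = rng.sample(range(20), 10)  # 10 cars, each assessed by both garages
    diffs = []
    for i in cars:
        est_a = rng.gauss(MEAN_A[i], SIGMA)
        est_b = rng.gauss(MEAN_A[i] + OVERCHARGE, SIGMA)
        diffs.append(est_b - est_a)   # the car's own mean cancels in the difference
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff - T_CRIT_9 * se, mean_diff + T_CRIT_9 * se


def rejection_rate(n_reps=2000, seed=1):
    """Proportion of simulated experiments whose 2-tailed p-value is below 0.05."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        _, low, high = paired_experiment(rng)
        if low > 0 or high < 0:  # the 95% CI excludes 0 exactly when p < 0.05
            rejections += 1
    return rejections / n_reps


print(f"Proportion of experiments with p < 0.05: {rejection_rate():.2f}")
```

Because each difference is taken within a car, the large car-to-car variation drops out and only the estimate-to-estimate noise remains, which is why most simulated intervals lie entirely above zero.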


Matched pairs in experiments

It is often impossible to repeat the same experiment twice with the same experimental units. In the Car Repair Costs example, if the comparison were to be made of actual repair costs rather than estimates, it would be impossible to obtain measurements for the same car from both garages.

However, it is often possible to group the experimental units into pairs that are similar in some way. These are called matched pairs. The two experimental units in each pair are randomly assigned to the two treatments.

Actual car repair costs from garages A and B
The damaged cars could be grouped into pairs with similar types of damage. The two cars in each pair would be randomly sent to the two garages for repair.
Do students perform better in a test before or after lunch?
Each student should only sit the test once. However, the students could be grouped into pairs, based on their IQ or their results in earlier tests. In each pair, a randomly selected student would sit the test before lunch and the other would sit it after lunch.
Effect of fertiliser on grass growth
Six fields in different parts of an agricultural research station are available for an experiment to estimate the increase in grass growth from applying a standard quantity of fertiliser. Each field can be split in half to form 'pairs' of half-fields for the experiment. Fertiliser would be applied to a randomly selected half of each field.

In each example, pairing gives more accurate estimates than randomly allocating the units (cars, students or fields) to the two treatments if the units in the pairs are more similar to each other than to units in other pairs.
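The random allocation within matched pairs can be sketched as follows. This is a minimal illustration, assuming the pairs have already been formed; the function name, the unit labels and the treatment labels are hypothetical.

```python
import random


def assign_within_pairs(pairs, treatments=("A", "B"), rng=None):
    """Randomly give one unit in each matched pair each of the two treatments."""
    rng = rng or random.Random()
    assignment = {}
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)  # random order of the two treatments for this pair
        assignment[first] = order[0]
        assignment[second] = order[1]
    return assignment


# Hypothetical example: six damaged cars already grouped into three similar pairs.
pairs = [("car1", "car2"), ("car3", "car4"), ("car5", "car6")]
plan = assign_within_pairs(pairs, rng=random.Random(42))
for first, second in pairs:
    print(f"{first} -> Garage {plan[first]}, {second} -> Garage {plan[second]}")
```

Randomising separately inside each pair guarantees that every pair contributes one unit to each treatment, so the analysis can be based on the within-pair differences.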