Missing values
In most experiments in which a single factor is varied, we aim to have the same number of replicates for each factor level. However, accidents sometimes prevent the response from being measured on a few experimental units, leaving unequal numbers of replicates.
Provided the reason that the values are missing is unrelated to the treatment or response, a standard analysis of variance can still be used to compare the treatments.
If the probability of a value being missing depends on the treatment or (unrecorded) response, the standard analysis should not be used.
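The distinction between the two kinds of missingness can be illustrated with a small simulation. This is only a sketch under assumed values: the response distribution (mean 20, standard deviation 3), the 30% loss rate and the cut-off of 18 are all hypothetical and are not taken from any experiment described here.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical responses for one treatment: true mean 20, standard deviation 3.
responses = [random.gauss(20, 3) for _ in range(10_000)]

# Case 1: each value is lost with probability 0.3, whatever its size
# (missingness unrelated to the response).
kept_at_random = [y for y in responses if random.random() > 0.3]

# Case 2: values below 18 are never recorded
# (missingness depends on the unrecorded response).
kept_if_large = [y for y in responses if y >= 18]

mean_at_random = mean(kept_at_random)
mean_if_large = mean(kept_if_large)

print(f"all values:                 {mean(responses):.2f}")
print(f"values lost at random:      {mean_at_random:.2f}")
print(f"small values lost:          {mean_if_large:.2f}")
```

In the first case the mean of the recorded values stays close to the true mean, so comparisons between treatments remain fair. In the second case the recorded values overestimate the treatment mean, which is why the standard analysis should not be trusted.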
Effect of missing values on the analysis
The diagram below shows the yield of wheat from a completely randomised experiment with three varieties of wheat and 8 replicates.
Click a few values in the three columns on the left to change them into missing values.
Observe that the sums of squares and the residual degrees of freedom change in the analysis of variance table. However, the p-value still describes the strength of evidence for differences between the mean yields of the three varieties.
Make all but two observations for Variety A missing. Observe that the 95% confidence interval for this variety's mean yield (represented by the blue bands on the scatterplot) is now much wider than those for the other varieties. The fewer the response measurements for any factor level, the less accurately the mean response is estimated.
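The widening of the interval can be sketched numerically. Assuming the confidence interval for a variety's mean has half-width equal to a t multiplier times the standard error, with the standard error being the residual standard deviation divided by the square root of the number of replicates, the code below compares 8 replicates with 2. The residual standard deviation of 1.0 is a hypothetical value, and the sketch ignores the small changes in the t multiplier and in the residual standard deviation itself when values are removed.

```python
import math

def standard_error(residual_sd, n):
    """Standard error of a factor-level mean based on n replicates."""
    return residual_sd / math.sqrt(n)

residual_sd = 1.0  # hypothetical residual standard deviation

se_full = standard_error(residual_sd, 8)     # all 8 replicates present
se_reduced = standard_error(residual_sd, 2)  # only 2 replicates remain

# Ratio of the two standard errors: sqrt(8 / 2)
ratio = se_reduced / se_full
print(f"standard error with 8 replicates: {se_full:.3f}")
print(f"standard error with 2 replicates: {se_reduced:.3f}")
print(f"ratio: {ratio:.2f}")
```

Under these assumptions, cutting the replicates from 8 to 2 doubles the standard error, so the interval for that variety is roughly twice as wide as those for the varieties with complete data.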
Nitrogen in red clover plants
An experiment was conducted to assess how the nitrogen content of red clover plants is affected by inoculation with combination cultures of Rhizobium trifolii strains and Rhizobium meliloti. The experiment initially used 30 plants, with 5 plants randomly assigned to each of the 6 treatments. At the end of the growing season the nitrogen content of the plants was measured but, for reasons unrelated to the inoculants or the nitrogen content of the plants, two response measurements could not be obtained. (Perhaps the plants died for an unrelated reason or the laboratory equipment broke down when the nitrogen content was being measured.)
Inoculant | Nitrogen content |
---|---|
3DOk1 | 19.4, 32.6, 27.0, 32.1, 33.0 |
3DOk5 | 17.7, 24.8, 27.9, 25.2 |
3DOk4 | 17.0, 19.4, 9.1, 11.9 |
3DOk7 | 20.7, 21.0, 20.5, 18.8, 18.6 |
3DOk13 | 14.3, 14.4, 11.8, 11.6, 14.2 |
Composite | 17.3, 19.4, 19.1, 16.9, 20.8 |
The analysis of variance table below has the same form and interpretation as would have been obtained for the complete experiment.
Source of variation | Sum of squares | d.f. | Mean square | F-ratio | p-value |
---|---|---|---|---|---|
Inoculant | 812.68 | 5 | 162.54 | 12.72 | 0.0001 |
Residual | 281.12 | 22 | 12.78 | | |
Total | 1093.80 | 27 | | | |
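The calculations behind the table do not require equal replicates. The following is a minimal sketch, using only the Python standard library, of a one-way analysis of variance applied to the 28 recorded nitrogen values. The function name `one_way_anova` and the dictionary keys are illustrative choices, not part of any particular package.

```python
from statistics import mean

# Recorded nitrogen contents; 3DOk5 and 3DOk4 each have one missing value.
nitrogen = {
    "3DOk1": [19.4, 32.6, 27.0, 32.1, 33.0],
    "3DOk5": [17.7, 24.8, 27.9, 25.2],
    "3DOk4": [17.0, 19.4, 9.1, 11.9],
    "3DOk7": [20.7, 21.0, 20.5, 18.8, 18.6],
    "3DOk13": [14.3, 14.4, 11.8, 11.6, 14.2],
    "Composite": [17.3, 19.4, 19.1, 16.9, 20.8],
}

def one_way_anova(groups):
    """One-way ANOVA that allows unequal numbers of replicates per group."""
    values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(values)
    # Between-treatment sum of squares, weighting each group by its own size.
    ss_treat = sum(len(ys) * (mean(ys) - grand_mean) ** 2
                   for ys in groups.values())
    # Residual sum of squares: variation around each group's own mean.
    ss_resid = sum((y - mean(ys)) ** 2
                   for ys in groups.values() for y in ys)
    df_treat = len(groups) - 1
    df_resid = len(values) - len(groups)
    ms_treat = ss_treat / df_treat
    ms_resid = ss_resid / df_resid
    return {
        "ss_treat": ss_treat, "df_treat": df_treat, "ms_treat": ms_treat,
        "ss_resid": ss_resid, "df_resid": df_resid, "ms_resid": ms_resid,
        "f_ratio": ms_treat / ms_resid,
    }

table = one_way_anova(nitrogen)
for name, value in table.items():
    print(f"{name:>9}: {value:.2f}")
```

The residual degrees of freedom are 28 − 6 = 22 rather than the 24 that the complete experiment would have given. The p-value is the upper tail probability of the F distribution with 5 and 22 degrees of freedom at the computed F-ratio; if SciPy is available, `scipy.stats.f.sf` can be used to evaluate it.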
The p-value is very small, so there is very strong evidence that the six inoculants do not all result in the same mean nitrogen content in red clover plants.