Treatments with zero or one replicate

In completely randomised experiments with a single replicate of each treatment, we showed earlier that there are no residual degrees of freedom in the model containing the highest-order interaction between all factors. If some treatments have zero replicates, some parameters in this model cannot be estimated (as explained on the previous page), but there are still zero residual degrees of freedom in the full model.

In order to conduct hypothesis tests, we require residual degrees of freedom, so it must be assumed that there are no high-order interactions between the factors. The highest-order interaction sums of squares and degrees of freedom are treated as unexplained variation and are used as the residual sum of squares and degrees of freedom.
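The degrees-of-freedom bookkeeping can be checked with a short Python sketch. It assumes a complete factorial with exactly one replicate per treatment, and counts one parameter for the overall mean plus, for each main effect or interaction, the product of (number of levels - 1) over its factors. The function names are illustrative only. For a 3 × 3 × 3 design it gives 0 residual degrees of freedom for the full model and 8 once the 3-factor interaction is assumed to be absent.

```python
from itertools import combinations
from math import prod


def term_df(term, levels):
    """Degrees of freedom of a main effect or interaction: product of (levels - 1)."""
    return prod(levels[f] - 1 for f in term)


def residual_df(levels, max_order):
    """Residual df with one replicate per treatment and all terms up to `max_order`."""
    n_obs = prod(levels.values())  # one observation per treatment
    n_params = 1 + sum(term_df(term, levels)
                       for order in range(1, max_order + 1)
                       for term in combinations(sorted(levels), order))
    return n_obs - n_params


levels = {"N": 3, "P": 3, "K": 3}
print(residual_df(levels, 3))  # full model including the 3-factor interaction
print(residual_df(levels, 2))  # 3-factor interaction assumed absent
```

When some treatments are missing, the number of observations falls and this simple count no longer applies directly.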

Problem with sequential sums of squares in the anova table

If treatments in an experiment are missing randomly (rather than by design), the sums of squares explained by the factors and their interactions depend on the order in which they are added to the model. The existence of several alternative analysis of variance tables (corresponding to different orderings of adding the main effects and interactions) complicates the analysis.
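One way to see why the order matters: write RSS(·) for the residual sum of squares of the model containing the listed terms, so that RSS(∅) is the total sum of squares about the mean (a notation introduced here for illustration). The sequential sum of squares for a term is the reduction in RSS when it is added to the terms already in the model:

```latex
\begin{aligned}
\mathrm{SS}(A) &= \mathrm{RSS}(\varnothing) - \mathrm{RSS}(A), \\
\mathrm{SS}(B \mid A) &= \mathrm{RSS}(A) - \mathrm{RSS}(A, B), \\
\mathrm{SS}(A) + \mathrm{SS}(B \mid A) &= \mathrm{RSS}(\varnothing) - \mathrm{RSS}(A, B)
  = \mathrm{SS}(B) + \mathrm{SS}(A \mid B).
\end{aligned}
```

Both orderings split the same total, but with unbalanced data SS(B | A) generally differs from SS(B), so the individual rows of the table depend on the order. In a balanced factorial design the two are equal and the order does not matter.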

Effect of fertilisers on maize yield

The data in the table below arose from a factorial experiment in which the quantities of nitrogen, potassium and phosphorus were varied.

             N = 10             N = 20             N = 30
         K=10 K=20 K=30     K=10 K=20 K=30     K=10 K=20 K=30
P = 10     65   80  104       86  107  129      108  129  141
P = 20     87    -  126      107    -    -      125  143    -
P = 30    107  126  148      125  144  168      149  163  184

Note that four of the 27 treatments in the complete factorial design are missing, so the design is unbalanced. The sums of squares table below shows only the main effects of the three factors; we will consider their interactions later.

When the order of adding the terms to the model changes, their sums of squares change too. For example, the table below shows the sums of squares explained by Nitrogen in the different orderings:

Order                      Sum of squares
Added first                        5588.2
Added after Phosphorus             5594.9
Added after Potassium              6298.3
Added after both others            6297.7

In this example the differences are relatively slight and the conclusions are the same: in a full analysis of variance table, all p-values are reported as 0.0000. However, the differences can be much greater, depending on the pattern of missing values.
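A minimal Python sketch (using numpy) of how such a comparison can be computed: each model is fitted by least squares, and a sequential sum of squares is the difference between the residual sums of squares of two nested models. The yield for N = 10, P = 20, K = 10 is entered as 87. This is an assumption: it is the value that reproduces the sums of squares reported for nitrogen. Indicator (baseline) coding of the factors is an illustrative choice; sequential sums of squares do not depend on how the factors are coded.

```python
import numpy as np

LEVELS = (10, 20, 30)
# Yields for each (N, P) row at K = 10, 20, 30; None marks a missing treatment.
# ASSUMPTION: the N = 10, P = 20, K = 10 yield is entered as 87.
ROWS = {
    (10, 10): (65, 80, 104), (10, 20): (87, None, 126), (10, 30): (107, 126, 148),
    (20, 10): (86, 107, 129), (20, 20): (107, None, None), (20, 30): (125, 144, 168),
    (30, 10): (108, 129, 141), (30, 20): (125, 143, None), (30, 30): (149, 163, 184),
}
DATA = np.array([(n, p, k, y) for (n, p), ys in ROWS.items()
                 for k, y in zip(LEVELS, ys) if y is not None], dtype=float)
FACTOR_COLUMN = {"N": 0, "P": 1, "K": 2}


def indicators(factor):
    """Indicator columns for levels 20 and 30 (level 10 is the baseline)."""
    col = DATA[:, FACTOR_COLUMN[factor]]
    return np.column_stack([(col == lev).astype(float) for lev in LEVELS[1:]])


def rss(terms):
    """Residual sum of squares for an intercept plus the given main effects."""
    X = np.hstack([np.ones((len(DATA), 1))] + [indicators(t) for t in terms])
    y = DATA[:, 3]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def sequential_ss(term, before):
    """Sum of squares explained by `term` when added after the terms in `before`."""
    return rss(before) - rss(before + [term])


for before in ([], ["P"], ["K"], ["P", "K"]):
    label = " and ".join(before) if before else "nothing"
    print(f"N added after {label}: {sequential_ss('N', before):.1f}")
```

The printed values can be compared with the table of orderings above.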

Marginal sums of squares

When trying to find the model that best fits a data set, a good approach is to start with a reasonably complete model containing many terms, then investigate whether non-significant terms can be dropped.

At each stage, the sums of squares (and associated p-values) that are compared are the explained sums of squares for the terms under consideration when each is added last to the model. Each of these equals the increase in the residual sum of squares when that term is dropped from the full model; they are called marginal or Type 3 sums of squares. The procedure is illustrated in the example below.
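In symbols, for a term T in the current (full) model, the marginal sum of squares and the F statistic used to test it can be written as follows, with RSS denoting a residual sum of squares and df_T the degrees of freedom of T:

```latex
\begin{aligned}
\mathrm{SS}_{\text{marginal}}(T) &= \mathrm{RSS}(\text{full model without } T) - \mathrm{RSS}(\text{full model}), \\
F &= \frac{\mathrm{SS}_{\text{marginal}}(T) / \mathrm{df}_T}
          {\mathrm{RSS}(\text{full model}) / \mathrm{df}_{\text{residual}}}.
\end{aligned}
```

The p-value is the upper-tail probability of this F statistic under an F distribution with df_T and df_residual degrees of freedom.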

Note that:

You should never consider dropping any term from the model if a higher-order interaction involving it is still in the model.
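This rule can be expressed as a small Python check. The sketch assumes terms are written as factor names joined by colons (e.g. "N:P"), a convention adopted here only for illustration; a term is eligible to be dropped only if no other term in the model contains all of its factors.

```python
def droppable(terms):
    """Return the terms not contained in any higher-order term still in the model."""
    factor_sets = [frozenset(t.split(":")) for t in terms]
    # `s < other` is a proper-subset test, so a term never blocks itself.
    return [t for t, s in zip(terms, factor_sets)
            if not any(s < other for other in factor_sets)]


print(droppable(["N", "P", "K", "N:P", "N:K", "P:K"]))  # only the interactions
print(droppable(["N", "P", "K", "N:K"]))                # P becomes a candidate
```

The second call mirrors the situation in the example below: once both interactions involving phosphorus have been dropped, the phosphorus main effect becomes a candidate for removal.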


Effect of fertilisers on maize yield

The table below shows the marginal (Type 3) sums of squares for the main effects and interactions in the data shown at the top of this page. Since no treatment has more than one replicate, we must assume that there is no 3-factor interaction in order to test the significance of the other terms. Although the 3-factor interaction would have 8 degrees of freedom if all treatments were present, the four missing values leave it with only 4 degrees of freedom, and these become the residual degrees of freedom.
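A minimal numpy/scipy sketch of these marginal tests for the model with all 2-factor interactions. Each interaction's marginal sum of squares is computed as the increase in the residual sum of squares when that interaction alone is dropped, and its F statistic uses the 4 residual degrees of freedom. The four missing treatments are simply left out of the data; the N = 10, P = 20, K = 10 yield is entered as 87 (an assumption, chosen because it reproduces the sums of squares reported for nitrogen), and the indicator coding is an illustrative choice.

```python
import numpy as np
from scipy import stats

LEVELS = (10, 20, 30)
# Yields for each (N, P) row at K = 10, 20, 30; None marks a missing treatment.
# ASSUMPTION: the N = 10, P = 20, K = 10 yield is entered as 87.
ROWS = {
    (10, 10): (65, 80, 104), (10, 20): (87, None, 126), (10, 30): (107, 126, 148),
    (20, 10): (86, 107, 129), (20, 20): (107, None, None), (20, 30): (125, 144, 168),
    (30, 10): (108, 129, 141), (30, 20): (125, 143, None), (30, 30): (149, 163, 184),
}
DATA = np.array([(n, p, k, y) for (n, p), ys in ROWS.items()
                 for k, y in zip(LEVELS, ys) if y is not None], dtype=float)
FACTOR_COLUMN = {"N": 0, "P": 1, "K": 2}


def columns(term):
    """Indicator columns for a main effect ("N") or an interaction ("N:P")."""
    blocks = []
    for factor in term.split(":"):
        col = DATA[:, FACTOR_COLUMN[factor]]
        blocks.append(np.column_stack([(col == lev).astype(float) for lev in LEVELS[1:]]))
    out = blocks[0]
    for block in blocks[1:]:
        # Interaction columns are products of every pair of indicator columns.
        out = np.column_stack([u * v for u in out.T for v in block.T])
    return out


def fit(terms):
    """Residual sum of squares and residual df for an intercept plus `terms`."""
    X = np.hstack([np.ones((len(DATA), 1))] + [columns(t) for t in terms])
    y = DATA[:, 3]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), len(y) - int(np.linalg.matrix_rank(X))


FULL = ["N", "P", "K", "N:P", "N:K", "P:K"]  # all 2-factor interactions, no 3-factor term
rss_full, df_full = fit(FULL)
print("residual df:", df_full)

results = {}
for term in ["N:P", "N:K", "P:K"]:
    # Marginal (Type 3) SS: increase in RSS when this term alone is dropped.
    rss_reduced, df_reduced = fit([t for t in FULL if t != term])
    ss, df = rss_reduced - rss_full, df_reduced - df_full
    f_stat = (ss / df) / (rss_full / df_full)
    results[term] = (ss, df, f_stat, stats.f.sf(f_stat, df, df_full))
    print(f"{term}: SS = {ss:.1f}, df = {df}, F = {f_stat:.2f}, p = {results[term][3]:.4f}")
```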

In the model with all 2-factor interactions, we can consider dropping any of the 2-factor interactions, but not the main effects, since each main effect is contained in an interaction that is still in the model.

The phosphorus-potassium interaction is the least significant, so it should be dropped from the model first. Since the data are unbalanced, all the other sums of squares and p-values then change.

Next, remove the nitrogen-phosphorus interaction term (p-value 0.0911). Again the marginal sums of squares and their p-values change. Since no interaction term involving phosphorus remains in the model, we can now also consider dropping the phosphorus main effect.

After dropping the final interaction term, all main effects are highly significant, so we would conclude that:

There is no evidence of interaction between the effects of the three fertilisers but all three fertilisers do affect the yield.

Analysis of the data should proceed with an examination of the parameter estimates. Because of the lack of balance in the data, tables of factor means do not adequately describe the factor effects. For example, the experimental units receiving phosphorus = 10 tend to have higher potassium than those receiving phosphorus = 20, so the table of means for the phosphorus levels cannot indicate whether it is phosphorus or potassium that is causing the difference.
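The imbalance can be seen directly by averaging the potassium level over the treatments that were actually observed at each phosphorus level. This Python sketch assumes the four missing treatments are the blank P = 20 cells of the data table (N = 10 with K = 20; N = 20 with K = 20 and 30; N = 30 with K = 30).

```python
from itertools import product
from statistics import mean

LEVELS = (10, 20, 30)
# ASSUMPTION: the four missing (N, P, K) treatments are the blank P = 20 cells.
MISSING = {(10, 20, 20), (20, 20, 20), (20, 20, 30), (30, 20, 30)}
observed = [t for t in product(LEVELS, repeat=3) if t not in MISSING]

# Average potassium level received by the observed units at each phosphorus level.
mean_k_by_p = {p: mean(k for _, p_level, k in observed if p_level == p) for p in LEVELS}
print(mean_k_by_p)
```

Units at P = 10 received potassium 20 on average, against 16 at P = 20, so a raw comparison of the phosphorus means is partly a comparison of potassium levels.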

The table below shows the parameter estimates (and 95% confidence intervals) for all parameters in the model with only main effects for all factors. All three fertilisers seem to have almost the same effect on yield, and yield increases fairly linearly with the quantity of each.

Note that the estimates for each factor are the differences between the expected yield at each of its levels and at its baseline level, with the other two factors held fixed.
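A minimal numpy/scipy sketch of how such estimates and 95% confidence intervals can be obtained for the main-effects model. It assumes indicator coding with level 10 of each factor as the baseline, so each coefficient is a difference from that level with the other factors held fixed, and it enters the N = 10, P = 20, K = 10 yield as 87, the value consistent with the sums of squares reported for nitrogen. Both are assumptions of this sketch.

```python
import numpy as np
from scipy import stats

LEVELS = (10, 20, 30)
# Yields for each (N, P) row at K = 10, 20, 30; None marks a missing treatment.
# ASSUMPTION: the N = 10, P = 20, K = 10 yield is entered as 87.
ROWS = {
    (10, 10): (65, 80, 104), (10, 20): (87, None, 126), (10, 30): (107, 126, 148),
    (20, 10): (86, 107, 129), (20, 20): (107, None, None), (20, 30): (125, 144, 168),
    (30, 10): (108, 129, 141), (30, 20): (125, 143, None), (30, 30): (149, 163, 184),
}
DATA = np.array([(n, p, k, y) for (n, p), ys in ROWS.items()
                 for k, y in zip(LEVELS, ys) if y is not None], dtype=float)

y = DATA[:, 3]
names, cols = ["intercept"], [np.ones(len(y))]
for j, factor in enumerate("NPK"):  # DATA columns 0, 1, 2 hold N, P, K
    for lev in LEVELS[1:]:
        names.append(f"{factor}={lev} vs {LEVELS[0]}")
        cols.append((DATA[:, j] == lev).astype(float))
X = np.column_stack(cols)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
df_res = len(y) - X.shape[1]            # 23 observations - 7 parameters
sigma2 = float(resid @ resid) / df_res  # residual mean square
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_crit = stats.t.ppf(0.975, df_res)

for name, b, s in zip(names, beta, se):
    print(f"{name:>14}: {b:7.2f}  ({b - t_crit * s:7.2f}, {b + t_crit * s:7.2f})")
```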