Model for single factor and covariate

The usual model for a completely randomised design with a single factor is:

$$ y_{ij} = \mu + \underbrace{\beta_i}_{\text{explained by factor}} + \varepsilon_{ij} $$

where the error term εij is normally distributed with mean zero and describes the unexplained variation in the response. If a numerical covariate is also recorded, a linear term in the covariate can be added to the model:

$$ y_{ij} = \mu + \underbrace{\beta_i}_{\text{explained by factor}} + \underbrace{\gamma x_{ij}}_{\text{explained by covariate}} + \varepsilon_{ij} $$

where xij is the value of the covariate for the j-th experimental unit at factor level i, and γ is the slope of the linear term.
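A minimal sketch of fitting this model by least squares, assuming NumPy is available. The helper `fit_factor_covariate` and the small data set are hypothetical and only for illustration. The factor is coded with one indicator column for each level except level 1, which is how the constraint β1 = 0 enters the fit.

```python
import numpy as np


def fit_factor_covariate(level, x, y):
    """Least squares fit of y_ij = mu + beta_i + gamma * x_ij + e_ij.

    level: integer factor levels 1..k (level 1 is the base, so beta_1 = 0)
    x:     covariate values
    y:     responses
    Returns (mu, betas, gamma), where betas[0] is beta_1 = 0.
    """
    level = np.asarray(level)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = int(level.max())
    # Design matrix: intercept, one indicator column for each non-base level,
    # then the covariate. Leaving out an indicator for level 1 sets beta_1 = 0.
    cols = [np.ones_like(x)]
    cols += [(level == i).astype(float) for i in range(2, k + 1)]
    cols.append(x)
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    mu = coef[0]
    betas = np.concatenate(([0.0], coef[1:k]))
    gamma = coef[k]
    return mu, betas, gamma


# Hypothetical data generated without random error:
# mu = 10, beta_2 = 2.1, gamma = 0.5
level = np.array([1, 1, 1, 2, 2, 2])
moisture = np.array([1.0, 2.0, 3.0, 2.0, 4.0, 6.0])
y = 10 + 2.1 * (level == 2) + 0.5 * moisture
mu, betas, gamma = fit_factor_covariate(level, moisture, y)
print(mu, betas, gamma)
```

Because these made-up responses contain no error term, the fit should return µ, β2 and γ equal to the generating values up to rounding error.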

Parallel regression lines

Since the parameter for the base level of a factor is always defined to be zero (β1 = 0), this model reduces to a standard linear regression of the response on the covariate at the base factor level:

$$ y_{1j} = \mu + \gamma x_{1j} + \varepsilon_{1j} $$

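At any other factor level i, the regression line has the same slope γ but is βi higher (lower if βi is negative), so the lines for the different factor levels are parallel.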
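Taking expectations (the error term has mean zero) makes the parallel structure explicit: at every factor level the mean response is a straight line in the covariate with the same slope γ, and only the intercept changes.

$$
\begin{aligned}
\text{level } 1:&\quad E(y_{1j}) = \mu + \gamma x_{1j}\\
\text{level } i:&\quad E(y_{ij}) = (\mu + \beta_i) + \gamma x_{ij}
\end{aligned}
$$

The intercepts differ by (µ + βi) − µ = βi, so at any common covariate value the line for level i lies βi above the line for level 1.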

Increase in accuracy

If the covariate genuinely accounts for some of the variability between the experimental units, including it in the model reduces the unexplained variation.

The lower unexplained variation (and hence lower residual sum of squares) means that the effect of the factor of interest can be more accurately estimated when the model is fitted by least squares.
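The gain in precision can be checked with a small simulation. This is a minimal sketch assuming NumPy; the helper `fit_with_se`, the true parameter values, the error standard deviation and the sample size are all hypothetical and do not come from the barley example. It fits the model with and without the covariate and reports the residual sum of squares and the standard error of the estimate of β2.

```python
import numpy as np


def fit_with_se(X, y):
    """Least squares fit; returns coefficients, residual SS and standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    rss = float(resid @ resid)
    n, p = X.shape
    sigma2 = rss / (n - p)                    # estimate of the error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance matrix of the estimates
    return coef, rss, np.sqrt(np.diag(cov))


rng = np.random.default_rng(1)
level = np.repeat([1, 2], 10)                 # 10 units at each factor level
moisture = rng.uniform(0, 10, size=level.size)
# Hypothetical truth: mu = 10, beta_2 = 2.1, gamma = 0.5, error sd = 0.5
y = 10 + 2.1 * (level == 2) + 0.5 * moisture + rng.normal(0, 0.5, size=level.size)

ind2 = (level == 2).astype(float)             # indicator for level 2
X_factor = np.column_stack([np.ones_like(ind2), ind2])
X_both = np.column_stack([np.ones_like(ind2), ind2, moisture])

coef_f, rss_f, se_f = fit_with_se(X_factor, y)
coef_b, rss_b, se_b = fit_with_se(X_both, y)
print(f"factor only:      beta2 = {coef_f[1]:.3f}, SE = {se_f[1]:.3f}, RSS = {rss_f:.2f}")
print(f"factor+covariate: beta2 = {coef_b[1]:.3f}, SE = {se_b[1]:.3f}, RSS = {rss_b:.2f}")
```

Adding a column to the design matrix can never increase the residual sum of squares; when the covariate is strongly related to the response, as assumed here, the standard error of β2 also drops noticeably.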

Difference between two barley varieties

The diagram below is the same as the one on the previous page, but it also offers the option of fitting a model that includes the covariate moisture and estimating the difference between the varieties from that model. (This is the least squares estimate of β2 in the model above.)

Select Artificial data from the pop-up menu. Use the slider to see how an unbalanced allocation of soil moisture between the two varieties affects the estimated difference when moisture is ignored. Click the checkbox LS with covariate to estimate the effect of the varieties from a model that also explains the variability caused by moisture.

For the artificial data, including the covariate moisture in the model removes all unexplained variation, so the difference between the varieties is always estimated to be 2.1, however unbalanced the experiment.
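The artificial-data behaviour can be imitated with a short sketch, assuming NumPy. Everything in it is hypothetical: the moisture values, the intercept 5 and the moisture slope 0.8 are made up, and the function `estimates` with its `shift` argument is only a stand-in for the slider, making the variety 2 plots wetter on average. The responses contain no random error, and the true difference between the varieties is set to 2.1.

```python
import numpy as np


def estimates(shift):
    """Return (difference of variety means, LS estimate of beta_2 with covariate).

    shift: how much wetter the variety 2 plots are than the variety 1 plots
    (a hypothetical stand-in for the slider controlling the imbalance).
    """
    base = np.array([1.0, 2.0, 3.0, 4.0])
    moisture = np.concatenate([base, base + shift])
    level2 = np.repeat([0.0, 1.0], base.size)     # indicator for variety 2
    # Artificial data with no error: mu = 5, beta_2 = 2.1, gamma = 0.8 (assumed)
    y = 5 + 2.1 * level2 + 0.8 * moisture
    # Ignoring moisture: simple difference between the variety means
    diff_means = y[level2 == 1].mean() - y[level2 == 0].mean()
    # Least squares with the covariate: intercept, variety indicator, moisture
    X = np.column_stack([np.ones_like(level2), level2, moisture])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return diff_means, coef[1]


for shift in [0.0, 2.0, 4.0]:
    d, b2 = estimates(shift)
    print(f"shift {shift}: means differ by {d:.2f}, LS with covariate gives {b2:.2f}")
```

The difference of means picks up 0.8 × `shift` from the moisture imbalance, whereas the covariate model attributes that part to moisture and returns 2.1 for every shift.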

Click Extra variation to add more unexplained random variation to the data. The estimated difference between the varieties now varies from one data set to the next, but this variability is smaller than it would be in the model without the covariate.
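The effect of extra random variation can be sketched by repeating a completely randomised experiment many times. This sketch assumes NumPy, and all of its settings are hypothetical: 8 plots with fixed moisture values 1 to 8, a true difference of 2.1, a moisture slope of 0.8 and an error standard deviation of 0.5. In each run the plots are randomly allocated, 4 to each variety, normal error is added, and both estimates of the difference are recorded.

```python
import numpy as np

rng = np.random.default_rng(2024)
moisture = np.arange(1.0, 9.0)                 # hypothetical moisture of 8 plots
unadjusted, adjusted = [], []
for _ in range(2000):
    # Random allocation of 4 plots to each variety (completely randomised design)
    level2 = rng.permutation(np.repeat([0.0, 1.0], 4))
    # Assumed truth: mu = 5, beta_2 = 2.1, gamma = 0.8, error sd = 0.5
    y = 5 + 2.1 * level2 + 0.8 * moisture + rng.normal(0, 0.5, size=moisture.size)
    # Estimate ignoring moisture: difference between the variety means
    unadjusted.append(y[level2 == 1].mean() - y[level2 == 0].mean())
    # Estimate from the model with the covariate
    X = np.column_stack([np.ones_like(moisture), level2, moisture])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    adjusted.append(coef[1])

unadjusted, adjusted = np.array(unadjusted), np.array(adjusted)
print(f"ignoring moisture: mean {unadjusted.mean():.2f}, sd {unadjusted.std():.2f}")
print(f"with covariate:    mean {adjusted.mean():.2f}, sd {adjusted.std():.2f}")
```

Both estimates average close to 2.1 over the random allocations, but the spread of the difference of means also includes the chance imbalance in moisture, which the covariate model removes.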