# COVARIATE directive

Specifies covariates for use in subsequent `ANOVA` statements.

### Parameter

    variates or pointers Covariates

### Description

To perform analysis of covariance you need to define the treatment model (using `TREATMENTSTRUCTURE`) and the underlying structure of the design (using `BLOCKSTRUCTURE`) as in ordinary analysis of variance, and then simply specify the required covariates using the `COVARIATE` directive. You can then do the analysis by `ANOVA`, get further output by `ADISPLAY` and so on, in the usual way.
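As a minimal sketch of that sequence, assuming a randomized-block design with the hypothetical identifiers `Block`, `Plot`, `Variety`, `Initwt` and `Yield` (none of which come from this page), the statements might be ordered as follows:

```
" Hypothetical identifiers, for illustration only. "
BLOCKSTRUCTURE Block/Plot
TREATMENTSTRUCTURE Variety
" Covariate measured on every plot before the treatments were applied. "
COVARIATE Initwt
ANOVA Yield
" Further output from the same analysis. "
ADISPLAY [PRINT=means]
```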

In the simplest form of the `COVARIATE` directive, its (unnamed) parameter just contains a list of the variates that are to be used as covariates. Alternatively, you can group some of the variates into pointers. The analysis-of-variance table will then contain a line for each group instead of the individual covariates in that group (see below).
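A sketch of the grouped form, assuming three hypothetical covariate variates `X1`, `X2` and `X3` that already hold values; the pointer name `Initial` is likewise an assumption:

```
" Group X1 and X2 so that they share one line in the analysis-of-variance table. "
POINTER [VALUES=X1,X2] Initial
COVARIATE Initial,X3
```

With this list, the table would be expected to show one pooled line for `Initial` and a separate line for `X3`.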

You can use covariates to incorporate any quantitative information about the units into the model. In field experiments there may often be linear trends in fertility. These can be estimated and removed by fitting a covariate of the position of the plot along the direction of the trend. For example

`COVARIATE Location`

For a quadratic trend, you would also include a covariate containing the squares of the positions.

`CALCULATE Quadtrend = Location**2`

`COVARIATE Location,Quadtrend`

In experiments on animals, you may wish to use measurements such as the original weight. However, the assumption is always that the y-variate is linearly related to the covariates.

Covariates are incorporated into the model as terms for a linear regression. Genstat fits the covariates, together with the treatments, in each stratum. This should explain some of the variability of the units in the stratum, and so decrease the stratum residual mean square.
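For orientation, in the simplest case of a one-way design with a single covariate and a single stratum, the model being fitted can be written as below. The symbols are assumptions introduced here rather than Genstat notation: $y_{ij}$ is the response on unit $j$ of treatment $i$, $x_{ij}$ is its covariate value, $\tau_i$ is the treatment effect, $\beta$ is the regression coefficient and $\varepsilon_{ij}$ is the residual.

$$
y_{ij} = \mu + \tau_i + \beta\,(x_{ij} - \bar{x}_{..}) + \varepsilon_{ij}
$$

In a design with several strata, a coefficient like $\beta$ is estimated separately in each stratum.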

Each treatment combination will have been applied to units whose mean value for each covariate differs from that of other treatment combinations; so even in the absence of any treatment effects, the y-values recorded for the different combinations would not be identical. A further effect of the analysis is to adjust the treatment estimates for the covariates, to correct for this. This adjustment causes some loss of efficiency in the treatment estimation.

The remaining efficiency is measured by the covariance efficiency factor, shown for each treatment term in the “cov. ef.” column of the analysis-of-variance table. The values are in the range zero to one. A value of zero indicates that the treatment contrasts are completely correlated with the covariates: after the covariates have been fitted there is no information left about the treatments. A value of one indicates that the covariates and the treatment term are orthogonal. Usually the values will be around 0.8 to 0.9.

A low value should be taken as a warning: either the measurements used as covariates have been affected by the treatments, which can occur when the measurements on covariates are taken after instead of before the experiment; or the random allocation of treatments has been unfortunate in that some treatments are on units with generally low values of the covariates while others are on generally high ones. The covariance efficiency factor is analogous to the efficiency factor printed for non-orthogonal treatment terms; details of its derivation can be found in Payne & Tobias (1992).

For a residual line in the analysis of variance, the value in the “cov. ef.” column measures how much the covariates have improved the precision of the experiment. This is calculated by dividing the residual mean square in the unadjusted analysis (which excludes the covariates) by its value in the adjusted analysis.

The covariance efficiency factor is used by Genstat in the calculation of standard errors for tables of effects. If you want to calculate the net effect of the analysis of covariance on the precision of the estimated effects of a treatment term, you should multiply the covariance efficiency factor of the term by the value printed in the residual line of the stratum where the term is estimated. Where a term has more than one degree of freedom, the adjustment given by the covariance efficiency factor is an average over all the comparisons between the effects of the term. However, this adjustment should not differ by much from those required for any particular comparison unless the randomization has been especially unfortunate. For a table of means classified by several factors, Genstat combines the covariance efficiency factors of the effects from which the means are calculated into a harmonic mean, weighted according to the numbers of degrees of freedom of each term.
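The two quantities described above can be written out as follows. The notation is an assumption made here for illustration: $s_u^2$ and $s_a^2$ are the residual mean squares of a stratum in the unadjusted and adjusted analyses, and $c_T$ is the covariance efficiency factor printed for a treatment term $T$ estimated in that stratum.

$$
\text{cov. ef.}_{\text{residual}} = \frac{s_u^2}{s_a^2},
\qquad
\text{net effect for } T = c_T \times \frac{s_u^2}{s_a^2}
$$

For instance, with the made-up values $c_T = 0.9$ and $s_u^2 / s_a^2 = 2.0$, the net effect would be $0.9 \times 2.0 = 1.8$. A net value above one suggests that the gain from the smaller residual mean square outweighs the efficiency lost in adjusting the treatment estimates.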

The adjusted analysis-of-variance table has an extra line in each stratum, giving the sum of squares due to the covariates. This is the extra sum of squares that is removed by the covariates after eliminating all that can be explained by the treatments. It thus lets you assess whether there is any evidence that the covariates are required in the model. If there are several covariates Genstat will also print their individual contributions to that sum of squares, giving first the sum of squares that can be explained by the first covariate in the `COVARIATE` list, then the extra sum of squares that can be accounted for by fitting the second covariate, and so on. However, if some of the covariates were grouped together into a pointer in the `COVARIATE` list, their contributions will be pooled into a single line.

The line for each treatment term in the analysis-of-variance table contains the sum of squares eliminating the covariates. It indicates whether there is evidence of any effects of that term, after taking account of the differences in the values of the covariates on the units to which each treatment was applied.

The method that Genstat uses for analysis of covariance essentially reproduces the method that you would use if you were doing the calculations by hand. First of all, it analyses each covariate according to the block and treatment models. You can print information from these analyses using the `CPRINT` option of either `ANOVA` or `ADISPLAY`. As `ADISPLAY` does not constrain you to list save structures that were all produced by the same `ANOVA`, `CPRINT` will produce information about the covariate analyses from every save structure that you list; duplicate information will thus be produced if several of the save structures are for analyses involving the same covariates. The output from `CPRINT`, particularly the analysis-of-variance table, gives you another way of assessing the relationship between treatments and covariates: a large variance ratio for a treatment term in the analysis of one of the covariates would indicate either that the treatment had affected the covariate or that the randomization had been unfortunate (as discussed in the description of cov. ef. above).

Genstat then analyses each y-variate in turn. First it does the usual analysis, ignoring the covariates. You can control output from this unadjusted analysis by the `UPRINT` option of `ANOVA` and `ADISPLAY`. Then the covariates are fitted by linear regression and the full, adjusted, analysis is calculated. Output from the adjusted analysis is controlled by the `PRINT` option of `ANOVA` and `ADISPLAY`. This option has an extra setting, not available for `UPRINT` and `CPRINT`: `PRINT=covariates` prints the regression coefficients of the covariates as estimated in each stratum. Because `ANOVA` has all three options, output from the covariate, unadjusted and adjusted analyses can be produced by a single `ANOVA` statement.
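As a sketch of how the three options might be combined in one statement, using the y-variate `Y` from the example on this page (the particular choice of settings is an assumption made for illustration):

```
" CPRINT: analysis of each covariate.
  UPRINT: analysis of Y ignoring the covariates.
  PRINT:  adjusted analysis, including the covariate regression coefficients. "
ANOVA [PRINT=aovtable,covariates; UPRINT=aovtable; CPRINT=aovtable] Y
```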

Options: none.

Parameter: unnamed.

### Reference

Payne, R.W. & Tobias, R.D. (1992). General balance, combination of information and the analysis of covariance. Scandinavian Journal of Statistics, 19, 3-23.

### See also

Directives: `ANOVA`, `BLOCKSTRUCTURE`, `TREATMENTSTRUCTURE`, `ADISPLAY`, `AKEEP`.

Procedures: `AFCOVARIATES`, `ASTATUS`, `AUNBALANCED`.

Commands for: Analysis of variance.

### Example

```
" Example ANOV-9: one-way analysis of covariance

Experiment to study the effect of two antibiotics (A and B)
and an inert control drug C on the treatment of leprosy.
Variate X is a score of the number of bacilli on each patient
before the experiment; variate Y is a similar score several
months after treatment."

UNITS [NVALUES=30]
FACTOR [LABELS=!T(A,B,C)] Drug
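VARIATE X,Y
" Read Drug, X and Y in parallel; Drug is given by its labels."
READ Drug,X,Y; FREPRESENTATION=labels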
A 11  6    B  6  0    C 16 13
A  8  0    B  6  2    C 13 10
A  5  2    B  7  3    C 11 18
A 14  8    B  8  1    C  9  5
A 19 11    B 18 18    C 21 23
A  6  4    B  8  4    C 16 12
A 10 13    B 19 14    C 12  5
A  6  1    B  8  9    C 12 16
A 11  8    B  5  1    C  7  1
A  3  0    B 15  9    C 12 20  :
" One-way analysis with treatment factor Drug."
TREATMENTS Drug
" Covariates are incorporated into the model in each stratum by a linear
regression. This should explain some of the variability of the units
in the stratum, and so decrease the stratum residual mean square.
Each treatment will have been applied to units whose mean value for
the covariate differs from that of other treatment combinations; so
even in the absence of any treatment effects, the y-values recorded
for the different combinations would not be identical. A further effect
of the analysis is to adjust the treatment estimates for the covariates,
to correct for this. This adjustment causes some loss of efficiency
in the treatment estimation. The remaining efficiency is measured by
the covariance efficiency factor, shown for each treatment term in
the `cov. ef.' column of the aov table. The values are in the range
zero to one. A value of zero indicates that the treatment contrasts
are completely correlated with the covariates: after the covariates
have been fitted there is no information left about the treatments.
A value of one indicates that the covariates and the treatment term
are orthogonal. Usually the values will be around 0.8 to 0.9. A low
value should be taken as a warning: either the measurements used as
covariates have been affected by the treatments, which can occur when
the measurements on covariates are taken after instead of before the
experiment; or the random allocation of treatments has been unfortunate
in that some treatments are on units with generally low values of
the covariates while others are on generally high ones.
For a residual line in the analysis of variance, the value in the
`cov. ef.' column measures how much the covariates have improved
the precision of the experiment. This is calculated by dividing the
residual mean square in the unadjusted analysis (which excludes the
covariates) by its value in the adjusted analysis."
COVARIATE X
ANOVA Y

" The UPRINT option of ANOVA and ADISPLAY allows output to be printed
from the analysis unadjusted for covariates."