COVARIATE directive

Specifies covariates for use in subsequent ANOVA statements.

No options

Parameter

    variates or pointers Covariates

Description

To perform analysis of covariance you need to define the treatment model (using TREATMENTSTRUCTURE) and the underlying structure of the design (using BLOCKSTRUCTURE) as in ordinary analysis of variance, and then simply specify the required covariates using the COVARIATE directive. You can then do the analysis by ANOVA, get further output by ADISPLAY and so on, in the usual way.
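The sequence described above can be sketched as follows. The identifiers Block, Plot, Variety, Initwt and Yield are hypothetical names chosen for illustration; they are assumed to have been declared and given values already.

```genstat
" Sketch only: Block, Plot, Variety, Initwt and Yield are hypothetical
  identifiers, assumed to be declared and to hold data already. "
BLOCKSTRUCTURE Block/Plot       " underlying structure of the design "
TREATMENTSTRUCTURE Variety      " treatment model "
COVARIATE Initwt                " covariate for subsequent ANOVA statements "
ANOVA Yield                     " analysis adjusted for the covariate "
ADISPLAY [PRINT=covariates]     " further output: covariate regression coefficients "
```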

In the simplest form of the COVARIATE directive, its (unnamed) parameter just contains a list of the variates that are to be used as covariates. Alternatively, you can group some of the variates into pointers. The analysis-of-variance table will then contain a line for each group instead of the individual covariates in that group (see below).

You can use covariates to incorporate any quantitative information about the units into the model. In field experiments there may often be linear trends in fertility. These can be estimated and removed by fitting a covariate of the position of the plot along the direction of the trend. For example:

COVARIATE Location

For a quadratic trend, you would also include a covariate containing the squares of the positions.

CALCULATE Quadtrend = Location**2

COVARIATE Location,Quadtrend
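If the two trend covariates should appear as a single line in the analysis-of-variance table, they can be grouped into a pointer before being listed. A minimal sketch, assuming Location and Quadtrend have been set up as in the statements above; the pointer name Trend is a hypothetical identifier introduced here:

```genstat
" Trend is a hypothetical pointer name grouping the two trend covariates. "
POINTER [VALUES=Location,Quadtrend] Trend
COVARIATE Trend
```

With this form the covariate sum of squares is reported for the group as a whole rather than separately for Location and Quadtrend.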

In experiments on animals, you may wish to use measurements such as the original weight. However, the assumption is always that the y-variate is linearly related to the covariates.

Covariates are incorporated into the model as terms for a linear regression. Genstat fits the covariates, together with the treatments, in each stratum. This should explain some of the variability of the units in the stratum, and so decrease the stratum residual mean square.

Each treatment combination will have been applied to units whose mean value for each covariate differs from that of other treatment combinations; so even in the absence of any treatment effects, the y-values recorded for the different combinations would not be identical. A further effect of the analysis is to adjust the treatment estimates for the covariates, to correct for this. This adjustment causes some loss of efficiency in the treatment estimation. The remaining efficiency is measured by the covariance efficiency factor, shown for each treatment term in the “cov. ef.” column of the analysis-of-variance table.

The values are in the range zero to one. A value of zero indicates that the treatment contrasts are completely correlated with the covariates: after the covariates have been fitted there is no information left about the treatments. A value of one indicates that the covariates and the treatment term are orthogonal. Usually the values will be around 0.8 to 0.9. A low value should be taken as a warning: either the measurements used as covariates have been affected by the treatments, which can occur when the measurements on covariates are taken after instead of before the experiment; or the random allocation of treatments has been unfortunate in that some treatments are on units with generally low values of the covariates while others are on generally high ones.

The covariance efficiency factor is analogous to the efficiency factor printed for non-orthogonal treatment terms; details of its derivation can be found in Payne & Tobias (1992).

For a residual line in the analysis of variance, the value in the “cov. ef.” column measures how much the covariates have improved the precision of the experiment. This is calculated by dividing the residual mean square in the unadjusted analysis (which excludes the covariates) by its value in the adjusted analysis.

The covariance efficiency factor is used by Genstat in the calculation of standard errors for tables of effects. If you want to calculate the net effect of the analysis of covariance on the precision of the estimated effects of a treatment term, you should multiply the covariance efficiency factor of the term by the value printed in the residual line of the stratum where the term is estimated. Where a term has more than one degree of freedom, the adjustment given by the covariance efficiency factor is an average over all the comparisons between the effects of the term. However, this adjustment should not differ by much from those required for any particular comparison unless the randomization has been especially unfortunate. For a table of means classified by several factors, Genstat combines the covariance efficiency factors of the effects from which the means are calculated into a harmonic mean, weighted according to the numbers of degrees of freedom of each term.
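The quantities described above can be written compactly. The symbols are notation introduced here for illustration, not Genstat's own: $s^2_{\text{unadj}}$ and $s^2_{\text{adj}}$ are the stratum residual mean squares without and with the covariates, $E_t$ is the covariance efficiency factor of treatment term $t$, and $d_t$ is its number of degrees of freedom. The last line is one reading of "harmonic mean, weighted according to the numbers of degrees of freedom" and should be treated as an assumption.

```latex
\begin{align*}
E_{\text{res}} &= \frac{s^2_{\text{unadj}}}{s^2_{\text{adj}}}
  && \text{(cov. ef. on a residual line)} \\
E_{\text{net}}(t) &= E_t \times E_{\text{res}}
  && \text{(net effect on the precision of term } t\text{)} \\
\bar{E} &= \frac{\sum_t d_t}{\sum_t d_t / E_t}
  && \text{(weighted harmonic mean for a table of means)}
\end{align*}
```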

The adjusted analysis-of-variance table has an extra line in each stratum, giving the sum of squares due to the covariates. This is the extra sum of squares that is removed by the covariates after eliminating all that can be explained by the treatments. It thus lets you assess whether there is any evidence that the covariates are required in the model. If there are several covariates Genstat will also print their individual contributions to that sum of squares, giving first the sum of squares that can be explained by the first covariate in the COVARIATE list, then the extra sum of squares that can be accounted for by fitting the second covariate, and so on. However, if some of the covariates were grouped together into a pointer in the COVARIATE list, their contributions will be pooled into a single line.

The line for each treatment term in the analysis-of-variance table contains the sum of squares eliminating the covariates. It indicates whether there is evidence of any effects of that term, after taking account of the differences in the values of the covariates on the units to which each treatment was applied.

The method that Genstat uses for analysis of covariance essentially reproduces the method that you would use if you were doing the calculations by hand. First of all, it analyses each covariate according to the block and treatment models. You can print information from these analyses using the CPRINT option of either ANOVA or ADISPLAY. As ADISPLAY does not constrain you to list save structures that were all produced by the same ANOVA, CPRINT will produce information about the covariate analyses from every save structure that you list; duplicate information will thus be produced if several of the save structures are for analyses involving the same covariates. The output from CPRINT, particularly the analysis-of-variance table, gives you another way of assessing the relationship between treatments and covariates: a large variance ratio for a treatment term in the analysis of one of the covariates would indicate either that the treatment had affected the covariate or that the randomization had been unfortunate (as discussed in the description of cov. ef. above).
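The point about duplicated CPRINT output can be illustrated with a sketch. It assumes two y-variates analysed with the same covariate X, with the block and treatment structures already defined. Y1, Y2 and the save-structure names Save1 and Save2 are hypothetical identifiers introduced here.

```genstat
" Sketch only: Y1, Y2, Save1 and Save2 are hypothetical identifiers. "
COVARIATE X
ANOVA [PRINT=*] Y1,Y2; SAVE=Save1,Save2
" Both save structures involve covariate X, so the analysis of X is
  printed once for each save structure listed. "
ADISPLAY [PRINT=*; CPRINT=aovtable] SAVE=Save1,Save2
```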

Genstat then analyses each y-variate in turn. First of all it does the usual analysis ignoring the covariates. You can control output from this unadjusted analysis by the UPRINT option of ANOVA and ADISPLAY. (So the whole of the output given for the example could have been produced by a single ANOVA statement.) Then the covariates are fitted by linear regression and the full, adjusted, analysis is calculated. Output from the adjusted analysis is controlled by the PRINT option of ANOVA and ADISPLAY. This option has an extra setting, not available for UPRINT and CPRINT: PRINT=covariates prints the regression coefficients of the covariates as estimated in each stratum.
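As a sketch of the parenthetical remark above, the three kinds of output can be requested in one statement. This assumes the Drug, X and Y structures from the Example section have already been set up; the particular print settings chosen are illustrative.

```genstat
" Assumes TREATMENTSTRUCTURE Drug and COVARIATE X have already been given. "
ANOVA [PRINT=aovtable,information,covariates,means; \
       UPRINT=aovtable; CPRINT=aovtable] Y
```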

Options: none.

Parameter: unnamed.

Reference

Payne, R.W. & Tobias, R.D. (1992). General balance, combination of information and the analysis of covariance. Scandinavian Journal of Statistics, 19, 3-23.

See also

Directives: ANOVA, BLOCKSTRUCTURE, TREATMENTSTRUCTURE, ADISPLAY, AKEEP.

Procedures: AFCOVARIATES, ASTATUS, AUNBALANCED.

Commands for: Analysis of variance.

Example

" Example ANOV-9: one-way analysis of covariance

  Experiment to study the effect of two antibiotics (A and B)
  and an inert control drug C on the treatment of leprosy.
  Variate X is a score of the number of bacilli on each patient
  before the experiment; variate Y is a similar score several
  months after treatment."

UNITS [NVALUES=30]
FACTOR [LABELS=!T(A,B,C)] Drug
VARIATE X,Y
READ Drug,X,Y; FREPRESENTATION=labels
A 11  6    B  6  0    C 16 13
A  8  0    B  6  2    C 13 10
A  5  2    B  7  3    C 11 18
A 14  8    B  8  1    C  9  5
A 19 11    B 18 18    C 21 23
A  6  4    B  8  4    C 16 12
A 10 13    B 19 14    C 12  5
A  6  1    B  8  9    C 12 16
A 11  8    B  5  1    C  7  1
A  3  0    B 15  9    C 12 20  :
" One-way analysis with treatment factor Drug."
TREATMENTS Drug
" Covariates are incorporated into the model in each stratum by a linear
  regression. This should explain some of the variability of the units 
  in the stratum, and so decrease the stratum residual mean square. 
  Each treatment will have been applied to units whose mean value for 
  the covariate differs from that of other treatment combinations; so 
  even in the absence of any treatment effects, the y-values recorded 
  for the different combinations would not be identical. A further effect 
  of the analysis is to adjust the treatment estimates for the covariates, 
  to correct for this. This adjustment causes some loss of efficiency 
  in the treatment estimation. The remaining efficiency is measured by 
  the covariance efficiency factor, shown for each treatment term in 
  the `cov. ef.' column of the aov table. The values are in the range 
  zero to one. A value of zero indicates that the treatment contrasts 
  are completely correlated with the covariates: after the covariates 
  have been fitted there is no information left about the treatments. 
  A value of one indicates that the covariates and the treatment term 
  are orthogonal. Usually the values will be around 0.8 to 0.9. A low 
  value should be taken as a warning: either the measurements used as 
  covariates have been affected by the treatments, which can occur when 
  the measurements on covariates are taken after instead of before the
  experiment; or the random allocation of treatments has been unfortunate 
  in that some treatments are on units with generally low values of 
  the covariates while others are on generally high ones.
  For a residual line in the analysis of variance, the value in the 
  `cov. ef.' column measures how much the covariates have improved 
  the precision of the experiment. This is calculated by dividing the 
  residual mean square in the unadjusted analysis (which excludes the 
  covariates) by its value in the adjusted analysis."
COVARIATE X
ANOVA Y

" The UPRINT option of ANOVA and ADISPLAY allows output to be printed 
  from the analysis unadjusted for covariates."
ADISPLAY [UPRINT=aov]

" The CPRINT option produces output from an analysis of variance of the
  covariates."
ADISPLAY [CPRINT=aov]
Updated on March 8, 2019
