Observational studies and experiments

We have shown that orthogonal explanatory variables result in the most accurate estimates of the slope parameters and also that the resulting parameter estimates and anova table are easier to interpret.

In an observational study, the values of X and Z are usually correlated — we have no control over their values so it is unlikely that their correlation will be exactly zero, even if it is sometimes low.

In an experiment however, the researcher can often choose the values of the explanatory variables. Good experimental design selects the values of X and Z to make them orthogonal.

Simple orthogonal design

The simplest way to ensure that the explanatory variables are uncorrelated is to use only a small number of different values for each of X and Z, then to make an equal number of response measurements at each possible combination of levels of these variables.

Examples

In an experiment, an entomologist recorded energy expenditure (joules/sec) for bees drinking water with different sucrose concentrations (percentage) and at different temperatures. The researcher decided to conduct the experiment using three different temperatures (20, 30 and 40 degrees C) and three different sucrose concentrations (20, 40 and 60%).

By measuring energy expenditure from the same number of bees at each combination of temperature and sucrose (three replicates), the two explanatory variables were guaranteed to be orthogonal.

    Temperature (degrees C)
    20 30 40
Sucrose
concentration
(%)
20 3.1, 3.7, 4.7 6.0, 6.9, 7.5 7.7, 8.3, 9.5
40 5.5, 6.7, 7.3 11.5, 12.9, 13.4 15.7, 14.3, 15.9
60 7.9, 9.2, 9.3 17.5, 15.8, 14.7 19.1, 18.0, 19.9

The diagram below shows an analysis of these data.

Note that there is only a single anova table, and the two p-values can be used to test whether the corresponding variables can be dropped from the full model.

Use the pop-up menu to see another experimental data set and its analysis.

Other orthogonal designs

It is not essential to record the same number of response observations at every combination of X and Z for the explanatory variables to be orthogonal although there is rarely any reason to choose other orthogonal designs.

X and Z are orthogonal if the spread of observations over Z is the same for each level of X.

Other orthogonal designs for bee energy expenditure

The following designs are all orthogonal.

Equal replicates
This is the design that was used in the actual experiment whose results were shown above. The number of replicates for each combination of temperature and sucrose are shown in the table below.
    Temperature
    20 30 40
Sucrose 20 3 3 3
40 3 3 3
60 3 3 3
Different replicates for one variable
If the researcher particularly wanted to estimate energy expenditure at 20 percent sucrose, more replicates might be undertaken at this level of sucrose:
    Temperature
    20 30 40
Sucrose 20 5 5 5
40 2 2 2
60 2 2 2
Different replicates for both variables
It is possible to take extra observations at levels of both X and Z and keep the design orthogonal. For example, the following design takes extra observations at both 20 degrees and at sucrose concentration 40 percent.
    Temperature
    20 30 40
Sucrose 20 4 2 2
40 6 3 3
60 4 2 2

Advantage of equal replicates

The standard error of each regression slope estimate depends on the standard deviation of the corresponding explanatory variable (among other things),

eqn

eqn

where r is the correlation coefficient between X and Z. When comparing orthogonal designs (with r = 0), the standard errors of the slopes are lowest when the standard deviations of X and Z are highest. Concentrating the design on one or two levels of X or Z decreases their standard deviations, so:

A design with equal replicates is usually best.