What affects the accuracy of the least squares slope?

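We gave a formula for the standard deviation of b1 earlier in this section. It can be rewritten as

$$\sigma_{b_1} = \frac{\sigma}{s_x \sqrt{n-1}}$$

where

$$s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$$

is the sample standard deviation of the x-values.

It is interesting to observe how these three quantities, σ, n and sx, influence the accuracy of the least squares slope as an estimate of β1.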
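The standard deviation of b1 is σ/√Σ(x − x̄)², and since Σ(x − x̄)² = (n − 1)sx², it also equals σ/(sx√(n − 1)). The short sketch below checks numerically that the two forms agree. The ten evenly spaced x-values, σ = 2 and the function names are hypothetical, chosen only for illustration.

```python
import math

def sd_slope_from_sxx(sigma, xs):
    """Standard deviation of b1 as sigma / sqrt(sum((x - xbar)^2))."""
    xbar = sum(xs) / len(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sigma / math.sqrt(sxx)

def sample_sd(xs):
    """Sample standard deviation of the x-values (divisor n - 1)."""
    xbar = sum(xs) / len(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))

def sd_slope(sigma, n, sx):
    """The same quantity written in terms of sigma, n and sx."""
    return sigma / (sx * math.sqrt(n - 1))

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # hypothetical x-values
sigma = 2.0                            # hypothetical response standard deviation

print(sd_slope_from_sxx(sigma, xs))            # approximately 0.2202
print(sd_slope(sigma, len(xs), sample_sd(xs))) # same value
```

Writing the result in terms of σ, n and sx makes each of the three influences visible: the standard deviation is proportional to σ and inversely proportional to both sx and √(n − 1).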

The standard error of the least squares slope, b1, is lowest when:

  1. the response standard deviation, σ, is low
  2. the sample size, n, is large
  3. the spread of the x-values, sx, is high

The first two influences on accuracy are not surprising, but the third needs a little more thought.

  1. If the data are close to the regression line, then the position of the line can be accurately determined.
  2. Increasing the amount of data increases the information about β1.
  3. It is easiest to understand the effect of sx by considering the most extreme situation. If all x-values are the same (sx = 0), every response has the same distribution, so the data carry no information about β1. If the x-values are similar, the errors will tend to mask any differences in the mean response and β1 will be poorly estimated. The greater the spread of x-values, the larger the differences between the response means relative to the errors, so the greater the accuracy of the least squares estimates.
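The extreme case can be seen directly in the least squares formula b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², whose denominator is zero when sx = 0. The minimal sketch below uses made-up numbers; it refuses to estimate a slope when all x-values are equal, and compares σ/√Σ(x − x̄)² for tightly clustered and widely spread x-values of the same size.

```python
import math

def least_squares_slope(xs, ys):
    """b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0:
        # All x-values identical: every line through (xbar, ybar) fits equally
        # well, so the data say nothing about the slope.
        raise ValueError("all x-values are equal; the slope cannot be estimated")
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sxx

def sd_slope(sigma, xs):
    """Standard deviation of b1 for a given response sd and set of x-values."""
    xbar = sum(xs) / len(xs)
    return sigma / math.sqrt(sum((x - xbar) ** 2 for x in xs))

try:
    least_squares_slope([5, 5, 5], [3.1, 2.9, 3.4])
except ValueError as err:
    print("no estimate:", err)

clustered = [4.9, 5.0, 5.1]   # hypothetical, very similar x-values
spread = [1.0, 5.0, 9.0]      # hypothetical, well spread x-values
print(sd_slope(1.0, clustered) / sd_slope(1.0, spread))  # approximately 40
```

With the same σ and the same number of observations, the clustered design gives a slope estimate whose standard deviation is about 40 times larger.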

Demonstration

The diagram below shows the distribution of the least squares slope for samples from a normal linear model.

Use the pull-down menu to alter the sample size. Observe that the spread of the distribution of b1 is lowest when the sample size is large.

Change the sample size back to 20, then adjust the response standard deviation. Observe that the spread of the distribution of b1 is lowest when the response standard deviation is small.

Change the response standard deviation back to a medium value, then adjust the spread of X. Observe that the spread of the distribution of b1 is lowest when the spread of X is high.

(Click Accumulate then take a few samples at any combination of the three characteristics to verify that the blue normal distributions are indeed correct!)
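The same experiment can be sketched without the interactive diagram as a small Monte Carlo simulation. The sketch below assumes a normal linear model with hypothetical parameters β0 = 10 and β1 = 2, evenly spaced x-values rescaled to have sample standard deviation sx, and 2000 simulated samples per setting. It compares the simulated spread of b1 with σ/(sx√(n − 1)).

```python
import math
import random
import statistics

def simulate_slopes(n, sigma, sx_target, beta0=10.0, beta1=2.0, reps=2000, seed=1):
    """Least squares slopes from repeated samples of a normal linear model."""
    rng = random.Random(seed)
    # Evenly spaced x-values, rescaled so that their sample sd equals sx_target.
    base = list(range(n))
    m, s = statistics.mean(base), statistics.stdev(base)
    xs = [(x - m) / s * sx_target for x in base]
    xbar = statistics.mean(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    slopes = []
    for _ in range(reps):
        ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
        ybar = statistics.mean(ys)
        sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
        slopes.append(sxy / sxx)
    return slopes

def theoretical_sd(n, sigma, sx):
    return sigma / (sx * math.sqrt(n - 1))

# (n, sigma, sx): a baseline, then larger n, smaller sigma and larger sx.
settings = [(20, 1.0, 1.0), (80, 1.0, 1.0), (20, 0.5, 1.0), (20, 1.0, 2.0)]
for n, sigma, sx in settings:
    slopes = simulate_slopes(n, sigma, sx)
    print(n, sigma, sx,
          round(statistics.stdev(slopes), 4),      # simulated sd of b1
          round(theoretical_sd(n, sigma, sx), 4))  # formula
```

Each of the last three settings should give a noticeably smaller spread of b1 than the baseline, with the simulated and theoretical values close to each other.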

Implications for experimental design

There are important consequences when designing experiments that will generate regression data. To increase the accuracy of the least squares estimate of the slope:

  1. Reduce the response standard deviation, σ. As in other experimental situations, it is best if the experimental units are as similar as possible; this is the main way to keep σ low.
  2. Increase the sample size. It is best to collect as much data as possible, but expense will be a limiting factor.
  3. Increase the spread of x-values. In an experiment, the x-values are under the control of the experimenter, so it is possible to increase their spread. (If there is too little variation in the x-values used in the experiment, β1 cannot be accurately estimated.)
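Because the standard deviation of b1 is σ/(sx√(n − 1)), precision improves only with the square root of the sample size: halving the standard deviation requires roughly four times as many observations, whereas doubling sx achieves the same reduction. The sketch below, with hypothetical values σ = 1 and target standard deviations 0.1 and 0.05, finds the smallest n meeting a target.

```python
import math

def sd_slope(sigma, n, sx):
    """Standard deviation of b1 written in terms of sigma, n and sx."""
    return sigma / (sx * math.sqrt(n - 1))

def smallest_n(sigma, sx, target):
    """Smallest sample size whose slope standard deviation is at most target."""
    n = 2
    while sd_slope(sigma, n, sx) > target:
        n += 1
    return n

print(smallest_n(1.0, 1.0, 0.1))   # 101
print(smallest_n(1.0, 1.0, 0.05))  # 401: halving the sd needs (n - 1) four times larger
print(smallest_n(1.0, 2.0, 0.1))   # 26: doubling sx cuts the required sample size
```

When collecting more data is expensive, spreading the x-values more widely can be a cheaper way to gain accuracy, subject to the linearity caveat below.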

There is, however, a major problem if the spread of x-values is increased too much.
Beware nonlinearity

Many relationships are acceptably linear over a limited range of x-values but become nonlinear at extreme x-values. Although a good spread of x-values is desirable, the normal linear model is not appropriate if there is curvature, so a compromise is needed.
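A curved relationship can look almost straight over a narrow range of x-values. The sketch below assumes a hypothetical mean response 10√x (not taken from this section), fits a least squares line to the exact mean values with no random error, and reports the largest gap between the curve and the line over a narrow and a wide range of x.

```python
import math

def fit_line(xs, ys):
    """Least squares intercept and slope."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b1 * xbar, b1

def mean_response(x):
    # Hypothetical curved relationship, chosen only for illustration.
    return 10 * math.sqrt(x)

def max_lack_of_fit(lo, hi, k=21):
    """Largest gap between the mean curve and the best straight line on [lo, hi]."""
    xs = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    ys = [mean_response(x) for x in xs]
    b0, b1 = fit_line(xs, ys)
    return max(abs(y - (b0 + b1 * x)) for x, y in zip(xs, ys))

narrow = max_lack_of_fit(20, 30)
wide = max_lack_of_fit(1, 100)
print(round(narrow, 3), round(wide, 3))
```

Over the narrow range the straight line is a good approximation; over the wide range the curvature produces systematic errors far larger than those in the narrow case.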

Even after you have decided on the range of x-values for the experiment, avoid using only values at the two ends of this range, even though this maximises sx. Without intermediate values, it is impossible to assess whether the relationship is linear.
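The trade-off can be checked for a hypothetical six-run experiment on the range 0 to 10. The sketch compares Σ(x − x̄)², which determines the standard deviation of b1, for an endpoints-only design and an evenly spaced design. It then checks whether a quadratic term could be fitted at all. The quadratic is used here only as one simple way to look for curvature, an assumption rather than a method from this section; with only two distinct x-values the matrix XᵀX for the columns 1, x, x² is singular.

```python
from fractions import Fraction

def sxx(xs):
    """Sum of squared deviations of the x-values, computed exactly."""
    xbar = Fraction(sum(xs), len(xs))
    return sum((x - xbar) ** 2 for x in xs)

def quadratic_xtx(xs):
    """X^T X for a design matrix with columns 1, x and x^2."""
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    return [[power_sums[r + c] for c in range(3)] for r in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

endpoints = [0, 0, 0, 10, 10, 10]   # hypothetical: all runs at the two ends
spread = [0, 2, 4, 6, 8, 10]        # hypothetical: evenly spaced runs

for name, xs in [("endpoints", endpoints), ("spread", spread)]:
    print(name, sxx(xs), det3(quadratic_xtx(xs)))
# endpoints: sum of squares 150, determinant 0 (curvature cannot be assessed)
# spread:    sum of squares 70,  nonzero determinant
```

The endpoints-only design gives the more accurate slope (150 versus 70), but it provides no way to check the straight-line assumption.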