What affects the accuracy of the least squares slope?
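We gave a formula for the standard deviation of b1 earlier in this section. It can be rewritten as
\[ \sigma_{b_1} = \frac{\sigma}{\sqrt{n-1}\; s_x} \]
where σ is the response standard deviation, n is the sample size, and
\[ s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} \]
is the sample standard deviation of the x-values.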
It is interesting to observe how these three quantities influence the accuracy of the least squares slope as an estimate of β1.
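The standard error of the least squares slope, b1, is lowest when the response standard deviation, σ, is small, the sample size, n, is large, and the spread of x-values, sx, is large.
The first two influences on accuracy are not surprising, but the third needs a little more thought.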
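The effect of each quantity can be checked numerically. The sketch below assumes the standard result that the standard deviation of b1 is σ / √Σ(x − x̄)², which is the same as σ / (√(n − 1) · sx). The function names and the example x-values are illustrative only, not part of the original text.

```python
import math
import statistics


def slope_sd_from_x(sigma, xs):
    """Standard deviation of b1 computed as sigma / sqrt(sum((x - xbar)^2))."""
    xbar = statistics.fmean(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sigma / math.sqrt(sxx)


def slope_sd(sigma, n, s_x):
    """The same quantity written in terms of sigma, n and s_x."""
    return sigma / (math.sqrt(n - 1) * s_x)


# Hypothetical design: five equally spaced x-values, response sd of 2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma = 2.0

direct = slope_sd_from_x(sigma, xs)
rewritten = slope_sd(sigma, len(xs), statistics.stdev(xs))
print(direct, rewritten)  # the two forms should agree up to rounding

# Doubling the spread of x halves the standard deviation of b1,
# exactly as halving the response standard deviation would.
print(slope_sd(sigma, 5, 1.0), slope_sd(sigma, 5, 2.0), slope_sd(sigma / 2, 5, 1.0))
```

Because sx appears in the denominator on the same footing as 1/σ, widening the x-values is as effective as reducing the response variability by the same factor.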
Demonstration
The diagram below shows the distribution of the least squares slope for samples from a normal linear model.
Use the pull-down menu to alter the sample size. Observe that the spread of the distribution of b1 is lowest when the sample size is large.
Change the sample size back to 20, then adjust the response standard deviation. Observe that the spread of the distribution of b1 is lowest when the response standard deviation is small.
Change the response standard deviation back to a medium value, then adjust the spread of X. Observe that the spread of the distribution of b1 is lowest when the spread of X is high.
(Click Accumulate then take a few samples at any combination of the three characteristics to verify that the blue normal distributions are indeed correct!)
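The same check can be made without the interactive diagram by simulation. The following is a minimal sketch, assuming a normal linear model with arbitrarily chosen parameter values (β0 = 1, β1 = 0.5, σ = 2) and 20 equally spaced x-values; none of these settings come from the original demonstration.

```python
import math
import random
import statistics


def least_squares_slope(xs, ys):
    """Least squares slope b1 = Sxy / Sxx."""
    xbar = statistics.fmean(xs)
    ybar = statistics.fmean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx


def simulate_slope_sd(xs, beta0, beta1, sigma, n_samples, rng):
    """Standard deviation of b1 over repeated samples from the model."""
    slopes = []
    for _ in range(n_samples):
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        slopes.append(least_squares_slope(xs, ys))
    return statistics.stdev(slopes)


def theoretical_slope_sd(xs, sigma):
    """sigma / (sqrt(n - 1) * s_x) for the given x-values."""
    return sigma / (math.sqrt(len(xs) - 1) * statistics.stdev(xs))


rng = random.Random(1)  # fixed seed so that the run is repeatable
xs = [float(i) for i in range(20)]

empirical = simulate_slope_sd(xs, 1.0, 0.5, 2.0, 4000, rng)
theory = theoretical_slope_sd(xs, 2.0)
print(round(empirical, 3), round(theory, 3))
```

Changing the length of `xs`, the value of `sigma`, or the spacing of the x-values and rerunning mirrors the three adjustments described above.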
Implications for experimental design
There are important consequences when designing experiments that will generate regression data. To increase the accuracy of the least squares slope as an estimate of β1, use as large a sample as is practical, reduce the response standard deviation where possible, and spread the x-values widely.
There is, however, a major problem when the spread of x-values is increased too much.
Beware nonlinearity
Although many relationships are acceptably linear over a limited range of x-values, at extreme x-values the relationship often becomes nonlinear. Although a good spread of x-values is desirable, the normal linear model is not appropriate if there is curvature. A compromise is needed.
Even when you have decided on a range of x-values that will be used in the experiment, it is important to avoid using only values at the two ends of this range, even though this maximises sx. Without intermediate values, it is impossible to assess whether the data are linear or not.
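The trade-off can be illustrated with a small sketch. It compares two hypothetical designs of ten observations on the range 0 to 10, together with a made-up curved relationship that coincides with a straight line at both ends of the range. The functions and values are assumptions chosen purely for illustration.

```python
import statistics


def linear(x):
    """A hypothetical straight-line mean response."""
    return 1.0 + 0.5 * x


def curved(x):
    """Equal to linear(x) at x = 0 and x = 10, but bowed in between."""
    return linear(x) + 0.05 * x * (10.0 - x)


# Design A: all observations at the two ends of the range.
endpoints = [0.0] * 5 + [10.0] * 5
# Design B: observations spread over five distinct x-values.
spread_out = [0.0, 0.0, 2.5, 2.5, 5.0, 5.0, 7.5, 7.5, 10.0, 10.0]

s_end = statistics.stdev(endpoints)
s_spread = statistics.stdev(spread_out)

# Largest difference between the two mean responses at the design points.
gap_end = max(abs(curved(x) - linear(x)) for x in endpoints)
gap_spread = max(abs(curved(x) - linear(x)) for x in spread_out)

print(s_end, s_spread)      # design A has the larger spread of x
print(gap_end, gap_spread)  # but only design B can reveal the curvature
```

With design A the straight line and the curve have identical mean responses at every x-value used, so no amount of data from that design can distinguish them. Design B gives up some spread in exchange for the ability to detect curvature.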