What affects the accuracy of a prediction?
Since the predicted response at x,
ŷ = b0 + b1 x
depends on the least squares estimates, b0 and b1, it also varies from sample to sample. The prediction has a normal distribution whose mean is
μy = β0 + β1x
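The standard deviation of the prediction describes its likely distance from this underlying population value. It depends on the error standard deviation, σ, the sample size, n, and how far x lies from the mean of the x-values in the data.
Predictions are least variable (most accurate) when predicting at an x-value near the mean of the x-values in the 'training' data.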
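The dependence on x can be made concrete with the standard result for simple linear regression under a normal linear model: the fitted value at x has standard deviation σ √( 1/n + (x − x̄)² / Σ(xi − x̄)² ). The sketch below evaluates this expression. The function name `prediction_sd`, the equally spaced x-values and σ = 1 are illustrative assumptions, not values taken from this page.

```python
import math

def prediction_sd(x, xs, sigma):
    """Standard deviation of the fitted value b0 + b1*x under a normal linear model.

    xs    -- the x-values of the data (treated as fixed)
    sigma -- the error standard deviation (assumed known here)
    """
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)  # sum of squares of x about its mean
    return sigma * math.sqrt(1 / n + (x - x_bar) ** 2 / sxx)

# Hypothetical design: 10 equally spaced x-values whose mean is 2.5, with sigma = 1.
xs = [0.25 + 0.5 * i for i in range(10)]
for x in (0.0, 2.5, 5.0, 10.0):
    print(f"x = {x:5.1f}   sd of prediction = {prediction_sd(x, xs, 1.0):.3f}")
```

At x = x̄ the second term vanishes and the standard deviation reduces to σ/√n, its smallest possible value. It grows steadily as x moves away from x̄ in either direction.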
The diagram below shows a sample from a normal linear model and the least squares line that is fitted to these data.
Click Accumulate, then take approximately 20 further samples. The variability of the least squares lines is shown on the right.
Now drag the slider on the right to expand the scales in the diagrams. Observe that the least squares lines (and hence the predictions that are made from them) are least variable near the centre of the data, but become increasingly variable as you extrapolate from the data.
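The same experiment can be approximated without the diagram. The following is a minimal sketch, assuming a hypothetical model (β0 = 1, β1 = 2, σ = 1) and ten equally spaced x-values; none of these numbers come from the diagram. It uses 2000 samples rather than 20 so that the pattern is stable, and the helper names `least_squares` and `simulate_lines` are illustrative only.

```python
import random
import statistics

def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def simulate_lines(xs, beta0, beta1, sigma, n_samples, seed=0):
    """Fit a least squares line to each of n_samples simulated data sets."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_samples):
        # Same x-values every time; only the normal errors change.
        ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
        lines.append(least_squares(xs, ys))
    return lines

# Hypothetical model and design (assumed for illustration).
xs = [0.25 + 0.5 * i for i in range(10)]  # mean of the x-values is 2.5
lines = simulate_lines(xs, beta0=1.0, beta1=2.0, sigma=1.0, n_samples=2000)

# Spread of the fitted lines at x-values inside and well beyond the data.
spread = {}
for x in (2.5, 5.0, 10.0, 20.0):
    predictions = [b0 + b1 * x for b0, b1 in lines]
    spread[x] = statistics.stdev(predictions)
    print(f"x = {x:5.1f}   sd of fitted lines = {spread[x]:.3f}")
```

The printed spread should be smallest at x = 2.5, the centre of the simulated data, and should increase the further the x-value lies outside the range 0.25 to 4.75.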
The next diagram concentrates on the errors that result from using the estimate
ŷ = b0 + b1 x
of the true mean response,
μy = β0 + β1x
Click Accumulate, then take about 50 further samples. The jittered dot plot on the right shows the distribution of the errors that are obtained when using a least squares line to estimate the mean response at x.
(Click on any cross in this plot to see the data set that gave rise to it.)
Drag the slider to observe the distribution of the errors at other x-values. Observe that the errors are least variable when predicting near x = 2.5.
Finally, click the checkbox below to display the theoretical distribution of the errors and again drag the slider to adjust the value of x.
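A rough numerical counterpart to this comparison is sketched below. It simulates the errors ŷ − μy at a chosen x-value and checks them against a normal distribution with mean zero and standard deviation σ √( 1/n + (x − x̄)² / Σ(xi − x̄)² ), the standard result for this model. The parameter values, the x-values and the function names are assumptions made for illustration.

```python
import math
import random

def simulate_errors(x0, xs, beta0, beta1, sigma, n_samples, seed=1):
    """Errors (fitted value minus true mean response) at x0 over repeated samples."""
    rng = random.Random(seed)
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    errors = []
    for _ in range(n_samples):
        ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
        y_bar = sum(ys) / n
        b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
        b0 = y_bar - b1 * x_bar
        errors.append((b0 + b1 * x0) - (beta0 + beta1 * x0))
    return errors

def theoretical_sd(x0, xs, sigma):
    """Standard deviation of the error at x0 under the normal linear model."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)

# Hypothetical setting: beta0 = 1, beta1 = 2, sigma = 1, x-values with mean 2.5.
xs = [0.25 + 0.5 * i for i in range(10)]
results = {}
for x0 in (0.0, 2.5, 5.0):
    errors = simulate_errors(x0, xs, 1.0, 2.0, 1.0, n_samples=5000)
    sd = theoretical_sd(x0, xs, 1.0)
    mean_error = sum(errors) / len(errors)
    # Under the normal theory, about 95% of errors lie within 1.96 sd of zero.
    within = sum(abs(e) <= 1.96 * sd for e in errors) / len(errors)
    results[x0] = (mean_error, within)
    print(f"x0 = {x0:4.1f}   theoretical sd = {sd:.3f}   "
          f"mean error = {mean_error:+.3f}   within 1.96 sd = {within:.3f}")
```

If the theoretical distribution is right, the mean error should be close to zero at every x-value and roughly 95% of the simulated errors should fall within 1.96 theoretical standard deviations of zero, even though that standard deviation changes with x.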