Estimating mean volume of timber
In the timber volume example, the timber volume obtained from harvested trees was related to the cross-sectional area at chest height. The manager of a forest would be interested in estimating the mean timber volume that could be obtained from trees with different cross-sectional areas,
μy = β0 + β1x
using the least squares estimate,
ŷ = b0 + b1 x
Both b0 and b1 become less variable (and hence more accurate estimates of β0 and β1) as the sample size increases, so the estimate of the mean timber volume also becomes increasingly accurate.
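The shrinking variability can be checked numerically. The sketch below uses the standard formula for the standard deviation of b0 + b1 x under the normal linear model. The x-values, the error standard deviation `sigma` and the function names are illustrative assumptions, not values from the timber data.

```python
import math

def se_fitted_mean(xs, x0, sigma):
    """Standard deviation of the estimate b0 + b1*x0 under a normal linear model."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / sxx)

def evenly_spaced(n, lo, hi):
    """n equally spaced x-values from lo to hi (hypothetical design points)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

sigma = 1.0  # assumed error standard deviation
x0 = 2.0     # assumed cross-sectional area at which the mean is estimated

for n in (10, 40, 160):
    xs = evenly_spaced(n, 0.0, 4.0)
    print(n, round(se_fitted_mean(xs, x0, sigma), 4))
```

Under these assumptions, each fourfold increase in the sample size roughly halves the standard deviation of the estimated mean.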
Predicting timber volume from a single tree
In contrast, the manager might want to predict the timber volume that could be obtained from a single tree that has cross-sectional area x ft². The same prediction would be used as above,
ŷ = b0 + b1 x
However, no matter how accurately we estimate the mean volume of trees with this cross-sectional area, the volume of a single tree also varies around this mean with standard deviation σ. As a result, the errors in predicting the volume of a single tree will be greater.
The distribution of the prediction error cannot have a standard deviation that is less than σ.
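One way to see why σ is a lower bound is to split the prediction error into two parts. The sketch below assumes the normal linear model with constant σ and a new tree whose deviation from the mean is independent of the sample. The symbols y_new (the new tree's volume), n (the sample size), x̄ (the mean of the sampled x-values) and S_xx (their sum of squared deviations) are notation introduced here for illustration.

```latex
\begin{align*}
y_{\text{new}} - \hat{y} &= (y_{\text{new}} - \mu_y) + (\mu_y - \hat{y}), \\
\operatorname{Var}(y_{\text{new}} - \hat{y})
  &= \sigma^2 + \sigma^2\left(\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{xx}}\right)
  \;\ge\; \sigma^2,
\qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 .
\end{align*}
```

The second term is the variance of the estimated mean and shrinks as n grows, whereas the first term, σ², does not depend on the sample at all.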
Difference between estimating a mean and predicting a new value
We will perform a simulation from a normal linear model with β0 = 3.3 and β1 = 0.75. Data from the model will be used to estimate the mean response when X = 5.5 and also to predict a new individual's response value at this x-value. The same value is used both for estimation and prediction,
ŷ = b0 + b1 x
but the error is different in the two situations.
The true mean response is 7.43. (We can evaluate this since we know the values of β0 and β1 in the simulation — in practice we would not be able to determine the mean response.) The top half of the diagram shows the error in estimating this from the least squares line.
The bottom half of the diagram shows the error from predicting a new response value at X = 5.5.
Click Accumulate then take several samples from this linear model. Observe that the prediction error has greater spread than the estimation error at the top.
Use the pop-up menu to increase the sample size to 210. Observe that the error in estimating the mean becomes very small, but the prediction error is still quite large. Although we can estimate the mean response accurately, we have no information about how far the new value will be from this.
(In practice, it would be unwise to estimate or predict at X = 5.5 since the highest x-values in the data are about 4, so we cannot be sure that the relationship will remain linear at high X. However, it makes the diagram above clearer.)
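The same comparison can be reproduced outside the diagram with a short script. This is a minimal sketch rather than the diagram's actual implementation: β0, β1 and X = 5.5 come from the text, while the error standard deviation `SIGMA`, the uniform spread of x-values between 0 and 4, the smaller sample size of 15 and the number of repetitions are assumptions.

```python
import random
import statistics

BETA0, BETA1 = 3.3, 0.75  # model parameters given in the text
X0 = 5.5                  # x-value used for estimation and prediction
SIGMA = 1.0               # assumed; the text does not state sigma

def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def simulate_errors(n, reps, rng):
    """Return (sd of estimation error, sd of prediction error) over reps samples."""
    true_mean = BETA0 + BETA1 * X0
    est_errors, pred_errors = [], []
    for _ in range(reps):
        # Assumed design: x-values spread uniformly between 0 and 4.
        xs = [rng.uniform(0.0, 4.0) for _ in range(n)]
        ys = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in xs]
        b0, b1 = fit_line(xs, ys)
        y_hat = b0 + b1 * X0
        y_new = true_mean + rng.gauss(0.0, SIGMA)  # a new individual at X0
        est_errors.append(y_hat - true_mean)       # error in estimating the mean
        pred_errors.append(y_hat - y_new)          # error in predicting the new value
    return statistics.stdev(est_errors), statistics.stdev(pred_errors)

rng = random.Random(1)
for n in (15, 210):
    sd_est, sd_pred = simulate_errors(n, 2000, rng)
    print(f"n={n:3d}  sd(estimation error)={sd_est:.3f}  sd(prediction error)={sd_pred:.3f}")
```

With these settings the spread of the estimation error should drop sharply at n = 210, while the spread of the prediction error stays close to the assumed σ.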