Estimating mean strength
In the concrete-testing example, concrete strength was measured after various curing periods. A civil engineer would be interested in estimating the mean strength of concrete after different curing periods,
$$\mu_y = \beta_0 + \beta_1 x$$
using the least squares estimate,
$$\hat{y} = b_0 + b_1 x$$
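As a minimal sketch of how the least squares estimates b0 and b1 and the estimated mean are computed, the Python below uses only the standard library. The function names `least_squares` and `estimate_mean` and the curing-time and strength values are hypothetical, made up purely for illustration; they are not the data from the concrete-testing example.

```python
from statistics import fmean


def least_squares(xs, ys):
    """Return (b0, b1), the least squares intercept and slope."""
    x_bar = fmean(xs)
    y_bar = fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def estimate_mean(b0, b1, x):
    """Least squares estimate of the mean response at x."""
    return b0 + b1 * x


# Hypothetical curing times (days) and strengths, for illustration only.
days = [1, 2, 3, 4]
strength = [4.1, 4.8, 5.6, 6.3]

b0, b1 = least_squares(days, strength)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"estimated mean strength at x = 2.5: {estimate_mean(b0, b1, 2.5):.3f}")
```

The slope is computed from centred sums so that b0 follows directly from the fact that the least squares line passes through the point of means.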
Since both b0 and b1 become less variable (and hence more accurate estimates of β0 and β1) as the sample size increases, the estimate of the mean strength also becomes increasingly accurate.
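For the normal linear model with independent errors of standard deviation σ, the standard formula for the variance of the estimated mean at x makes this precise:

$$\operatorname{Var}(\hat{y}) = \sigma^2\left(\frac{1}{n} + \frac{(x-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right)$$

Both terms in the parentheses shrink towards zero as more observations are added (provided the x-values keep some spread), so the standard deviation of ŷ also tends to zero. The second term also shows that the estimate is least accurate for x far from x̄.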
Predicting a single item's strength
In contrast, the engineer might want to predict the strength of a single concrete item that has been left to cure for X days. The prediction would be the same value as above,
$$\hat{y} = b_0 + b_1 x$$
However, no matter how accurately we estimate the mean strength of such items, the strength of an individual item varies around this mean with standard deviation σ. As a result, the error in predicting the strength of a single item is greater than the error in estimating the mean strength.
The distribution of the prediction error therefore cannot have a standard deviation less than σ.
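To see why, write the new item's strength as y = β0 + β1x + ε, where ε has standard deviation σ and is independent of the data used to fit the line. The prediction error is then y − ŷ = (μy − ŷ) + ε, and the two parts are independent, so their variances add:

$$\operatorname{Var}(y - \hat{y}) = \sigma^2 + \operatorname{Var}(\hat{y}) = \sigma^2\left(1 + \frac{1}{n} + \frac{(x-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right) \ge \sigma^2$$

As n grows, the last two terms vanish but the leading 1 remains, so the standard deviation of the prediction error approaches σ rather than zero.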
Difference between estimating a mean and predicting a new value
We will simulate data from a normal linear model with β0 = 3.3 and β1 = 0.75. The data will be used both to estimate the mean response at X = 5.5 and to predict the response of a new individual at this x-value. The same value is used for both estimation and prediction,
$$\hat{y} = b_0 + b_1 x$$
but the error is different in the two situations.
The true mean response is β0 + β1 × 5.5 = 3.3 + 0.75 × 5.5 = 7.425 ≈ 7.43. (We can evaluate this only because the simulation fixes the values of β0 and β1; in practice these are unknown, so the true mean response cannot be determined.) The top half of the diagram shows the error in estimating this mean from the least squares line.
The bottom half of the diagram shows the error from predicting a new response value at X = 5.5.
Accumulating the errors over several samples from this linear model shows that the prediction error has a greater spread than the estimation error at the top.
When the sample size is increased to 210, the error in estimating the mean becomes very small, but the prediction error is still quite large. Although we can estimate the mean response accurately, we have no information about how far the new value will be from it.
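The simulation described above can be sketched in Python with only the standard library. The text does not state the error standard deviation, the range of the x-values, or the starting sample size, so `SIGMA = 1.0`, x-values drawn uniformly from 0 to 4 (matching the remark that the highest x-values are about 4), and a small sample of n = 20 are all assumptions made for illustration; `error_spreads` is a hypothetical helper name.

```python
import random
from statistics import fmean, stdev

BETA0, BETA1 = 3.3, 0.75  # model parameters from the text
SIGMA = 1.0               # assumption: error SD is not given in the text
X0 = 5.5                  # x-value at which we estimate and predict


def least_squares(xs, ys):
    """Return (b0, b1), the least squares intercept and slope."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def error_spreads(n, reps=1000, seed=0):
    """Return (SD of estimation errors, SD of prediction errors) at X0."""
    rng = random.Random(seed)
    true_mean = BETA0 + BETA1 * X0  # 7.425
    est_errors, pred_errors = [], []
    for _ in range(reps):
        # Assumed design: n x-values spread uniformly over [0, 4].
        xs = [rng.uniform(0.0, 4.0) for _ in range(n)]
        ys = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in xs]
        b0, b1 = least_squares(xs, ys)
        y_hat = b0 + b1 * X0
        # A new individual at X0, independent of the fitted data.
        y_new = true_mean + rng.gauss(0.0, SIGMA)
        est_errors.append(y_hat - true_mean)  # error estimating the mean
        pred_errors.append(y_new - y_hat)     # error predicting the new value
    return stdev(est_errors), stdev(pred_errors)


for n in (20, 210):
    est_sd, pred_sd = error_spreads(n)
    print(f"n = {n}: estimation SD {est_sd:.3f}, prediction SD {pred_sd:.3f}")
```

Under these assumptions, the estimation SD should drop sharply between n = 20 and n = 210, while the prediction SD should stay close to `SIGMA`, mirroring the behaviour described in the text.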
(In practice, it would be unwise to estimate or predict at X = 5.5, since the highest x-values in the data are about 4 and we cannot be sure that the relationship remains linear at higher x. However, using X = 5.5 makes the diagram clearer.)