Estimating mean reduction in blood pressure
A medical researcher is interested in the effect of a particular drug on blood pressure. Twenty patients with high blood pressure are given different doses of the drug, X, and the change in each patient's blood pressure over a 2-hour period, Y, is recorded.
The researcher would be interested in estimating the mean reduction in blood pressure after different doses of the drug,
μy = β0 + β1x
using the least squares estimate,
ŷ = b0 + b1 x
Since both b0 and b1 become less variable (and hence more accurate estimates of β0 and β1) as the sample size increases, the estimate of the mean reduction in blood pressure also becomes increasingly accurate.
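The least squares estimate of the mean response can be sketched in a few lines of Python. This is a minimal illustration, not the researcher's actual analysis: the function names fit_line and estimate_mean are hypothetical, and the doses and reductions are made-up numbers chosen to lie exactly on the line y = 2 + 3x so that the result is easy to check by hand.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return the least squares intercept b0 and slope b1 for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

def estimate_mean(b0, b1, x):
    """Estimated mean response at dose x: b0 + b1 * x."""
    return b0 + b1 * x

# Made-up illustrative data lying exactly on y = 2 + 3x (not real measurements).
doses = [1.0, 2.0, 3.0, 4.0]
reductions = [5.0, 8.0, 11.0, 14.0]

b0, b1 = fit_line(doses, reductions)
print(b0, b1, estimate_mean(b0, b1, 2.5))
```

With real data the points scatter around the line, so b0 and b1 (and therefore the estimated mean) vary from sample to sample.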
Predicting a single individual's reduction in blood pressure
In contrast, the researcher might want to predict the change in blood pressure for a single patient who has been given a dose x. The same value would be used as above,
ŷ = b0 + b1 x
However, no matter how accurately we estimate the mean reduction in blood pressure, a single individual's reduction varies around this mean with standard deviation σ. As a result, the errors in predicting the blood pressure reduction for a single person will be greater than the errors in estimating the mean.
The distribution of the prediction error cannot have a standard deviation that is less than σ.
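The lower bound of σ can be seen from the standard variance results for the normal linear model. The notation here is introduced for illustration and is not from the text: n is the sample size, x̄ is the mean of the sample x-values, S_xx is the sum of squared deviations of the x-values from x̄, and y_new is the new individual's response, assumed independent of the sample.

```latex
\begin{aligned}
\operatorname{Var}(\hat{y}) &= \sigma^2\left(\frac{1}{n} + \frac{(x-\bar{x})^2}{S_{xx}}\right),\\
\operatorname{Var}(y_{\text{new}} - \hat{y}) &= \sigma^2 + \operatorname{Var}(\hat{y})
  = \sigma^2\left(1 + \frac{1}{n} + \frac{(x-\bar{x})^2}{S_{xx}}\right) \;\ge\; \sigma^2 .
\end{aligned}
```

The first variance shrinks towards zero as n (and with it S_xx) grows, while the second can never fall below σ².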
Difference between estimating a mean and predicting a new value
We will perform a simulation from a normal linear model with β0 = 3.3 and β1 = 0.75. Data from the model will be used to estimate the mean response when X = 5.5 and also to predict a new individual's response value at this x-value. The same value is used both for estimation and prediction,
ŷ = b0 + b1 x
but the error is different in the two situations.
The true mean response is 7.43. (We can evaluate this since we know the values of β0 and β1 in the simulation — in practice we would not be able to determine the mean response.) The top half of the diagram shows the error in estimating this from the least squares line.
The bottom half of the diagram shows the error from predicting a new response value at X = 5.5.
Click Accumulate then take several samples from this linear model. Observe that the prediction error has greater spread than the estimation error at the top.
Use the pop-up menu to increase the sample size to 210. Observe that the error in estimating the mean becomes very small, but the prediction error is still quite large. Although we can estimate the mean response accurately, a larger sample tells us nothing more about how far the new individual's value will be from this mean.
(In practice, it would be unwise to estimate or predict at X = 5.5 since the highest x-values in the data are about 4, and we cannot be sure that the relationship remains linear at higher X. However, it makes the diagram above clearer.)
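For readers without access to the interactive diagram, the simulation can be approximated with the sketch below. The values β0 = 3.3, β1 = 0.75, X = 5.5 and the sample size 210 come from the text. Everything else is an assumption made for illustration: σ = 1, x-values drawn uniformly between 0 and 4, a starting sample size of 20, and the function names.

```python
import random
from statistics import mean, pstdev

BETA0, BETA1 = 3.3, 0.75   # model parameters given in the text
X_NEW = 5.5                # x-value used for estimation and prediction
SIGMA = 1.0                # assumed: the text does not state sigma
X_MAX = 4.0                # assumed: x-values uniform on [0, 4]

def fit_line(xs, ys):
    """Least squares intercept and slope."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def one_trial(n, rng):
    """Simulate one sample; return (estimation error, prediction error)."""
    xs = [rng.uniform(0.0, X_MAX) for _ in range(n)]
    ys = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in xs]
    b0, b1 = fit_line(xs, ys)
    y_hat = b0 + b1 * X_NEW
    true_mean = BETA0 + BETA1 * X_NEW
    y_new = true_mean + rng.gauss(0.0, SIGMA)  # a new individual's response
    return y_hat - true_mean, y_hat - y_new

def error_spreads(n, trials=2000, seed=1):
    """Standard deviations of the two kinds of error over many samples."""
    rng = random.Random(seed)
    results = [one_trial(n, rng) for _ in range(trials)]
    est_sd = pstdev([e for e, _ in results])
    pred_sd = pstdev([p for _, p in results])
    return est_sd, pred_sd

for n in (20, 210):
    est_sd, pred_sd = error_spreads(n)
    print(f"n={n}: estimation error SD {est_sd:.2f}, "
          f"prediction error SD {pred_sd:.2f}")
```

Under these assumptions, the estimation error spread should shrink noticeably at the larger sample size, while the prediction error spread should stay at or above the assumed σ.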