Variability of the least squares plane

Least squares estimates the regression model parameters by making the residual sum of squares as small as possible, so the least squares plane is positioned as close as possible to the data points. The criterion places no constraint on where the plane lies in regions away from the data.

When the explanatory variables are multicollinear, the data lie close to a line in the x-z plane. As a result, the position of the least squares plane can be very variable (i.e. the model's regression plane is inaccurately estimated) at the 'corners of the x-z plane' that are distant from the data.
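The size of this variability can be written down with a standard result, stated here under the usual assumption of independent errors with constant variance. The symbols are introduced for illustration and are not part of the original page: M is the design matrix with rows (1, x_i, z_i), m_0 = (1, x_0, z_0)^T is the point at which the plane's height is evaluated, and sigma^2 is the error variance.

```latex
\operatorname{Var}\bigl(\hat{y}(x_0, z_0)\bigr)
  = \sigma^2 \, \mathbf{m}_0^{\mathsf{T}}
    \bigl(\mathbf{M}^{\mathsf{T}}\mathbf{M}\bigr)^{-1}
    \mathbf{m}_0
```

When X and Z are strongly correlated, this quadratic form is small for points along the direction of the data scatter and large for points in the off-diagonal corners, such as (high X, low Z).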

Simulation

In the diagram below, data are sampled from a linear regression model where the explanatory variables, X and Z, are multicollinear.

Click Accumulate then take 10 to 20 samples. Rotate the diagram to see the variability of the least squares plane.

Click Show Min Variability to rotate the diagram so that you are 'looking down' the scatter of data points. From this direction, the least squares planes look rather like flapping wings: they are much less variable near the data than in the corners corresponding to (high X, low Z) and (low X, high Z).
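The same behaviour can be reproduced outside the diagram with a small simulation. The sketch below is only illustrative: the correlation of 0.95, the sample size, the coefficients and the evaluation points are all assumed values, not those used by the diagram. It repeatedly samples from a model with correlated X and Z, fits the least squares plane, and records the height of the plane at three points.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

n, n_samples = 30, 200
corr = 0.95                           # assumed correlation between X and Z
cov = np.array([[1.0, corr], [corr, 1.0]])
beta = np.array([1.0, 2.0, 1.0])      # hypothetical intercept and two slopes
sigma = 1.0                           # assumed error standard deviation

# Points (1, x, z) at which the height of each fitted plane is recorded
points = {
    "centre (0, 0)": np.array([1.0, 0.0, 0.0]),
    "along data (2, 2)": np.array([1.0, 2.0, 2.0]),
    "corner (2, -2)": np.array([1.0, 2.0, -2.0]),
}

heights = {name: [] for name in points}
for _ in range(n_samples):
    # Draw a new sample of correlated explanatory variables and responses
    xz = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    design = np.column_stack([np.ones(n), xz])
    y = design @ beta + rng.normal(0.0, sigma, size=n)
    # Least squares estimates of the three parameters
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    for name, p in points.items():
        heights[name].append(p @ coef)

# Standard deviation of the plane's height across samples at each point
sd = {name: float(np.std(h)) for name, h in heights.items()}
for name, value in sd.items():
    print(f"{name}: SD of plane height = {value:.2f}")
```

The standard deviation at the (high X, low Z) corner should come out several times larger than at the point lying along the data scatter, which is the 'flapping wings' pattern in numerical form.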

Which explanatory variable is more important?

When the two explanatory variables are multicollinear, it is much harder to distinguish which of the two is really related to the response, since each variable can act as a kind of proxy for the other. Different combinations of the two variables can have very similar explanatory power.

Heart catheter length

In heart catheterisation, a catheter is passed through a major vein or artery from the leg into the heart. X-rays are used to position the tip of the catheter. The catheter length (cm), height (in) and weight (lb) were recorded from 12 children.

Drag one of the vertical arrows to move the least squares plane and observe the effect on the residual sum of squares. Click Least squares then repeat with the other vertical arrow.

Observe that dragging the corner that is furthest from the data has much less effect on the residual sum of squares, so the position of this corner is much less accurately determined.

When dragging this corner, observe that as the coefficient of height increases, the coefficient of weight decreases. It is clear that at least one of these variables affects catheter length, but there is less information about which is important.

We could set the coefficient of either height or weight to zero, and an increase in the other coefficient would largely compensate.
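This compensation can be checked numerically. The sketch below does not use the catheter data. It simulates 12 observations with two strongly related explanatory variables (all values and coefficients are hypothetical) and compares the residual sum of squares when both variables, only one, or neither is included. The helper name rss is also an assumption made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily

n = 12
# Simulated stand-ins for two strongly related explanatory variables
x = rng.normal(0.0, 1.0, size=n)
z = x + rng.normal(0.0, 0.2, size=n)          # z stays close to x
y = 3.0 + 1.0 * x + 1.0 * z + rng.normal(0.0, 0.5, size=n)

def rss(columns):
    """Fit by least squares on an intercept plus the given columns.

    Returns the residual sum of squares and the estimated coefficients.
    Leaving a variable out is the same as fixing its coefficient at zero.
    """
    design = np.column_stack([np.ones(n)] + columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid), coef

rss_both, coef_both = rss([x, z])
rss_x, coef_x = rss([x])      # coefficient of z set to zero
rss_z, coef_z = rss([z])      # coefficient of x set to zero
rss_none, _ = rss([])         # intercept only

print(f"both variables: RSS = {rss_both:.2f}, slopes = {coef_both[1:]}")
print(f"x only:         RSS = {rss_x:.2f}, slope  = {coef_x[1]:.2f}")
print(f"z only:         RSS = {rss_z:.2f}, slope  = {coef_z[1]:.2f}")
print(f"neither:        RSS = {rss_none:.2f}")
```

Dropping either variable can only increase the residual sum of squares, but with explanatory variables this closely related, the increase should be small compared with the drop from the intercept-only fit, and the remaining slope grows to take over the role of the omitted variable.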