Advantages of high leverage

Adding a new observation at a point (x, z) with high leverage should considerably improve the accuracy of the least squares parameter estimates, but only if the new observation conforms to the same linear model as the rest of the data.
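The gain in accuracy can be checked numerically. The following Python sketch is only an illustration under assumed values: the grid of 20 existing points and the two candidate locations, (5, 5) and (30, 30), are not taken from the text. It compares the diagonal entries of (X'X)^-1 for the two slopes, which give the variances of the slope estimates when multiplied by the error variance, after adding one new point either at the centre of the data or far from it.

```python
import numpy as np

# Assumed existing data: a 5 x 4 grid of (x, z) values covering [0, 10] x [0, 10].
gx, gz = np.meshgrid(np.linspace(0, 10, 5), np.linspace(0, 10, 4))
x, z = gx.ravel(), gz.ravel()

def slope_variance_factors(x, z):
    """Diagonal entries of (X'X)^-1 for the x and z slopes.

    Multiplying these by the error variance gives the variances
    of the two least squares slope estimates.
    """
    X = np.column_stack([np.ones_like(x), x, z])
    return np.diag(np.linalg.inv(X.T @ X))[1:]

base = slope_variance_factors(x, z)
# New observation at the centre of the data (low leverage).
central = slope_variance_factors(np.append(x, 5.0), np.append(z, 5.0))
# New observation far from the rest of the data (high leverage).
far = slope_variance_factors(np.append(x, 30.0), np.append(z, 30.0))

print("no new point:     ", base)
print("new central point:", central)
print("new far point:    ", far)
```

In this sketch the central point sits exactly at the mean of x and z, so it leaves the slope variances unchanged, whereas the far point reduces both of them.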

Problems with high leverage

We are rarely sure that the relationship is linear for all (x, z). Even though linearity is often reasonable within a restricted range of (x, z), it may not hold at high leverage points on the 'outskirts' of the data.

Even more seriously, it is always possible that an observation may have been incorrectly recorded or the individual may have different characteristics from the bulk of the data — it could be an outlier. An observation at a high leverage point has the potential to strongly influence the results if it is also an outlier.

Unfortunately, an outlier at a high leverage point often does not show up as a residual that is unusually far from zero. Because a high leverage point pulls the least squares plane so strongly towards itself, the outlier's residual may be similar in size to the other residuals.

Do not rely on the residuals to show up an outlier if it is also a high leverage point.
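This masking effect can be reproduced with simulated data. The Python sketch below is an illustration only: the plane y = 2 + 1.5x - 0.5z, the error standard deviation of 1, the shift of 10 and the two locations are all assumed values. The same outlier is placed once at the centre of the data and once at a high leverage point, and its residual is compared with the largest residual among the other points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed data: 20 grid points that follow a plane plus normal errors (sd = 1).
gx, gz = np.meshgrid(np.linspace(0, 10, 5), np.linspace(0, 10, 4))
x, z = gx.ravel(), gz.ravel()
y = 2 + 1.5 * x - 0.5 * z + rng.normal(0, 1, x.size)

def residual_of_added_point(x0, z0, shift=10.0):
    """Add one outlier at (x0, z0), `shift` units above the true plane,
    refit by least squares and return
    (residual of the outlier, largest absolute residual of the other points)."""
    xs, zs = np.append(x, x0), np.append(z, z0)
    ys = np.append(y, 2 + 1.5 * x0 - 0.5 * z0 + shift)
    X = np.column_stack([np.ones_like(xs), xs, zs])
    beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
    resid = ys - X @ beta
    return resid[-1], np.abs(resid[:-1]).max()

central_resid, central_others = residual_of_added_point(5.0, 5.0)
far_resid, far_others = residual_of_added_point(30.0, 30.0)

print(f"outlier at centre:        residual {central_resid:.2f}, largest other {central_others:.2f}")
print(f"outlier at high leverage: residual {far_resid:.2f}, largest other {far_others:.2f}")
```

The outlier is equally far from the true plane in both cases, but its residual is much smaller when it sits at the high leverage point, because the fitted plane has been pulled towards it.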


Illustration

Rotate the 3-dimensional scatterplot by dragging the centre with the mouse (but avoid the red arrow). All points lie close to a plane, so a normal linear model is appropriate.

Click the y-x-z button, then drag the cross in the centre of the diagram to make it an outlier. Observe that the residual plot shows this point as an extreme residual.

Select High leverage from the pop-up menu then drag the cross on the right. Because it is a high leverage point, it strongly pulls the least squares plane towards itself, so its residual does not stand out much from the other residuals, even though it is an outlier.

Rule-of-thumb for high leverage points

It can be shown that the leverages of all data points sum to p, where p is the number of linear parameters in the model (with p = 3 for the linear model with two explanatory variables), so their average value is p/n. A rule-of-thumb is therefore to call a point 'high leverage' if its leverage is more than double this value.

Carefully examine points whose leverage, h_ii, exceeds 2p/n.
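The rule-of-thumb is easy to apply in practice. The following Python sketch uses assumed data (a grid of 20 points plus one point at (30, 30), none of which come from the text). It computes the leverages as the diagonal of the hat matrix H = X(X'X)^-1 X', checks that they sum to p, and flags the points whose leverage exceeds 2p/n.

```python
import numpy as np

# Assumed explanatory variables: a 5 x 4 grid plus one point far from the rest.
gx, gz = np.meshgrid(np.linspace(0, 10, 5), np.linspace(0, 10, 4))
x = np.append(gx.ravel(), 30.0)
z = np.append(gz.ravel(), 30.0)

X = np.column_stack([np.ones_like(x), x, z])  # design matrix with intercept
n, p = X.shape                                # here p = 3 linear parameters

# With the thin QR decomposition X = QR, the hat matrix is H = QQ',
# so the leverage h_ii is the squared length of row i of Q.
Q, _ = np.linalg.qr(X)
leverage = (Q ** 2).sum(axis=1)

threshold = 2 * p / n
flagged = np.flatnonzero(leverage > threshold)

print("sum of leverages:", leverage.sum())  # equals p up to rounding error
print("threshold 2p/n:  ", threshold)
print("flagged indices: ", flagged)
```

The QR decomposition is used here instead of inverting X'X directly because it is numerically more stable; either approach gives the same leverages in exact arithmetic.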

Note that high leverage does not mean that there is anything wrong with an observation, only that its explanatory variables are 'unusual' in a way that could unduly influence the results.