Leverage and influence

High-leverage points have the potential to strongly influence the conclusions from a data set.

Leverage depends only on the values of the explanatory variable, so we must also take into account the response value at any high-leverage point to see whether it is actually influential.
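As a rough illustration of why leverage ignores the response, the sketch below computes leverages for a simple linear regression using the standard formula h_ii = 1/n + (x_i - x̄)² / S_xx. The function name `leverages` and the data are hypothetical, chosen only for this example.

```python
def leverages(x):
    """Leverage h_ii of each point in a simple linear regression on x."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    # h_ii = 1/n + (x_i - x_bar)^2 / S_xx uses the x-values only, never y
    return [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]


# Hypothetical data: the last x-value lies far from the others
x = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
h = leverages(x)
print([round(hi, 3) for hi in h])
```

The leverages always sum to the number of parameters (2 for a straight line), so a single value close to 1 marks a point that dominates the fit at its own x-value.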

Examples

The two diagrams below show data sets with one high-leverage point.

Click Delete high leverage points to delete the point with high leverage from both data sets. Observe that deleting the point...

Measuring influence

We have described a point's 'influence' as its effect on the conclusions that are reached from the data set. To obtain a numerical description of influence, we can describe how each point affects:

(It would also be possible to describe the influence of each point on the residual sum of squares or other statistics, but the above two are the most common ones.)

Influence on fitted values

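On this page, we consider how deletion of the i'th point affects the fitted value at the same point,

$$\hat{y}_i - \hat{y}_{i(i)}$$

where $\hat{y}_{i(i)}$ denotes the fitted value at $x_i$ from the least squares line fitted without the i'th point.
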
Illustration

The scatterplot on the left below shows a dataset with one high-leverage point. The changes in the fitted values are plotted against x on the right.

Click on any point to see how the least squares line changes when it is deleted. The change in that point's fitted value caused by its deletion is shown in red and plotted on the right against x.
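The deletion experiment in the illustration can be imitated numerically. The following is a minimal sketch, assuming a straight-line model fitted by least squares; the helper names `fit_line` and `fitted_value_changes` and the data are hypothetical and are not taken from the original page.

```python
def fit_line(x, y):
    """Least squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def fitted_value_changes(x, y):
    """For each i: fitted value at x_i using all data, minus the fitted
    value at x_i when point i is left out of the fit."""
    b0, b1 = fit_line(x, y)
    changes = []
    for i in range(len(x)):
        x_del = x[:i] + x[i + 1:]   # data with point i deleted
        y_del = y[:i] + y[i + 1:]
        b0_del, b1_del = fit_line(x_del, y_del)
        changes.append((b0 + b1 * x[i]) - (b0_del + b1_del * x[i]))
    return changes


# Hypothetical data with one high-leverage point at x = 15
x = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 8.0]
changes = fitted_value_changes(x, y)
for xi, c in zip(x, changes):
    print(f"x = {xi:5.1f}   change in fitted value = {c:8.4f}")
```

With these made-up numbers the change at the high-leverage point is far larger in magnitude than at any other point, which is the pattern the right-hand plot is designed to reveal.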

DFITS

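Since the fitted values do not all have the same variance,

$$\operatorname{Var}(\hat{y}_i) = h_{ii}\,\sigma^2$$

where $h_{ii}$ is the leverage of the i'th point, we adjust the difference by dividing by the square root of an estimate of this variance (using the deleted standard deviation, $s_{(i)}$),

$$\text{DFITS}_i = \frac{\hat{y}_i - \hat{y}_{i(i)}}{s_{(i)}\sqrt{h_{ii}}}$$

Here $\hat{y}_{i(i)}$ is the fitted value at the i'th point after its deletion.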

(Note that this is not equivalent to standardisation since we use the standard deviation of the fitted value, not of the difference.)
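A direct calculation of DFITS might look like the sketch below. It assumes a straight-line model (two parameters), refits the line without each point in turn, and takes the deleted standard deviation to be the residual standard deviation of that refit with n - 3 degrees of freedom. The names `fit_line` and `dfits` and the data are hypothetical.

```python
import math


def fit_line(x, y):
    """Least squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def dfits(x, y):
    """DFITS for each point, computed by refitting without that point."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b0, b1 = fit_line(x, y)
    values = []
    for i in range(n):
        x_del, y_del = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        b0_del, b1_del = fit_line(x_del, y_del)
        # Deleted standard deviation: n - 1 points, 2 parameters
        rss_del = sum((yj - b0_del - b1_del * xj) ** 2
                      for xj, yj in zip(x_del, y_del))
        s_del = math.sqrt(rss_del / (n - 3))
        h_ii = 1 / n + (x[i] - x_bar) ** 2 / s_xx   # leverage of point i
        diff = (b0 + b1 * x[i]) - (b0_del + b1_del * x[i])
        values.append(diff / (s_del * math.sqrt(h_ii)))
    return values


# Hypothetical data with one high-leverage point at x = 15
x = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 8.0]
dfits_values = dfits(x, y)
print([round(d, 2) for d in dfits_values])
```

Refitting n times is wasteful for large data sets but keeps the sketch close to the definition.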

Guidelines

Observations are often classified as 'influential' if,

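$$|\text{DFITS}_i| > 2\sqrt{\frac{p}{n}}$$

where n is the number of observations and p is the number of parameters in the model (p = 2 for a straight line).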

Note that this is not a hypothesis test; plotting DFITS against x or against the fitted values is recommended to help assess whether one or two points have excessive influence.
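As a small sketch of how such a guideline could be applied, the function below flags points whose absolute DFITS exceeds 2√(p/n). This cut-off is the commonly quoted rule of thumb and is an assumption here; the function name `influential_indices` and the DFITS values are hypothetical.

```python
import math


def influential_indices(dfits_values, n_params=2):
    """Indices of points whose |DFITS| exceeds 2 * sqrt(p / n).

    n_params is p, the number of model parameters (2 for a straight line).
    """
    n = len(dfits_values)
    cutoff = 2 * math.sqrt(n_params / n)
    return [i for i, d in enumerate(dfits_values) if abs(d) > cutoff]


# Hypothetical DFITS values for six observations
example_dfits = [-0.9, -0.3, 0.1, 0.2, 0.5, -40.5]
flagged = influential_indices(example_dfits)
print(flagged)
```

A flagged index is only a prompt to look at the point on a plot, not a formal verdict.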

It can be proved that DFITS is a simple function of the externally studentised residual and the leverage,

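$$\text{DFITS}_i = t_i \sqrt{\frac{h_{ii}}{1 - h_{ii}}}$$

where $t_i$ is the externally studentised residual and $h_{ii}$ is the leverage of the i'th point.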

Therefore standardising DFITS simply gives the externally studentised deleted residuals.
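One practical consequence is that DFITS can be obtained from a single fit, without deleting any points. The sketch below assumes a straight-line model and uses the standard identity s_(i)² = (RSS - e_i² / (1 - h_ii)) / (n - 3) for the deleted variance, which is not stated in the text above. The function name `dfits_without_refitting` and the data are hypothetical.

```python
import math


def dfits_without_refitting(x, y):
    """DFITS from one least squares fit: t_i * sqrt(h_ii / (1 - h_ii))."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    rss = sum(e * e for e in resid)
    values = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - x_bar) ** 2 / s_xx          # leverage
        # Deleted standard deviation via the shortcut identity (assumed)
        s_del = math.sqrt((rss - e * e / (1 - h)) / (n - 3))
        t = e / (s_del * math.sqrt(1 - h))            # externally studentised residual
        values.append(t * math.sqrt(h / (1 - h)))
    return values


# Hypothetical data with one high-leverage point at x = 15
x = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 8.0]
shortcut = dfits_without_refitting(x, y)
print([round(d, 2) for d in shortcut])
```

For a point with leverage close to 1, the factor √(h_ii / (1 - h_ii)) is large, so even a moderate studentised residual translates into a large DFITS.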