Looking for outliers and influence

Detecting outliers is important when fitting multiple regression models. Outliers may correspond to incorrectly recorded values, or to individuals that differ enough from the rest to warrant separate analysis.

High leverage is also important, even though it does not necessarily mean that anything is wrong with the multiple regression model. A high-leverage point can have undue influence on the conclusions, and we should know whether our conclusions hang on a single observation.

In simple linear models, outliers, high-leverage points and influential points can usually be seen in a scatterplot of the raw data, so the extra work of evaluating and plotting externally studentised residuals, leverages and measures of influence is less important. However, when there are two explanatory variables it is much harder to simply 'look at the raw data', and this becomes impossible with three or more explanatory variables.

Examination of externally studentised residuals, leverages and a measure of influence is therefore important. These may be shown on a dot plot or in a scatterplot against fitted values.
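
The following is a minimal sketch of how these three diagnostics could be computed for a least squares fit, using only NumPy. It is an illustration and not the method used to produce the plots on this page. The function name influence_diagnostics, the simulated data and the planted unusual observation are all assumptions made for the example. DFITS is the measure of influence used here; it is often written DFFITS elsewhere.

```python
import numpy as np


def influence_diagnostics(X, y):
    """Return leverages, externally studentised residuals and DFITS.

    X : (n, p) array of explanatory variables, without a constant column
    y : (n,) array holding the response
    (Hypothetical helper written for this illustration.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Design matrix with an intercept column.
    A = np.column_stack([np.ones(n), X])
    k = A.shape[1]  # number of fitted parameters

    # Leverages are the diagonal of the hat matrix A (A'A)^-1 A'.
    # With A = QR, the hat matrix is QQ', so its diagonal is the
    # row-wise sum of squares of Q.
    Q, _ = np.linalg.qr(A)
    h = np.sum(Q**2, axis=1)

    # Ordinary residuals from the least squares fit.
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    e = y - A @ beta

    # Residual variance estimated with observation i deleted,
    # obtained from the full fit without refitting n times.
    sse = e @ e
    s2_deleted = (sse - e**2 / (1 - h)) / (n - k - 1)

    # Externally studentised residuals and DFITS.
    t = e / np.sqrt(s2_deleted * (1 - h))
    dfits = t * np.sqrt(h / (1 - h))
    return h, t, dfits


# Simulated data with two explanatory variables (assumed for illustration).
rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Plant one unusual observation: far from the other x-values
# and well away from the plane followed by the rest of the data.
X[0] = [4.0, -4.0]
y[0] = 0.0

h, t, dfits = influence_diagnostics(X, y)

# Show the five observations with the largest absolute DFITS.
for i in np.argsort(-np.abs(dfits))[:5]:
    print(f"obs {i:2d}  leverage {h[i]:.3f}  "
          f"ext. stud. residual {t[i]:8.3f}  DFITS {dfits[i]:8.3f}")
```

The three arrays can then be shown as dot plots or plotted against the fitted values. Commonly quoted rules of thumb, which are guides and not formal tests, flag leverages above 2k/n and absolute DFITS above 2√(k/n), where k is the number of fitted parameters and n the number of observations.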

Examples

For each of the following data sets, externally studentised residuals, leverages and DFITS are shown in stacked dot plots. Carefully examine the plots, clicking on crosses to see corresponding values of the other measures, then read the conclusions.

Since there are only two explanatory variables, you can click Peek at data to see the raw data in a 3-dimensional scatterplot. This will not be possible for the models with more than two explanatory variables that you will meet in later chapters.