Examining residuals

As in regression models with a single explanatory variable, we look for problems in multiple regression models with diagnostics that are based on the residuals and leverages of the observations.

When there is only one explanatory variable, X, a scatterplot of Y against X usually shows up any problems with the regression model as effectively as a scatterplot of the residuals against X. With more explanatory variables, however, this is no longer the case.

The residuals remove the effects of all explanatory variables, so plots of residuals against explanatory variables can show information about problems with the model that cannot be seen in plots of the raw data.
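As a minimal sketch of what "removing the effects" means, the Python code below fits a least squares model with two explanatory variables to simulated data and checks that the residuals are uncorrelated with both. The data, the coefficients, and the variable names are assumptions made for illustration and do not come from any data set on this page.

```python
import numpy as np

# Simulated data (illustrative values only)
rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)           # z is correlated with x
y = 1.0 + 2.0 * x - 1.5 * z + rng.normal(size=n)

# Least squares fit of y on an intercept, x and z
design = np.column_stack([np.ones(n), x, z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Residuals: observed response minus fitted values
residuals = y - design @ coef

# The linear effects of x and z have been removed, so the residuals
# are uncorrelated with both (up to floating point rounding error)
corr_x = np.corrcoef(residuals, x)[0, 1]
corr_z = np.corrcoef(residuals, z)[0, 1]
print(f"corr(residuals, x) = {corr_x:.2e}")
print(f"corr(residuals, z) = {corr_z:.2e}")
```

Because the linear part of each relationship has been taken out, any pattern that remains in a plot of the residuals against X or Z points to something the model has missed.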

Artificial illustration

The diagram below shows an artificial data set that illustrates how residual plots can highlight problems that are hard to see in the raw data.

Use the lower pop-up menu to display scatterplots of Y against X and Z. Neither plot gives any clear indication of problems with the fit of the model — both variables seem to be linearly related to Y.

Now use the top pop-up menu to display scatterplots of the residuals against X and Z.

The scatterplot of residuals against Z clearly shows curvature — the model should use a nonlinear function of Z.
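A similar effect can be reproduced with simulated data. The sketch below does not use the data set in the diagram: the coefficients, the sample size, and the helper function `fit_residuals` are all assumptions chosen for illustration. Y depends strongly on X and has a curved relationship with Z. A linear model in X and Z is fitted, and the mean residual is compared between the middle third and the outer thirds of the Z values, which is a rough numerical stand-in for looking at the residual plot.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
z = rng.uniform(0, 10, n)

# Assumed true model: linear in x, curved in z
y = 3.0 * x + 2.0 * z + 0.3 * (z - 5.0) ** 2 + rng.normal(0, 1.0, n)


def fit_residuals(response, *columns):
    """Residuals from a least squares fit on an intercept plus columns."""
    design = np.column_stack([np.ones(len(response)), *columns])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return response - design @ coef


# Fit the (wrong) model that is linear in both x and z
resid = fit_residuals(y, x, z)

# Split the observations into thirds by z
lower, upper = np.quantile(z, [1 / 3, 2 / 3])
in_middle = (z >= lower) & (z <= upper)

# Curvature shows up as residuals that are low in the middle of the
# z range and high at both ends
middle_mean = resid[in_middle].mean()
outer_mean = resid[~in_middle].mean()
print(f"mean residual, middle third of z: {middle_mean:.2f}")
print(f"mean residual, outer thirds of z: {outer_mean:.2f}")
```

In a scatterplot of y against z for these simulated values, the variation caused by x is much larger than the curved component, so the curvature is hard to see until the effect of x has been removed.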

To help understand the difference between the plots of Y and the residuals against Z, display a scatterplot of Y against Z and click the checkbox Color crosses.

The colours of the crosses represent X, ranging from red for low X to blue for high X. Y is related to X as well as Z, so the vertical spread of crosses in the plot of Y against Z is mostly determined by X. Changing to a scatterplot of residuals against Z removes the effect of X, which makes the curvature in Z visible.

The following example shows the use of residuals for a real data set.

Energy expenditure of bees

In an experiment, an entomologist recorded energy expenditure (joules/sec) for bees drinking water with different sucrose concentrations (%) and at different temperatures. Energy expenditure is the response measurement.

Scatterplots of Energy expenditure against Temperature and Sucrose do not show any obvious curvature, but both residual plots give some evidence of nonlinearity in the relationships with the explanatory variables.