Detecting problems with the model

If outliers or curvature are present in a data set, they are often visible in a scatterplot of the response against the explanatory variable. However, these features are usually clearer in a plot of the residuals against X than in a plot of the original response against X.

Plot residuals against X to look for problems in the model
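
To make the sign of a residual explicit, its definition can be written out as below. The symbols are assumptions made for this sketch, since the text does not name them: b_0 and b_1 are the least squares intercept and slope, and e_i is the residual for the i-th observation.

```latex
\hat{y}_i = b_0 + b_1 x_i, \qquad e_i = y_i - \hat{y}_i
```

Under this convention, a positive residual means the observed response lies above the fitted line (the line under-estimates it), and a negative residual means the line over-estimates it.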


Gross Domestic Product in the USA

The scatterplot below shows how GDP has increased in the USA since 1960. GDP is measured in billions of US dollars, corrected for inflation by expressing it in 2006 dollars. We will use a linear model to try to describe how GDP has increased over time.

The relationship seems reasonably linear until a least squares line is drawn on the scatterplot of the raw data. The curvature in the relationship is highlighted in the residual plot on the right.

A simple linear model over-estimates GDP between 1974 and 1998 (the residuals are negative) and under-estimates GDP at both ends of the series (positive residuals). Because of the under-estimation in recent years, using the linear model to predict into the future would be ill-advised.
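
The same pattern of residual signs can be reproduced with a small sketch. The series here is hypothetical (3% compound growth from an arbitrary level of 100) and is not the GDP data described above. The helper `least_squares_line` is also only an illustrative name. The sketch fits a straight line to a curved, upward-bending series and looks at the sign of each residual.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line for y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical series: 3% compound growth from an arbitrary starting level.
# These are NOT the GDP figures discussed in the text.
years = list(range(1960, 2007))
values = [100.0 * 1.03 ** (year - 1960) for year in years]

intercept, slope = least_squares_line(years, values)

# Residual = observed value minus value fitted by the line.
resid = [v - (intercept + slope * yr) for yr, v in zip(years, values)]

# '+' where the line under-estimates, '-' where it over-estimates.
signs = "".join("+" if r > 0 else "-" for r in resid)
print(signs)

# Extrapolate the line ten years past the data and compare with the curve.
future_year = 2016
predicted = intercept + slope * future_year
actual = 100.0 * 1.03 ** (future_year - 1960)
print(predicted < actual)
```

For a series that bends upwards like this one, the printed signs form a run of `+`, then a run of `-`, then a run of `+`, and the extrapolated line falls below the curve. Plotting `resid` against `years` with any plotting library would give the corresponding residual plot.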

We will examine solutions to the problem of nonlinearity in the next section.