Adding a quadratic term

An alternative solution to the problem of curvature is to extend the simple linear model with the addition of a quadratic term,

$$y = b_0 + b_1 x + b_2 x^2$$

Fitted values and residuals are defined (and interpreted) in a similar way to those for a linear model,

$$\hat{y}_i = b_0 + b_1 x_i + b_2 x_i^2$$
$$e_i = y_i - \hat{y}_i$$

As in a linear model, the quadratic model's residuals are the vertical distances between the crosses in a scatterplot and the fitted curve. We again use least squares to estimate the unknown parameters: we choose the values of the three parameters, $b_0$, $b_1$ and $b_2$, that minimise the residual sum of squares,

$$\sum_i e_i^2 = \sum_i \left(y_i - b_0 - b_1 x_i - b_2 x_i^2\right)^2$$
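As a rough illustration of this least squares fit, the Python sketch below uses NumPy's `polyfit` on a small made-up data set (the `x` and `y` values and the helper `rss_for` are assumptions for illustration, not data or code from this page). It computes the fitted values, residuals and residual sum of squares for the quadratic.

```python
import numpy as np

# Hypothetical data, invented purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 7.2, 12.1, 18.8, 27.3])

# np.polyfit returns coefficients from the highest power down,
# so for degree 2 the order is [b2, b1, b0].
b2, b1, b0 = np.polyfit(x, y, deg=2)

# Fitted values and residuals, as defined in the text.
fitted = b0 + b1 * x + b2 * x**2
residuals = y - fitted

# Residual sum of squares at the least squares estimates.
rss = float(np.sum(residuals**2))

def rss_for(c0, c1, c2):
    """Residual sum of squares for any trial coefficients (c0, c1, c2)."""
    return float(np.sum((y - (c0 + c1 * x + c2 * x**2)) ** 2))

print("b0, b1, b2:", b0, b1, b2)
print("residual sum of squares:", rss)
```

Because `polyfit` minimises the residual sum of squares, moving any coefficient away from its estimate (which `rss_for` lets you try) should never give a smaller value, much like dragging the red arrows away from the least squares position.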

The scatterplot below shows a data set with a nonlinear relationship.

Drag the three red arrows to adjust the position of the quadratic curve. Observe that ...

Click the checkbox Show residuals. The residuals are displayed with blue vertical lines on the scatterplot. Adjust the coefficients using the red arrows to make the residuals as small as possible, then click the button Least squares to minimise the residual sum of squares.

If a relationship is nonlinear, residuals from a quadratic model are likely to be smaller than those for a linear model. This also suggests that the errors are likely to be smaller if the model is used to predict future response values.
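The comparison can be sketched in Python as follows. The data are synthetic (a known quadratic plus small hand-picked disturbances), and the helper name `rss_poly` is an assumption introduced only for this example.

```python
import numpy as np

def rss_poly(x, y, degree):
    """Residual sum of squares of the least squares polynomial of the given degree."""
    coeffs = np.polyfit(x, y, deg=degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.sum(residuals**2))

# Synthetic curved data: quadratic trend plus small fixed disturbances.
x = np.arange(10, dtype=float)
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1])
y = 3.0 + 0.5 * x + 0.4 * x**2 + noise

rss_linear = rss_poly(x, y, 1)      # straight line
rss_quadratic = rss_poly(x, y, 2)   # quadratic curve

print("linear RSS:   ", rss_linear)
print("quadratic RSS:", rss_quadratic)
```

Since a straight line is the special case of a quadratic with $b_2 = 0$, the quadratic's residual sum of squares can never exceed the linear model's on the same data; with curved data such as these it is much smaller.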

The next example illustrates the potential improvement from using a quadratic model.

GDP in the USA

We tried earlier to use a linear model to explain how the Gross Domestic Product (GDP) in the USA changed between 1960 and 2013. Initially we only look at the data up to 2007. The curvature in the scatterplot of residuals below shows that a linear model is not appropriate.

Choose the option Quadratic fit from the pop-up menu. The least squares quadratic curve is displayed at the top, and the residual plot at the bottom displays the residuals from the quadratic model. Note that ...

Click the checkbox Extend to 2013 to add the American GDP from 2008 to 2013 to the plot. Although the model still captures most of the curvature in the data, the most recent years are not modelled as well by a quadratic model and the residuals after 2000 are considerably larger.
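The effect of extending a fitted curve beyond the years used to fit it can be explored with a sketch like the one below. It does not use the GDP data: the series is a synthetic exponential growth curve, and the index `t` is only a stand-in for the years 1960 to 2013, with `t <= 47` playing the role of "up to 2007".

```python
import numpy as np

# Synthetic series, NOT the GDP data: steady exponential growth over 54 time points.
t = np.arange(54, dtype=float)      # stand-in for the years 1960..2013
y = np.exp(0.07 * t)

# Fit the quadratic using only the earlier period (stand-in for "up to 2007").
fit_mask = t <= 47
coeffs = np.polyfit(t[fit_mask], y[fit_mask], deg=2)

# Residuals over the whole series, split into fitted and held-out periods.
residuals = y - np.polyval(coeffs, t)
in_sample = residuals[fit_mask]
held_out = residuals[~fit_mask]

print("largest in-sample |residual|:", np.max(np.abs(in_sample)))
print("held-out residuals:", np.round(held_out, 2))
```

In this sketch the held-out residuals are all positive and grow with each step beyond the fitted range, because an exponential eventually outpaces any quadratic. The real GDP series may depart from the curve in a different way; the point is only that a fit judged on one period need not hold for later years.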