Positioning the line to give small residuals

The residuals from a linear model are the vertical distances from the crosses (the data points) to the line on a scatterplot; each residual is an actual response minus the corresponding fitted value. They indicate how closely predictions from the line (the fitted values) match the actual responses in the data.

Therefore, positioning the line to make the residuals small improves the model's fit to the data. The parameters b0 and b1 of the linear model should be set to make the residuals as small as possible.
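As a rough illustration of how residuals depend on the position of the line, the following Python sketch computes them for two candidate lines. The data values and the function name `residuals` are made up for this example and do not come from the diagrams.

```python
# Illustrative data only: these (x, y) pairs are invented for the example.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def residuals(b0, b1, xs, ys):
    """Return actual response minus fitted value (b0 + b1 * x) for each point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Two candidate lines: a poor guess (slope 1) and a better one (slope 2).
poor = residuals(0.0, 1.0, xs, ys)
better = residuals(0.0, 2.0, xs, ys)

print("slope 1:", [round(e, 2) for e in poor])
print("slope 2:", [round(e, 2) for e in better])
```

With these values, every residual from the second line is much closer to zero than any residual from the first, so the second line fits the data better.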

In the diagram below, the residuals for all observations are displayed as vertical blue lines.

Drag the red arrows on the diagram to adjust the intercept and slope of the line to make the residuals small.

Least squares: minimising the sum of squares of residuals

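To turn this into an objective procedure, we must define what is meant by 'making the residuals small'. The most useful overall measure of the size of the residuals is the residual sum of squares,

$$\mathrm{RSS} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$

where $e_i$ is the residual for the $i$-th of the $n$ observations $(x_i, y_i)$.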

The values of b0 and b1 that minimise the residual sum of squares are called the least squares estimates of the parameters and the method is called the method of least squares.
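The residual sum of squares can be evaluated for any candidate line, which makes it possible to compare lines objectively. The sketch below is a minimal Python illustration; the data and the function name `residual_sum_of_squares` are assumptions made for the example.

```python
# Illustrative data only: these (x, y) pairs are invented for the example.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def residual_sum_of_squares(b0, b1, xs, ys):
    """Sum of squared residuals for the line with intercept b0 and slope b1."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Compare two candidate lines by a single number each.
rss_poor = residual_sum_of_squares(0.0, 1.0, xs, ys)
rss_better = residual_sum_of_squares(0.0, 2.0, xs, ys)

print(f"RSS for slope 1: {rss_poor:.2f}")
print(f"RSS for slope 2: {rss_better:.2f}")
```

The line with the smaller residual sum of squares is the better fit by this criterion; the least squares estimates are the values of b0 and b1 for which no other line gives a smaller value.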

In the diagram below, the squared residuals are represented by the areas of the boxes at each data point. Drag the red arrows to position the line in such a way that the residual sum of squares (the total area of the boxes) is as small as you can achieve.

After trying by hand, click the button Least squares to let the computer determine the best values.

Repeat this exercise with the data sets Example 2 and Example 3. In the third data set, note that the line may be positioned to make five of the six residuals very small, but the squared sixth residual is then very large. The least squares line instead has several moderately large residuals, because squaring penalises one very large residual more heavily than several moderate ones.
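The same effect can be reproduced numerically. The sketch below uses six made-up points (not the Example 3 data), five of which lie exactly on a line, and compares that line with the least squares line for these points, which has intercept -2 and slope 13/7.

```python
# Made-up data: the first five points lie exactly on y = x, the sixth does not.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 2, 3, 4, 5, 12]

def squared_residuals(b0, b1):
    """Squared residual (the area of each box) for the line b0 + b1 * x."""
    return [(y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)]

through_five = squared_residuals(0.0, 1.0)     # fits five points exactly
compromise = squared_residuals(-2.0, 13 / 7)   # least squares line for these data

for name, sq in [("through five points", through_five),
                 ("least squares", compromise)]:
    print(f"{name}: largest box = {max(sq):.2f}, total area = {sum(sq):.2f}")
```

The line through five points has a single box of area 36, whereas the least squares line spreads the error over all six points and has a total area of about 17.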

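The problem of minimising the residual sum of squares is not difficult mathematically, but you will rarely need the resulting formulae for b0 and b1, since spreadsheets, statistical programs and even scientific calculators will do the calculations for you. However, for completeness, the formulae are

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the means of the explanatory and response values.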
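A minimal Python sketch of the standard closed-form solution follows: the slope is the sum of products of deviations from the means divided by the sum of squared deviations of x, and the intercept is the mean of y minus the slope times the mean of x. The data and the function name `least_squares` are assumptions made for the example.

```python
# Illustrative data only: these (x, y) pairs are invented for the example.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def least_squares(xs, ys):
    """Return (b0, b1) minimising the residual sum of squares for a straight line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations and sum of squared deviations of x.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

b0, b1 = least_squares(xs, ys)
rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, RSS = {rss:.3f}")
```

The function assumes that the x values are not all equal; otherwise the sum of squared deviations is zero and the slope is undefined.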