When to avoid using a least squares line for prediction

An important use of a least squares line is to predict the response for a new observation at a specific value of the explanatory variable. However, the line should not be used for prediction when certain problems are evident in the underlying data.
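As a reminder of what such a prediction involves, the following is a minimal sketch in Python using NumPy. The data values and the function names `fit_line` and `predict` are made up for illustration and do not come from the exercises on this page.

```python
import numpy as np

# Hypothetical data: explanatory variable x and response y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def fit_line(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return intercept, slope

def predict(b0, b1, x_new):
    """Predicted response at a new value of the explanatory variable."""
    return b0 + b1 * x_new

b0, b1 = fit_line(x, y)
y_hat = predict(b0, b1, 3.5)  # 3.5 lies inside the observed range of x
print(f"line: y = {b0:.3f} + {b1:.3f} x, prediction at x = 3.5: {y_hat:.3f}")
```

The arithmetic always produces a number, whatever the data look like. Whether that number is trustworthy depends on the data, which is the point of the exercises.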

The exercise below presents various data sets and prediction problems. You should identify which, if any, of the potential problems listed under the scatterplot should stop you using the least squares line for prediction.

Repeat this exercise with several different questions, since they exhibit different problems.

Generic terms

The exercise below is similar, but it describes the four problems using standard statistical terms: outliers, nonlinearity, extrapolation and high-leverage points.

Again, identify any problems with using a least squares line for the data.

Again, repeat the exercise with several different questions.
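The four problems named above are usually judged from a scatterplot, but each also has a rough numerical counterpart. The sketch below is one possible set of checks in Python with NumPy, not a definitive procedure. All function names are hypothetical, and `curvature_gain` is an ad hoc heuristic chosen for this illustration. Commonly quoted rules of thumb, which are assumptions rather than fixed standards, flag a point as high-leverage when its leverage exceeds 4/n and as a possible outlier when its standardized residual exceeds 2 in absolute value.

```python
import numpy as np

def leverages(x):
    """Leverage h_i of each point in a simple linear regression on x."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    return 1.0 / len(x) + dev**2 / np.sum(dev**2)

def standardized_residuals(x, y):
    """Residuals from the least squares line, scaled to roughly unit variance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    resid = y - (intercept + slope * x)
    s = np.sqrt(np.sum(resid**2) / (len(x) - 2))  # residual standard deviation
    return resid / (s * np.sqrt(1.0 - leverages(x)))

def is_extrapolation(x, x_new):
    """True if x_new lies outside the observed range of x."""
    x = np.asarray(x, dtype=float)
    return bool(x_new < x.min() or x_new > x.max())

def curvature_gain(x, y):
    """Fraction of the line's residual sum of squares removed by a quadratic term.

    Ad hoc heuristic: values near 1 suggest a curved relationship.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sse_line = np.sum((y - np.polyval(np.polyfit(x, y, 1), x)) ** 2)
    sse_quad = np.sum((y - np.polyval(np.polyfit(x, y, 2), x)) ** 2)
    return 0.0 if sse_line == 0 else float(1.0 - sse_quad / sse_line)
```

For example, with made-up values `x = [1, 2, 3, 4, 5, 20]`, the last point has a leverage well above 4/n, and `is_extrapolation(x, 30)` returns `True`. These checks complement a look at the scatterplot and do not replace it.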