Different lines are used to predict Y and to predict X
The correlation coefficient is symmetric in the two variables: the correlation coefficient between X and Y is the same as that between Y and X. However, the least squares line for predicting Y from X is not the same as the line for predicting X from Y (even after rearranging the equation).
If the scatterplot is drawn with the variable Y on the vertical axis, the least squares line for predicting Y from X,
y = b0 + b1 x
minimises the sum of squared vertical distances between the points on a scatterplot and the line. On the other hand, if we are interested in predicting X from Y using a line,
x = c0 + c1 y
the residuals are the horizontal distances between the points and the line in the same scatterplot, and least squares minimises the sum of squares of these.
Different lines minimise the sum of squares of horizontal and vertical distances.
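The difference between the two lines can be checked numerically. The sketch below is only an illustration: the data are simulated (they are not taken from this page), and the names `b0`, `b1`, `c0` and `c1` simply mirror the coefficients in the two equations above. It fits both lines with NumPy's `polyfit` and rearranges the second one into the form y = ... so that the two slopes can be compared on the same scatterplot.

```python
import numpy as np

# Simulated data for illustration only (an assumption, not data from the text)
rng = np.random.default_rng(0)
x = rng.normal(50, 10, size=200)
y = 20 + 0.6 * x + rng.normal(0, 8, size=200)

# Line for predicting Y from X: y = b0 + b1 x
# (least squares on the vertical distances)
b1, b0 = np.polyfit(x, y, 1)

# Line for predicting X from Y: x = c0 + c1 y
# (least squares on the horizontal distances)
c1, c0 = np.polyfit(y, x, 1)

# Rearranging x = c0 + c1 y gives y = -c0/c1 + (1/c1) x,
# so on the same scatterplot this line has slope 1/c1.
slope_y_on_x = b1
slope_x_on_y_rearranged = 1 / c1

print("slope of line predicting Y from X:", slope_y_on_x)
print("slope of line predicting X from Y (rearranged):", slope_x_on_y_rearranged)
```

The two printed slopes differ, and the product `b1 * c1` equals the squared correlation coefficient, so the two lines coincide only when the points lie exactly on a straight line.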
About the two least squares lines
The two least squares lines can be simply written in terms of standardised variables,
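Equation of least squares line to predict Y from X | $z_y = r\,z_x$ |
---|---|
Equation of least squares line to predict X from Y | $z_x = r\,z_y$ |
where $z_x = (x - \bar{x})/s_x$ and $z_y = (y - \bar{y})/s_y$ are the standardised values and r is the correlation coefficient between X and Y. Since r always lies between -1 and +1, and is strictly inside that range unless the points lie exactly on a straight line, the least squares line for predicting Y from X is the more horizontal (closer to being parallel to the x-axis) of the two lines: on a scatterplot of the standardised variables its slope is r, whereas the line for predicting X from Y has slope 1/r.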
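As a rough numerical check of the standardised form, the sketch below standardises two simulated variables (subtracting the mean and dividing by the standard deviation) and fits a least squares line in each direction. The data and the helper name `standardise` are assumptions made for this illustration. Both fitted slopes should equal the correlation coefficient r, and both intercepts should be zero up to rounding error.

```python
import numpy as np

def standardise(v):
    """Return (v - mean) / standard deviation (hypothetical helper)."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)

# Simulated data for illustration only (an assumption, not data from the text)
rng = np.random.default_rng(1)
x = rng.normal(0, 1, 300)
y = 0.5 * x + rng.normal(0, 1, 300)

zx, zy = standardise(x), standardise(y)
r = np.corrcoef(x, y)[0, 1]

# Predicting standardised Y from standardised X
slope_y_from_x, intercept_y_from_x = np.polyfit(zx, zy, 1)

# Predicting standardised X from standardised Y
slope_x_from_y, intercept_x_from_y = np.polyfit(zy, zx, 1)

print("r:", r)
print("slope predicting z_y from z_x:", slope_y_from_x)
print("slope predicting z_x from z_y:", slope_x_from_y)
```

Although both fitted slopes equal r, the second line is written as z_x in terms of z_y, so when it is drawn with z_x on the horizontal axis its slope is 1/r, which is steeper than r whenever the correlation is not perfect.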
Weights of brothers
The scatterplot below shows the adult weights (in kg) of 100 pairs of brothers aged between 20 and 30.
The line that is initially drawn on the scatterplot looks a reasonable fit to the data: it would predict each brother's weight to be equal to that of his brother. However, this is not the least squares line for predicting the older brother's weight.
Click the checkbox Older brother under the scatterplot. This draws the residuals from using the line to predict the older brothers' weights from those of their younger brothers. The residual sum of squares is also shown under the scatterplot. Drag the red arrow on the scatterplot to rotate the line and observe that the residual sum of squares is minimised when the line is closer to horizontal.
Now turn off the checkbox Older brother and click the checkbox Younger brother. The residuals for predicting the younger brothers' weights from those of their older brothers are shown as horizontal lines and their sum of squares is displayed. Drag the line and observe that the residual sum of squares is minimised when the line is more vertical than before.
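The dragging exercise can be imitated in code. The sketch below is a minimal illustration under several assumptions: the weights are simulated (they are not the 100 pairs of brothers shown in the scatterplot), the younger brother's weight is placed on the horizontal axis, and the line is assumed to rotate about the point of means. It tries a grid of slopes and records which slope gives the smallest sum of squared vertical residuals and which gives the smallest sum of squared horizontal residuals.

```python
import numpy as np

# Simulated weights (kg) for illustration only; NOT the data in the scatterplot.
# A shared "family" component makes the brothers' weights positively correlated.
rng = np.random.default_rng(2)
n = 100
family = rng.normal(0, 8, n)
younger = 75 + family + rng.normal(0, 8, n)  # horizontal axis
older = 75 + family + rng.normal(0, 8, n)    # vertical axis

# Assumption: the line rotates about the point of means
xbar, ybar = younger.mean(), older.mean()
slopes = np.linspace(0.1, 10.0, 991)

def rss_vertical(m):
    """Sum of squared vertical residuals (predicting older from younger)."""
    fitted_older = ybar + m * (younger - xbar)
    return np.sum((older - fitted_older) ** 2)

def rss_horizontal(m):
    """Sum of squared horizontal residuals (predicting younger from older)."""
    fitted_younger = xbar + (older - ybar) / m
    return np.sum((younger - fitted_younger) ** 2)

best_vertical = slopes[np.argmin([rss_vertical(m) for m in slopes])]
best_horizontal = slopes[np.argmin([rss_horizontal(m) for m in slopes])]

print("slope minimising vertical residuals:", best_vertical)
print("slope minimising horizontal residuals:", best_horizontal)
```

With data like these, the slope that minimises the vertical residuals is flatter than the "equal weights" line of slope 1, and the slope that minimises the horizontal residuals is steeper than it.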