Multicollinearity complicates the interpretation of parameter estimates and hypothesis tests about linear models in ways that will be explained in later pages in this section. It is worth first understanding how multicollinearity affects the information that regression data hold about the two slope parameters.
Meaning of the slope parameters
In the regression model
y = β0 + β1 x + β2 z + ε
the slope parameter β1 describes how X affects the mean response when Z is held fixed. This is the slope of any slice through the regression plane parallel to the x-axis (i.e. at any fixed value of Z).
The plane below represents a linear model. The vertical blue plane shows a slice through this plane at a z-value that can be changed with the slider.
The 2-dimensional display at the bottom of the diagram shows this slice — how the mean response depends on X at this fixed z-value. Note that the slope of this slice is β1 = 0.10.
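As a minimal sketch of this idea, the code below evaluates the slope of a slice through a regression plane at several fixed z-values. Only the slope β1 = 0.10 is taken from the text; the intercept, the second slope and the x-range are hypothetical values chosen for illustration.

```python
# Hypothetical parameter values; only BETA1 = 0.10 comes from the text.
BETA0, BETA1, BETA2 = 2.0, 0.10, -0.5


def mean_response(x, z):
    """Mean response of the linear model at the point (x, z)."""
    return BETA0 + BETA1 * x + BETA2 * z


def slice_slope(z, x_low=0.0, x_high=10.0):
    """Slope of the slice through the regression plane at a fixed z."""
    rise = mean_response(x_high, z) - mean_response(x_low, z)
    return rise / (x_high - x_low)


# The slice slope is the same at every fixed z-value.
slopes = [slice_slope(z) for z in (0.0, 2.5, 5.0)]
print(slopes)
```

Changing z shifts the slice up or down but leaves its slope unchanged, which is why β1 can be read from any slice.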
Information about the slope from multicollinear data
In a similar way, information about β1 from a regression data set comes from the slices through the data at different z-values. When X and Z are highly correlated, each such slice contains only a small spread of x-values and therefore holds much less information about β1 than a similar slice from data in which X and Z are uncorrelated.
As a result, β1 cannot be estimated as accurately when the data are multicollinear.
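The loss of accuracy can be quantified with a standard result for least squares estimates in a model with two explanatory variables. The notation is not from the original text and is an assumption here: the errors are taken to be independent with constant variance σ², and r_xz denotes the sample correlation between X and Z.

```latex
\operatorname{Var}(\hat{\beta}_1)
  = \frac{\sigma^2}{\left(1 - r_{xz}^2\right)\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

With uncorrelated X and Z the factor 1 − r_xz² equals 1. With a correlation of 0.99 it is 1 − 0.9801 = 0.0199, so for the same spread of x-values the variance of the estimate is about 50 times larger and its standard error about 7 times larger.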
Example
The diagram below shows a data set with 200 observations. The data points within a small range of z-values are highlighted in red. In the jittered dot plot on the top right, drag the red band up or down to change the slice of points that is highlighted. Click the x-y-z rotation button (or drag to rotate manually) to also display the x-values in the data set.
The scatterplot at the bottom of the diagram shows how Y is related to X in each slice of points. (Remember that the parameter β1 is the model slope in all such slices.)
Use the slider to make X and Z uncorrelated, then vary the slices through the data. Observe that most slices strongly suggest that Y and X are related, that is, that the parameter β1 is not zero.
Finally, drag the slider to the right to make X and Z multicollinear (correlation 0.99) and repeat. Each slice now contains only a small range of x-values and does not strongly suggest that Y and X are related.
When the data are multicollinear, there is therefore much less information about the value of β1.
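The experiment can be imitated with a small simulation. This is a sketch under stated assumptions, not the data set behind the diagram: only the sample size (200), the slope β1 = 0.10 and the correlations 0 and 0.99 come from the text, while the other parameters, the normal distributions, the slice limits and the function names are illustrative choices.

```python
import math
import random
import statistics

# Illustrative values: only N = 200 and BETA1 = 0.10 come from the text.
N = 200
BETA0, BETA1, BETA2, SIGMA = 2.0, 0.10, -0.5, 0.2


def simulate(rho, rng):
    """Draw N observations (x, z, y) whose X-Z correlation is about rho."""
    data = []
    for _ in range(N):
        z = rng.gauss(0, 1)
        x = rho * z + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        y = BETA0 + BETA1 * x + BETA2 * z + rng.gauss(0, SIGMA)
        data.append((x, z, y))
    return data


def slice_x_spread(data, z_low=-0.25, z_high=0.25):
    """Standard deviation of the x-values within one slice of z-values."""
    xs = [x for x, z, _ in data if z_low <= z <= z_high]
    return statistics.stdev(xs)


def fit_beta1(data):
    """Least squares estimate of beta1 in the two-predictor model."""
    xs, zs, ys = zip(*data)
    mx, mz, my = (statistics.fmean(v) for v in (xs, zs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    szy = sum((z - mz) * (y - my) for z, y in zip(zs, ys))
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)


def summarise(rho, reps=200, seed=1):
    """Return the x-spread in one slice and the spread of beta1 estimates."""
    rng = random.Random(seed)
    spread = slice_x_spread(simulate(rho, rng))
    estimates = [fit_beta1(simulate(rho, rng)) for _ in range(reps)]
    return spread, statistics.stdev(estimates)


spread_0, sd_0 = summarise(0.0)
spread_99, sd_99 = summarise(0.99)
print(f"correlation 0.00: slice x-spread {spread_0:.3f}, sd of estimates {sd_0:.4f}")
print(f"correlation 0.99: slice x-spread {spread_99:.3f}, sd of estimates {sd_99:.4f}")
```

In runs of this sketch the slice through the multicollinear data should show a much narrower spread of x-values, and the estimates of β1 should vary far more from one simulated data set to the next.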