Do the normal linear model assumptions hold?
Although a normal linear model is often used to describe how an explanatory variable, X, affects the distribution of a response, Y, it is not a suitable model for all bivariate data.
y = β0 + β1x + ε
ε ~ normal(0, σ)
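As a rough illustration of what the model asserts, the following sketch generates data that satisfy it exactly. The function name `simulate_normal_linear`, the parameter values, and the X values are hypothetical choices made for this example; only the model equation comes from the text above.

```python
import random


def simulate_normal_linear(xs, beta0, beta1, sigma, seed=0):
    """Draw one response for each x from y = beta0 + beta1*x + error."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    # Each error is an independent draw from normal(0, sigma),
    # with the same sigma at every value of x.
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical X values 0.0, 0.1, ..., 9.9 and illustrative parameters.
xs = [i / 10 for i in range(100)]
ys = simulate_normal_linear(xs, beta0=2.0, beta1=0.5, sigma=1.0)
```

Data generated this way have a mean that is linear in X, the same standard deviation at every X, normally distributed errors, and independent errors, which are the four requirements discussed next.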
In particular, the following four requirements are implicit in the model but may be violated.
Linearity: In some data sets, the response mean does not change linearly with X. The relationship is then called nonlinear. In the accompanying diagram, the response levels off as X increases, so a normal linear model is not appropriate.
Constant standard deviation: Sometimes the response standard deviation is different for different values of X. In the accompanying diagram, the variability of the response is higher at large values of X.
Normal distribution for errors: Sometimes the distribution of the response (at any value of X) is skew or differs in shape from a normal distribution in other ways. In the accompanying diagram, the response has a skew distribution with occasional very large values.
Independent errors: All observations (and hence all errors) are assumed to be independently obtained. When the observations are ordered in time, successive errors may be correlated, with large values tending to be followed by other large values and small values by other small values. This is most commonly seen when the explanatory variable is time, i.e. when a linear model is used to fit the trend in a time series. In the accompanying diagram, crosses on one side of the least squares line are often followed by other crosses on the same side.
Residual plots
The above problems may be evident in a scatterplot of the raw data, but a plot of the residuals against X often shows them more clearly.
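The residuals behind such a plot can be sketched as follows. This is a minimal example using the standard least squares formulas; the function name `least_squares_residuals` and the data, whose mean levels off as X increases, are hypothetical and chosen only to show how nonlinearity appears in the residuals.

```python
def least_squares_residuals(xs, ys):
    """Fit y = b0 + b1*x by least squares and return (b0, b1, residuals)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx             # least squares slope
    b0 = y_bar - b1 * x_bar    # least squares intercept
    # Residual = observed response minus fitted value.
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    return b0, b1, residuals


# Hypothetical nonlinear data: the response levels off as X increases.
# No random error is added, so the pattern in the residuals is easy to see.
xs = [float(i) for i in range(1, 21)]
ys = [10.0 * x / (x + 2.0) for x in xs]

b0, b1, res = least_squares_residuals(xs, ys)

# Print the sign of each residual in order of increasing X.
print("".join("+" if r > 0 else "-" for r in res))
```

For this curved relationship the residuals are negative at both ends of the X range and positive in the middle, so a plot of them against X would show a curve and not a horizontal band.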
Examples
The first example below shows a data set that satisfies the assumptions for a normal linear model.
Observe that the plot of residuals against X is a horizontal band of constant width.
In data sets for which one of the linear model assumptions is violated, the problem is reflected in the residual plot. For example, a curved pattern indicates nonlinearity, and a band whose width changes with X indicates that the standard deviation is not constant.
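Two of the patterns seen in residual plots can also be summarised numerically. The sketch below is an informal supplement and not a method from this page: the helper names `spread_ratio` and `lag1_correlation`, and the example residuals, are assumptions made for illustration, and neither quantity is a formal test.

```python
from statistics import pstdev


def spread_ratio(xs, residuals):
    """Residual SD in the upper half of X divided by that in the lower half.

    A value far from 1 suggests the standard deviation changes with X.
    """
    pairs = sorted(zip(xs, residuals), key=lambda p: p[0])  # order by X
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return pstdev(high) / pstdev(low)


def lag1_correlation(residuals):
    """Correlation between successive residuals, in the order supplied.

    A clearly positive value suggests that errors are not independent.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    num = sum((residuals[i] - mean) * (residuals[i + 1] - mean)
              for i in range(n - 1))
    den = sum((r - mean) ** 2 for r in residuals)
    return num / den


# Hypothetical residuals whose spread is larger at large X.
xs_demo = [0, 1, 2, 3, 4, 5, 6, 7]
res_demo = [1.0, -1.0, 1.0, -1.0, 3.0, -3.0, 3.0, -3.0]
print(spread_ratio(xs_demo, res_demo))

# Hypothetical residuals that stay on one side of zero for several steps.
print(lag1_correlation([1.0, 1.0, 1.0, -1.0, -1.0, -1.0]))
```

For residuals from data that satisfy the model, the spread ratio should be close to 1 and the lag-1 correlation close to 0; how far from these values counts as a problem depends on the sample size, so the residual plot remains the primary check.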
In the remainder of this section, we will look in more detail at the four assumptions underlying the normal linear model.