Normal Linear Model
Linearity and constant variance are the most important assumptions for the inference
reported by Minitab to be reliable, but two more are also needed:
- Normal distribution
- The response should have a normal distribution at each value of the explanatory
variable. This is an assumption about the conditional distribution
of the response at each x-value, not about its marginal distribution. In the context
of the slug data, it means that the log-weights of the 5 cm
slugs should have a normal distribution, as should the log-weights of the
1 cm slugs, and so on at every other x-value.
- Independence
- All observations must be obtained independently. Whether they are depends on how the
data were collected. The assumption is most often violated when observations are collected in
sequence, because outside influences may affect a run of adjacent values and make them correlated.
The slug data were collected in sequence over 4 years, so the assumption could be
violated if slugs in one year (or season) are markedly different from those
at other times.
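As a rough illustration of how these two assumptions might be examined outside Minitab, the sketch below fits a least-squares line to simulated data (not the slug data) and looks at the residuals. It assumes the observations are stored in collection order. The helper names `fit_line`, `durbin_watson` and `share_within_two_sd` are hypothetical, and the Durbin-Watson statistic is one common choice for detecting serial correlation rather than anything prescribed in these notes. Pooling the residuals to judge normality is only reasonable if the variance is constant across x-values.

```python
import random
import statistics


def fit_line(x, y):
    """Return the least-squares (intercept, slope) for a simple linear regression."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no lag-1 correlation."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(r ** 2 for r in residuals)
    return num / den


def share_within_two_sd(residuals):
    """Fraction of residuals within 2 standard deviations of zero.

    Roughly 0.95 is expected if the errors are normal.
    """
    sd = statistics.pstdev(residuals)
    return sum(abs(r) <= 2 * sd for r in residuals) / len(residuals)


# Simulated data (an assumption for illustration): independent normal errors
# around a straight line, listed in the order they were "collected".
rng = random.Random(0)
x = [rng.uniform(1.0, 5.0) for _ in range(200)]
y = [0.5 + 1.2 * xi + rng.gauss(0.0, 0.3) for xi in x]

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

dw = durbin_watson(residuals)
share = share_within_two_sd(residuals)
print(f"Durbin-Watson: {dw:.2f}, share within 2 SD: {share:.2f}")
```

A Durbin-Watson value well below 2 would point to positively correlated adjacent residuals, the pattern to watch for when data are collected in sequence. In practice a normal probability plot of the residuals is a more informative check on normality than the single summary used here.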
When all four assumptions hold (linearity, constant variance, normality
and independence), the model is called a normal linear model.
(Section 1.4 of the CAST e-book about regression contains more detail about normal linear regression models.)
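The four assumptions can be collected into a single statement of the model. The notation here is an assumption, since these notes do not fix symbols: y_i is the response (log-weight in the slug example), x_i the explanatory variable, and β0, β1 and σ are unknown parameters.

```latex
\[
  y_i = \beta_0 + \beta_1 x_i + \varepsilon_i ,
  \qquad
  \varepsilon_i \sim N(0, \sigma^2) \quad \mbox{independently for } i = 1, \dots, n .
\]
```

Each assumption corresponds to one part of this statement: linearity to the mean β0 + β1 x_i, constant variance to the single σ² shared by all x-values, normality to the N(0, σ²) distribution of the errors, and independence to the requirement that the errors are independent of one another.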