Diagnostic residual plots
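The reasonableness of the assumptions underlying a normal linear model is best assessed by examining the residuals. Linearity and constant variance can usually be checked in a scatterplot of residuals against the explanatory variable: curvature in the plot suggests nonlinearity, and a spread that changes across the plot suggests non-constant variance.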
If the assumptions of normality and independence hold, the residuals should have approximately a normal distribution and they should display no trend over time. Time-trend can be examined with a scatterplot of the residuals against either time or, if the exact data-collection time is unavailable, an evenly spaced time index (i.e. the values 1, 2, ..., n).
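The residuals and the two sets of plotting coordinates described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values are made up (they are not the slug data), and the function names fit_line and residuals are hypothetical helpers, not part of any package.

```python
from statistics import mean


def fit_line(x, y):
    """Return the least-squares intercept and slope for y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(x, y):
    """Return observed minus fitted values for the least-squares line."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Made-up illustrative values, assumed to be listed in collection order.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]

res = residuals(x, y)

# Coordinates for the plot of residuals against the explanatory variable.
points_vs_x = list(zip(x, res))

# Evenly spaced time index 1, 2, ..., n for when exact times are unavailable.
time_index = list(range(1, len(res) + 1))
points_vs_time = list(zip(time_index, res))
```

Either list of points can be passed to any plotting tool; the time-index version is only meaningful if the observations are stored in the order in which they were collected.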
Normality is often examined with a probability plot. This is a scatterplot of the ordered residuals against a set of values that are spaced as would be expected from a standard normal distribution. If the residuals are normally distributed, the crosses should lie approximately on a straight line.
(Page 1.8.5 of the CAST e-book about regression gives more detail about how probability plots are used to detect non-normality.)
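As a rough sketch of how the coordinates of a normal probability plot might be computed, the Python below pairs the ordered residuals with standard normal quantiles. The plotting positions (i - 0.5)/n are one common convention and are an assumption here; statistical packages may use slightly different positions. The residual values are made up for illustration.

```python
from statistics import NormalDist


def normal_scores(n):
    """Standard normal quantiles at positions (i - 0.5)/n (an assumed convention)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def probability_plot_points(residuals):
    """Pair each ordered residual with its expected standard normal score."""
    ordered = sorted(residuals)
    return list(zip(normal_scores(len(ordered)), ordered))


# Made-up residuals, for illustration only.
example_residuals = [0.3, -1.2, 0.1, 0.8, -0.4, 1.5, -0.2]
points = probability_plot_points(example_residuals)
```

If the residuals were normally distributed, the pairs in points would fall close to a straight line when plotted.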
The diagram below shows four diagnostic plots for the slug data relating to a linear model of log(weight) against log(length).
The first two are scatterplots of the raw data and residuals against the explanatory variable in the model. They show no evidence of nonlinearity or non-constant variance.
The plot of residuals against time order does not show any prominent pattern other than a cluster of high residuals around observation 50 — these slugs were unusually heavy for their lengths. Click on the crosses to check the weights and lengths of this cluster of slugs. Are they similarly sized slugs? If possible, it would be worth checking whether these slugs were all collected at the same time.
The probability plot on the bottom right shows curvature, suggesting that the residuals are not normally distributed. The flat section of the plot shows a high density of residuals just below zero, so the response distribution seems skew with a long positive tail. There are also two negative residuals that stand out as possible outliers.
Are features in the residual plots real?
It is important not to rush to conclusions about features in graphical displays of data. Random data may exhibit patterns even when the underlying population does not. So are the patterns that we have observed in the time-ordered plot and probability plot really indicative of problems with the model or are they just random patterns?
To help answer this question, we can examine the patterns that would arise from data that do come from a normal linear model. Select the option Random Normal Data from the popup menu then click Take sample several times.
The pattern observed in the time-order plot is perhaps not unusual in view of the variability of the corresponding plots from random data. However, the curvature in the probability plot is rarely observed in random samples, so we could conclude that the residuals probably do not have a normal distribution.
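The comparison with random normal data can also be imitated numerically. The sketch below is not the method used on this page, which relies on looking at the plots. Instead it swaps in a single number, the correlation between the ordered residuals and their normal scores, as a rough measure of how straight a probability plot is. It then computes that number for residuals from many data sets simulated from a normal linear model. The explanatory values, the number of simulations and all function names are assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean


def line_residuals(x, y):
    """Residuals from the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def plot_correlation(values):
    """Correlation between ordered values and their normal scores.

    Values near 1 correspond to a nearly straight probability plot.
    """
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()
    scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    o_bar, s_bar = mean(ordered), mean(scores)
    s_os = sum((o - o_bar) * (s - s_bar) for o, s in zip(ordered, scores))
    s_oo = sum((o - o_bar) ** 2 for o in ordered)
    s_ss = sum((s - s_bar) ** 2 for s in scores)
    return s_os / (s_oo * s_ss) ** 0.5


def simulated_correlations(x, n_sims, rng):
    """Plot correlations for residuals from simulated normal linear models.

    The true intercept and slope are set to zero because the residuals
    from a fitted line do not depend on them.
    """
    results = []
    for _ in range(n_sims):
        y = [rng.gauss(0.0, 1.0) for _ in x]
        results.append(plot_correlation(line_residuals(x, y)))
    return results


rng = random.Random(1)               # fixed seed so the run is repeatable
x = [i / 10 for i in range(1, 61)]   # hypothetical explanatory values
sims = simulated_correlations(x, 200, rng)
```

If plot_correlation applied to the real residuals gave a value below nearly all of the entries in sims, that would echo the visual conclusion that such curvature is rarely seen in random normal samples.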
(Minitab can perform a hypothesis test to confirm what we observed in the above simulation. The residuals must first be saved in a worksheet column. A Minitab command then draws a normal probability plot and also provides a p-value for testing normality. The p-value for the Anderson-Darling normality test is reported as '0.000' (that is, below 0.0005), so there is extremely strong evidence that the null hypothesis of normality does not hold.)
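For readers without Minitab, the Anderson-Darling statistic itself can be sketched in plain Python. This is a minimal illustration, not Minitab's implementation. It assumes the mean and standard deviation are estimated from the sample and applies the usual small-sample adjustment factor (1 + 0.75/n + 2.25/n^2). It returns only the adjusted statistic and makes no attempt to reproduce Minitab's p-value. The two example samples are artificial.

```python
from math import exp, log
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(values):
    """Adjusted Anderson-Darling statistic for normality.

    The mean and standard deviation are estimated from the sample.
    Larger values indicate stronger departure from normality.
    """
    n = len(values)
    m, s = mean(values), stdev(values)
    std_normal = NormalDist()
    # Fitted normal CDF evaluated at each ordered observation.
    cdf = [std_normal.cdf((v - m) / s) for v in sorted(values)]
    total = sum(
        (2 * i - 1) * (log(cdf[i - 1]) + log(1.0 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    a_squared = -n - total / n
    # Small-sample adjustment used when both parameters are estimated.
    return a_squared * (1.0 + 0.75 / n + 2.25 / n ** 2)


# Artificial samples: exact normal quantiles, and a right-skewed transform.
n = 50
scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
normal_like = scores
skewed = [exp(z) for z in scores]

stat_normal_like = anderson_darling_normal(normal_like)
stat_skewed = anderson_darling_normal(skewed)
```

The statistic for the skewed sample is much larger than for the normal-like one. A frequently quoted 5% critical value for this adjusted statistic is about 0.752; treat that figure as an assumption to be checked against published tables.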
Violation of the assumption of normality has less impact on the validity of the Minitab analysis than other model problems, and the violation is not severe, so the earlier analysis is not particularly compromised by the problem identified in this page.