Two tests for nonlinearity

We have now described two different approaches to testing a regression relationship for linearity. Both asked whether replacing the linear model with a more general model that allows for curvature significantly improved the fit.

Quadratic model
This test asked whether a quadratic model explained significantly more response variability than a linear model.
Treating X as categorical
The test in this section asked whether a categorical variable whose levels are the distinct x-values explained significantly more response variability than a linear model.

We now consider the differences between these two approaches when both are possible — i.e. for data sets with multiple response values at some or all x-values.
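As a rough illustration of how the two tests could be computed on the same data set, the sketch below fits the linear, quadratic and factor models by least squares and forms the usual extra-sum-of-squares F statistics. The function name `linearity_tests`, the made-up data and the use of NumPy and SciPy are assumptions for illustration, not part of the original material.

```python
import numpy as np
from scipy import stats

def linearity_tests(x, y):
    """F-tests of a linear model against a quadratic model and a factor model.

    Returns {name: (F statistic, numerator df, denominator df, p-value)}.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size

    def rss(design):
        # Residual sum of squares from a least squares fit of y on the design columns.
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        return float(resid @ resid)

    linear = np.column_stack([np.ones(n), x])
    bigger_models = {
        "quadratic": np.column_stack([np.ones(n), x, x**2]),
        # One indicator column per distinct x-value (x treated as categorical).
        "factor": (x[:, None] == np.unique(x)).astype(float),
    }
    rss_linear = rss(linear)
    results = {}
    for name, design in bigger_models.items():
        df1 = design.shape[1] - 2   # extra parameters beyond the straight line
        df2 = n - design.shape[1]   # residual df of the bigger model
        rss_big = rss(design)
        f_stat = ((rss_linear - rss_big) / df1) / (rss_big / df2)
        results[name] = (f_stat, df1, df2, float(stats.f.sf(f_stat, df1, df2)))
    return results

# Made-up data: 5 distinct x-values, 4 responses at each, curved mean.
rng = np.random.default_rng(0)
x = np.repeat([1.0, 2.0, 3.0, 4.0, 5.0], 4)
y = 1.0 + 0.5 * x**2 + rng.normal(0.0, 0.5, size=x.size)
results = linearity_tests(x, y)
for name, (f_stat, df1, df2, p) in results.items():
    print(f"{name}: F({df1}, {df2}) = {f_stat:.2f}, p = {p:.4g}")
```

The factor model needs at least three distinct x-values and some repeated x-values, otherwise it has no extra parameters beyond the line or no residual degrees of freedom.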

'Smooth' curvature

We first consider situations where the curvature is fairly smooth over the range of x-values.

If there is fairly smooth curvature in the relationship, the quadratic test is more likely to detect it than the test involving a factor model.


Simulation

The diagram below selects samples from a normal model in which the response mean varies with x in a fairly smooth curve. (The grey rectangles stretch two standard deviations on each side of the model mean at each x-value.)

Click Take sample a few times to select data sets from this model. Observe that the green p-value for the quadratic test is usually closer to zero than the corresponding p-value for the factor model.

The quadratic test is more likely to detect the nonlinearity.
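The simulation can be imitated roughly in code. The sketch below assumes five x-values with four responses at each, a quadratic mean curve and a standard deviation of 1. These settings and the helper name `p_values` are illustrative assumptions, not the settings used by the diagram.

```python
import numpy as np
from scipy import stats

def p_values(x, samples):
    """P-values of both linearity tests for every row (data set) of `samples`."""
    n = x.size
    total = (samples**2).sum(axis=1)

    def rss(design):
        # Orthonormal basis for the model space; RSS = |y|^2 - |projection of y|^2.
        q, _ = np.linalg.qr(design)
        return total - ((samples @ q) ** 2).sum(axis=1)

    rss_linear = rss(np.column_stack([np.ones(n), x]))
    bigger_models = {
        "quadratic": np.column_stack([np.ones(n), x, x**2]),
        "factor": (x[:, None] == np.unique(x)).astype(float),
    }
    out = {}
    for name, design in bigger_models.items():
        df1, df2 = design.shape[1] - 2, n - design.shape[1]
        rss_big = rss(design)
        f_stat = ((rss_linear - rss_big) / df1) / (rss_big / df2)
        out[name] = stats.f.sf(f_stat, df1, df2)
    return out

# Assumed model: smooth (quadratic) curve in the means, normal errors with sd 1.
levels = np.arange(1.0, 6.0)
reps = 4
x = np.repeat(levels, reps)
smooth_means = 2.0 + 0.4 * (levels - 3.0) ** 2
rng = np.random.default_rng(1)
samples = np.repeat(smooth_means, reps) + rng.normal(0.0, 1.0, size=(2000, x.size))

p = p_values(x, samples)
share_quadratic_smaller = float(np.mean(p["quadratic"] < p["factor"]))
reject = {name: float(np.mean(pv < 0.05)) for name, pv in p.items()}
print("share of samples with smaller quadratic p-value:", share_quadratic_smaller)
print("share of samples rejecting linearity at 5%:", reject)
```

Each row of `samples` plays the role of one click of Take sample, so the printed shares summarise what repeated clicking would show under these assumed settings.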


Other kinds of nonlinearity

The reason for the lower power of the factor-model based test is that it is testing for any nonlinear pattern in the means, not just a smooth pattern.
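The difference can be seen in the form of the two F statistics. The notation here is assumed for illustration: RSS denotes a residual sum of squares, n is the number of observations and g is the number of distinct x-values.

```latex
F_{\text{quad}} = \frac{(\mathrm{RSS}_{\text{lin}} - \mathrm{RSS}_{\text{quad}})/1}{\mathrm{RSS}_{\text{quad}}/(n-3)},
\qquad
F_{\text{factor}} = \frac{(\mathrm{RSS}_{\text{lin}} - \mathrm{RSS}_{\text{factor}})/(g-2)}{\mathrm{RSS}_{\text{factor}}/(n-g)}
```

The factor-model statistic averages the reduction in the residual sum of squares over g - 2 degrees of freedom, so a departure from linearity that is purely quadratic is diluted by the other g - 3 directions, which contribute only noise.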

If the model nonlinearity is not close to a smooth quadratic relationship, the quadratic test is less likely to detect the nonlinearity, but the factor-model based test has the same power as for smooth curvature.

For 'non-smooth' relationships, the test involving a factor model may be more likely to detect the nonlinearity.

Testing for bad experimental design

This type of 'non-smooth' pattern in the means often arises in experiments that have not been properly randomised. If the repetitions of the experiment at each value of the explanatory variable have been conducted in blocks, then systematic differences between these blocks (runs of the experiment) can also result in group means that do not vary linearly.

The test for linearity based on a factor model also tests for differences between these blocks of experiments.


Simulation

The diagram below again selects samples from a normal model. In this case, the model means do not vary smoothly. Two possible reasons for this pattern are:

The factor-model based test is sensitive to both of these problems with the linear model.

Click Take sample a few times to select data sets from this model. Observe that the green p-value for the factor-model based test is usually closer to zero than the p-value for the quadratic test, so the factor-model based test is more likely to detect this problem with the linear model.
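A similar code sketch can imitate this situation. It assumes a genuinely linear trend to which a fixed offset has been added at each x-value, standing in for systematic differences between blocks of runs. The offsets, sample sizes and helper name `p_values` are hypothetical and are not taken from the diagram.

```python
import numpy as np
from scipy import stats

def p_values(x, samples):
    """P-values of both linearity tests for every row (data set) of `samples`."""
    n = x.size
    total = (samples**2).sum(axis=1)

    def rss(design):
        # Orthonormal basis for the model space; RSS = |y|^2 - |projection of y|^2.
        q, _ = np.linalg.qr(design)
        return total - ((samples @ q) ** 2).sum(axis=1)

    rss_linear = rss(np.column_stack([np.ones(n), x]))
    bigger_models = {
        "quadratic": np.column_stack([np.ones(n), x, x**2]),
        "factor": (x[:, None] == np.unique(x)).astype(float),
    }
    out = {}
    for name, design in bigger_models.items():
        df1, df2 = design.shape[1] - 2, n - design.shape[1]
        rss_big = rss(design)
        f_stat = ((rss_linear - rss_big) / df1) / (rss_big / df2)
        out[name] = stats.f.sf(f_stat, df1, df2)
    return out

# Assumed model: linear trend plus a hypothetical zigzag of block offsets.
levels = np.arange(1.0, 6.0)
reps = 4
x = np.repeat(levels, reps)
block_offsets = 0.8 * np.array([1.0, -1.0, 1.0, -1.0, 1.0])
means = 2.0 + levels + block_offsets
rng = np.random.default_rng(2)
samples = np.repeat(means, reps) + rng.normal(0.0, 1.0, size=(2000, x.size))

p = p_values(x, samples)
reject = {name: float(np.mean(pv < 0.05)) for name, pv in p.items()}
print("share of samples rejecting linearity at 5%:", reject)
```

Because the offsets zigzag rather than bend smoothly, only a small part of the departure from linearity lies in the quadratic direction, which is why the factor-model based test rejects more often under these assumptions.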