Linearity within a g-group model

The model that allows for arbitrary group means at the g distinct x-values can be parameterised with a constant term and (g-1) 'explanatory' variables. The first of these is a linear term; the remaining (g-2) are indicator variables that describe how far the means of the 3rd, 4th, etc. groups are from the linear function passing through the first two means.
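
In symbols, a sketch of this parameterisation (the coefficient names beta and delta are our own notation, not the text's) is

    y = \beta_0 + \beta_1 x + \delta_3 d_3 + \delta_4 d_4 + \cdots + \delta_g d_g + \varepsilon

where d_k = 1 if an observation is in the k-th group and 0 otherwise. The constant \beta_0 and slope \beta_1 describe the line through the first two group means, and each \delta_k is the vertical distance of the k-th group mean from that line.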

This parameterisation of the factor model is most easily explained graphically.

Numerical explanatory variable with 4 levels

Consider an experiment with 3 response measurements at each of 4 equally-spaced values for a numerical explanatory variable. We will define a model in which the explanatory variable is treated as defining 4 groups.

In the following parameterisation of the model, the first variable is a linear term. The model with only this term (and the constant) would be a conventional simple regression model.

The two further indicator variables give the model the additional flexibility to fit arbitrary means for all four groups.
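
As a concrete sketch, the design matrix for this parameterisation could be built as follows (the x-values 1 to 4 and all variable names are illustrative assumptions, not from the text):

    import numpy as np

    # Four equally-spaced x-values with three observations at each
    x = np.repeat([1.0, 2.0, 3.0, 4.0], 3)

    # Indicator variables for the 3rd and 4th groups
    d3 = (x == 3.0).astype(float)
    d4 = (x == 4.0).astype(float)

    # Columns: constant term, linear term, and the two indicators
    X = np.column_stack([np.ones_like(x), x, d3, d4])

Dropping the d3 and d4 columns leaves the design matrix of a simple regression model; keeping them allows the fitted means of the 3rd and 4th groups to sit off the line through the first two.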

The diagram below illustrates the meanings of the parameters for an artificial set of 12 response values.

Observe that the parameters place no constraint on the group means: the line through the first two means and the two offsets can reproduce any four group means exactly.

This full model is therefore equivalent to the earlier models that we used for a categorical explanatory variable.

Test for nonlinearity

Using this parameterisation, we can test for linearity with an anova table, using the sequential sum of squares for the indicator variables (after the linear term). A large sum of squares (and therefore a small associated p-value) gives evidence against linearity.
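
A standard way to construct this test (the notation here is ours) is the F statistic

    F = \frac{SS_{nonlinear} / (g - 2)}{SS_{residual} / (n - g)}

where the residual sum of squares comes from the full g-group model; F is compared with the F distribution with (g - 2) and (n - g) degrees of freedom.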

Lettuce and nitrogen fertiliser

An experiment was conducted to test the effects of nitrogen fertiliser on lettuce production. Five rates of ammonium nitrate were used in a completely randomised design with four replicates (plots). The data are the number of heads of lettuce harvested from each plot.

The anova table below shows the sequential sums of squares for the linear term and for the 3 indicator variables that are required to allow the 3rd, 4th and 5th group means to be off the line through the first two group means.

The p-value for the indicator variables, 0.0611, is for the test of nonlinearity. We therefore conclude from this test that there is only mild evidence of nonlinearity in the data.
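
As a check on the degrees of freedom (standard arithmetic, not shown in the text): with g = 5 fertiliser levels and 4 replicates there are n = 20 plots, so the F statistic for nonlinearity has

    g - 2 = 3 \qquad \text{and} \qquad n - g = 15

numerator and denominator degrees of freedom.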

The sequential sums of squares are sums of squared differences between the fitted values from different models, so they can be illustrated graphically.

The linear sum of squares is the sum of squared differences between the overall mean (the fitted value from a model with no explanatory variables) and the fitted values from the linear model.

The nonlinear sum of squares is the sum of squared differences between the fitted values from the linear model and those from the model that treats the explanatory variable as defining g groups.
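
In symbols (our notation), writing \bar{y} for the overall mean, \hat{y}_i for the fitted value of the i-th observation from the linear model, and \bar{y}_{g(i)} for the mean of the group containing observation i:

    SS_{linear} = \sum_i (\hat{y}_i - \bar{y})^2

    SS_{nonlinear} = \sum_i (\bar{y}_{g(i)} - \hat{y}_i)^2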

Finding the sums of squares in practice

Many statistical programs have options to find the nonlinear (also called lack-of-fit) and residual (also called pure-error) sums of squares automatically and to perform a goodness-of-fit test for linearity.

If such software is not available, the sums of squares can easily be obtained by fitting two models: a standard linear model that treats x as a numerical explanatory variable, and a model that treats x as defining g groups (one for each distinct x-value).

The nonlinear sum of squares is the difference between the explained sums of squares for these two models. The anova table can therefore be created from the results of fitting these two models.
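
A minimal sketch of this two-model approach in Python with statsmodels (the column names x and y and the synthetic data are illustrative assumptions, not the lettuce data):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # Illustrative data: 4 distinct x-values with 3 responses at each
    rng = np.random.default_rng(1)
    x = np.repeat([1.0, 2.0, 3.0, 4.0], 3)
    df = pd.DataFrame({'x': x, 'y': 2 + 0.5 * x + rng.normal(0, 1, x.size)})

    # Model 1: x treated as a numerical explanatory variable
    linear = smf.ols('y ~ x', data=df).fit()

    # Model 2: x treated as a factor defining g groups
    groups = smf.ols('y ~ C(x)', data=df).fit()

    # Nonlinear (lack-of-fit) sum of squares: the difference between the
    # two models' explained sums of squares, or equivalently between
    # their residual sums of squares
    ss_nonlinear = linear.ssr - groups.ssr

    # F test for nonlinearity by comparing the nested models
    print(anova_lm(linear, groups))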