Comparing models

As in other situations with a hierarchy of possible models, an analysis of variance table is used to find the simplest model that is consistent with the data. This is based on the analysis of variance table for the model in which the factor is treated as categorical, but the sum of squares explained by the factor is split into sums of squares explained by linear and quadratic models, and by 'lack of fit' of the quadratic model.
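The split can be written as an identity. This is only a sketch of the bookkeeping; the symbol $g$ for the number of factor levels and the subscript names are notation assumed here, not taken from the table.

```latex
\begin{align*}
SS_{\text{factor}} &= SS_{\text{linear}} + SS_{\text{quadratic}} + SS_{\text{lack of fit}} \\
g - 1 &= 1 + 1 + (g - 3)
\end{align*}
```

The second line shows how the degrees of freedom are shared out; with $g = 6$ levels it reads $5 = 1 + 1 + 3$.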

As in other analysis of variance tables, a mean sum of squares is calculated for each explained sum of squares by dividing it by its degrees of freedom, and each of these is divided by the mean residual sum of squares to give an F ratio. The p-value associated with each F ratio indicates whether the ratio is larger than would be expected by chance.
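The calculation might be sketched in Python as follows. The data are simulated purely for illustration (they are not the antibiotic data used later), the function name `split_factor_ssq` and all variable names are hypothetical, and `numpy` and `scipy` are assumed to be installed.

```python
import numpy as np
from scipy import stats


def split_factor_ssq(x, y):
    """Split the sum of squares explained by a numeric factor x into
    linear, quadratic and lack-of-fit parts, with F ratios and p-values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    levels = np.unique(x)
    g = len(levels)

    def rss(design):
        # Residual sum of squares from a least-squares fit of y on design
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        return float(resid @ resid)

    ones = np.ones(n)
    xc = x - x.mean()  # centring only improves conditioning; fits are unchanged
    rss_mean = rss(np.column_stack([ones]))
    rss_lin = rss(np.column_stack([ones, xc]))
    rss_quad = rss(np.column_stack([ones, xc, xc**2]))
    # Categorical model: one indicator column per factor level
    rss_cat = rss((x[:, None] == levels[None, :]).astype(float))

    # Each explained sum of squares is the increase in residual sum of
    # squares when moving one step down the hierarchy of models
    rows = {
        "linear": (rss_mean - rss_lin, 1),
        "quadratic": (rss_lin - rss_quad, 1),
        "lack of fit": (rss_quad - rss_cat, g - 3),
    }
    df_resid = n - g
    ms_resid = rss_cat / df_resid  # mean residual sum of squares

    table = {}
    for name, (ssq, df) in rows.items():
        f_ratio = (ssq / df) / ms_resid
        p_value = float(stats.f.sf(f_ratio, df, df_resid))  # upper tail of F
        table[name] = {"ssq": ssq, "df": df, "F": f_ratio, "p": p_value}
    table["factor"] = {"ssq": rss_mean - rss_cat, "df": g - 1}
    table["residual"] = {"ssq": rss_cat, "df": df_resid}
    return table


# Simulated example (assumption): 6 levels, 4 replicates each, curved trend
rng = np.random.default_rng(0)
conc = np.repeat(np.arange(1.0, 7.0), 4)
response = 2 + 3 * conc - 0.2 * conc**2 + rng.normal(0, 0.5, conc.size)

table = split_factor_ssq(conc, response)
for name, row in table.items():
    print(name, row)
```

Fitting the nested models directly, rather than using orthogonal polynomial contrasts, keeps the sketch short and also works when the factor levels are unequally spaced.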

The method will be clearer with an example.

Antibiotic effectiveness

The anova table below initially treats the concentration of antibiotic as categorical (with 6 levels) so there is a single row for the explained variation (with 5 degrees of freedom).

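Splitting the explained sum of squares for the factor (the 'Split ssq for factor' option) replaces this single row with separate rows for the linear term, the quadratic term and 'lack of fit' of the quadratic model, with 1, 1 and 3 degrees of freedom respectively.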

The p-values are interpreted as follows:

Interpretation

Lack-of-fit for quadratic model
Restricting the parameters of the full categorical model to lie on a quadratic curve increases the residual sum of squares by 0.236. This sum of squares (with 3 degrees of freedom) is associated with a p-value of 0.815, so the quadratic model does not fit the data significantly worse than the full categorical model; there is no evidence of lack-of-fit of the quadratic model.
Quadratic term
The residual sum of squares increases by a further 1.475 if the quadratic term is dropped, leaving a linear model. The corresponding p-value is 0.019, so we conclude that there is moderately strong evidence of curvature in the relationship. (Curvature is significant at the 5% level but not at the 1% level.)
Linear term
Since there is evidence of curvature in the relationship, there is little point in testing whether the linear term is necessary. However, the explained sum of squares for the linear model, 54.266, is high and its p-value is 0.000 to three decimal places, so the linear term is very important and explains most of the variability in the response.

We therefore conclude that a quadratic model fits the data adequately.
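The order in which the rows are read can be sketched as a small Python function. This is a simplification under stated assumptions: the function name `simplest_adequate_model` and the fixed significance level `alpha` are hypothetical, whereas the discussion above weighs the strength of evidence rather than applying a rigid cut-off.

```python
def simplest_adequate_model(p_lack_of_fit, p_quadratic, p_linear, alpha=0.05):
    """Work down the hierarchy categorical -> quadratic -> linear -> constant,
    stopping at the first term that cannot be dropped."""
    if p_lack_of_fit < alpha:
        return "categorical"  # quadratic curve does not fit adequately
    if p_quadratic < alpha:
        return "quadratic"    # evidence of curvature
    if p_linear < alpha:
        return "linear"       # evidence of a trend but not of curvature
    return "constant"         # no evidence that the factor matters


# p-values quoted in the interpretation above
choice = simplest_adequate_model(0.815, 0.019, 0.000)
print(choice)  # quadratic
```

With `alpha=0.01` the same p-values would give `"linear"`, which reflects the remark that curvature is significant at the 5% level but not at the 1% level.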