Alternative models for numerical factor
In the previous section, we described three different models that could be used to explain the effect of a numerical factor. These models, plus a model in which the factor has no effect on the response, are shown below.
| Model | Unknown parameters | Number of unknown parameters |
|---|---|---|
| Factor does not affect the response: yij = µ + εij | µ | 1 |
| Linear model: yij = β0 + β1 xi + εij | β0 and β1 | 2 |
| Quadratic model: yij = β0 + β1 xi + β2 xi² + εij | β0, β1 and β2 | 3 |
| Factor treated as categorical: yij = µi + εij | µ1, µ2, ..., µg | g |
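To make these parameterisations concrete, here is a minimal sketch (using made-up data with four factor levels and two replicates per level, not data from this section) that builds a design matrix for each of the four models and obtains its least squares fit with NumPy:

```python
import numpy as np

# Made-up data for illustration: g = 4 factor levels, 2 replicates per level.
x_levels = np.array([10.0, 20.0, 30.0, 40.0])
x = np.repeat(x_levels, 2)                    # factor level for each run
y = np.array([51.2, 49.8, 55.1, 56.0, 57.9, 58.4, 58.1, 57.5])
n, g = len(y), len(x_levels)

# Design matrices for the four models in the table.
designs = {
    "no effect":   np.ones((n, 1)),                          # yij = µ + εij          (1 parameter)
    "linear":      np.column_stack([np.ones(n), x]),         # yij = β0 + β1 xi + εij (2 parameters)
    "quadratic":   np.column_stack([np.ones(n), x, x**2]),   # adds β2 xi²            (3 parameters)
    "categorical": (x[:, None] == x_levels).astype(float),   # yij = µi + εij         (g parameters)
}

for name, X in designs.items():
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)             # least squares estimates
    rss = np.sum((y - X @ beta) ** 2)                        # residual sum of squares
    print(f"{name:12s} parameters = {X.shape[1]}   residual SS = {rss:.2f}")
```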
With a numerical factor, we usually anticipate that its relationship with the response will be 'smooth'. Linear and quadratic models have the advantage of describing such a smooth relationship with only a few unknown parameters. However, the curvature in some relationships cannot be modelled with linear or quadratic models.
Is a linear or quadratic model adequate to describe the relationship?
Hierarchy of models
The four models in the table above form a hierarchy of increasing flexibility. The most general model treats the factor as categorical and has the flexibility to model any shape of relationship between the factor and response. The linear model is the most restrictive of the models in which the factor affects the response since it constrains the mean responses at the different levels to lie on a straight line. The quadratic model is intermediate.
[Diagram: the hierarchy of models, in which generalising the term µ to µi adds (g - 1) parameters.]
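The nesting in this hierarchy can be checked directly: any fit from a more restrictive model can be reproduced exactly by a more general one. A small sketch (made-up data, purely illustrative) showing that the linear fit lies inside the categorical model:

```python
import numpy as np

# Made-up data: 5 levels of the factor, 2 replicates at each (illustration only).
levels = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x = np.repeat(levels, 2)
y = np.array([3.1, 2.9, 4.2, 4.0, 5.1, 5.3, 5.0, 4.8, 4.1, 4.3])
n = len(y)

def fitted(X, target):
    """Least squares fitted values from regressing target on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return X @ beta

X_linear = np.column_stack([np.ones(n), x])          # straight-line model
X_factor = (x[:, None] == levels).astype(float)      # one indicator column per level

# The straight-line fitted values are constant within each level, so they can be
# written as a set of categorical means: re-fitting them with the factor design
# reproduces them exactly.
lin_fit = fitted(X_linear, y)
print(np.allclose(fitted(X_factor, lin_fit), lin_fit))   # True: linear model is nested
```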
Residual and explained sums of squares
When fitting a model to experimental data, we estimate the parameters to give the minimum possible residual sum of squares — they are the 'least squares' estimates.
As the number of parameters increases, making the model more flexible, the residual sum of squares can be reduced further. Each step up the hierarchy above therefore decreases the residual sum of squares by an amount that is called the sum of squares explained by that addition. These explained sums of squares have degrees of freedom equal to the number of extra parameters.
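For example, the following sketch (again with made-up data in the same layout as above) computes each explained sum of squares as the drop in residual sum of squares between adjacent models in the hierarchy, with degrees of freedom equal to the number of parameters added:

```python
import numpy as np

def residual_ss(X, y):
    """Residual sum of squares after fitting y on the columns of X by least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

# Made-up data: 4 factor levels, 2 replicates at each level (illustration only).
x = np.repeat([10.0, 20.0, 30.0, 40.0], 2)
y = np.array([51.2, 49.8, 55.1, 56.0, 57.9, 58.4, 58.1, 57.5])
n = len(y)

# The hierarchy of nested models, from least to most flexible.
models = {
    "no effect":   np.ones((n, 1)),
    "linear":      np.column_stack([np.ones(n), x]),
    "quadratic":   np.column_stack([np.ones(n), x, x**2]),
    "categorical": (x[:, None] == np.unique(x)).astype(float),
}

names = list(models)
for smaller, larger in zip(names, names[1:]):
    explained = residual_ss(models[smaller], y) - residual_ss(models[larger], y)
    extra_df = models[larger].shape[1] - models[smaller].shape[1]
    print(f"{smaller:11s} -> {larger:11s} explained SS = {explained:7.2f} on {extra_df} df")
```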
Hardness of ball bearings
The hardness of steel ball bearings is related to the rate X at which they were cooled after being made. Data were obtained from an experiment in which two ball bearings were cooled at each of several cooling rates.
The table below initially shows the residual sum of squares for the model in which the cooling rate has no effect on the response variable (hardness). Under this model, none of the variation in the response is explained.
Drag the red arrow down past Linear to see the effect on the residual sum of squares of changing to a model in which cooling rate has a linear effect. The residual sum of squares decreases by 1029.15, and this reduction is displayed in the table as the sum of squares explained by the linear model.
Drag past Quadratic to add a quadratic term, further reducing the residual sum of squares (with one further explained degree of freedom).
Finally drag past Factor to move to the most general model that treats the cooling rate levels as categorical. This model has 12 more parameters than the quadratic model and does not impose any form of smoothness on the relationship between cooling rate and hardness. The corresponding reduction in the residual sum of squares, 26.84, can be interpreted as being caused by lack of fit of the quadratic model.
Note that the sums of squares and their degrees of freedom add up to give the total sum of squares and its degrees of freedom, (n - 1).
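The full decomposition can be reproduced numerically. The sketch below uses simulated stand-in data with the same layout as the experiment (15 cooling rates, two bearings at each rate; the actual hardness values are not listed in this section, so its sums of squares will differ from 1029.15 and 26.84), and confirms that the explained and residual sums of squares, and their degrees of freedom, add up to the totals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the ball-bearing data: 15 cooling rates, 2 bearings per rate.
# Values are simulated from a smooth trend plus noise, for illustration only.
rates = np.linspace(5.0, 75.0, 15)
x = np.repeat(rates, 2)
y = 60.0 - 0.01 * (x - 40.0) ** 2 + rng.normal(0.0, 1.5, x.size)
n, g = y.size, rates.size

def rss(X):
    """Residual sum of squares from the least squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

one = np.ones(n)
rss_null = rss(one[:, None])                           # no-effect model (total SS)
rss_lin  = rss(np.column_stack([one, x]))              # linear model
rss_quad = rss(np.column_stack([one, x, x**2]))        # quadratic model
rss_fact = rss((x[:, None] == rates).astype(float))    # factor treated as categorical

rows = [
    ("Linear",               rss_null - rss_lin,  1),
    ("Quadratic",            rss_lin - rss_quad,  1),
    ("Lack of fit (factor)", rss_quad - rss_fact, g - 3),   # 12 extra parameters here
    ("Residual",             rss_fact,            n - g),
]
for name, ss, df in rows:
    print(f"{name:22s} SS = {ss:9.2f}   df = {df}")

# The components add to the total sum of squares with n - 1 degrees of freedom.
print(f"{'Total':22s} SS = {rss_null:9.2f}   df = {n - 1}")
```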