Comparing models with the residual sum of squares
The parameters of a general linear model (GLM) are chosen by least squares, minimising the sum of squared residuals. The residual sum of squares also provides a good way to compare alternative models.
For two models with the same number of parameters, the model with the smaller residual sum of squares is better.
Unfortunately, the residual sum of squares cannot be used directly to compare models with different numbers of parameters. Removing explanatory variables from a model cannot decrease the residual sum of squares and almost always increases it. (It is theoretically possible for the residual sum of squares to stay the same, but this never happens in practice.)
Even if an explanatory variable is not related to the response, dropping it from the model increases the residual sum of squares.
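This can be checked numerically. The sketch below uses simulated (hypothetical) data in which x2 is completely unrelated to the response; fitting the full and reduced models by least squares confirms that dropping x2 still increases the residual sum of squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y depends on x1 only; x2 is pure noise.
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)          # unrelated to the response
y = 2.0 + 1.5 * x1 + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ones = np.ones(n)
full = np.column_stack([ones, x1, x2])    # intercept + both variables
reduced = np.column_stack([ones, x1])     # x2 dropped

# Dropping x2 cannot decrease the RSS, even though x2 is irrelevant.
print(rss(reduced, y) >= rss(full, y))
```

Because the reduced model is nested within the full model, its residual sum of squares can never be smaller, however irrelevant the dropped variable is.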
Identifying the least important variable
The increases in the residual sum of squares that would occur from dropping the individual variables from a model are called the marginal sums of squares for the variables. They are also called Type 3 sums of squares.
Since all such models have the same number of parameters (one fewer than the full model), the best explanatory variable to drop from the full model is the one with the smallest marginal sum of squares. The relative sizes of these sums of squares are closely related to the p-values for testing whether the coefficients are zero.
The variable with smallest marginal sum of squares also has the highest p-value.
If a variable is to be removed from the full model, we should choose the variable with smallest marginal sum of squares and highest p-value. Both the p-value and the marginal sum of squares are helpful when deciding whether this is advisable.
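The marginal (Type 3) sums of squares can be computed by refitting the model with each variable dropped in turn and recording the increase in the residual sum of squares. A minimal sketch with simulated data (the variable names and coefficients are illustrative; x3 is pure noise):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: x1 and x2 affect the response, x3 is pure noise.
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

def rss(M, y):
    """Residual sum of squares from a least-squares fit of y on M."""
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    r = y - M @ beta
    return r @ r

design = np.column_stack([np.ones(n), X])
full_rss = rss(design, y)

# Marginal (Type 3) SS: increase in RSS from dropping each variable in turn.
marginal = {}
for j in range(1, design.shape[1]):       # skip the intercept column
    reduced = np.delete(design, j, axis=1)
    marginal[f"x{j}"] = rss(reduced, y) - full_rss

print(marginal)   # the noise variable x3 should have the smallest value
```

The variable with the smallest marginal sum of squares here (the noise variable) is exactly the one that a test of its coefficient would flag with the highest p-value.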
Body fat
In this example, 13 explanatory variables were used to model percentage body fat. The diagram below shows the least squares coefficients, the p-values for testing whether the parameters are zero, and the Type 3 (marginal) sums of squares.
Dropping any explanatory variable from the model would increase the residual sum of squares — the Type 3 sums of squares give the increase and are all positive.
Click the checkboxes against the explanatory variables one at a time to remove them from the model. Observe that every 12-variable model has a higher residual sum of squares than the full 13-variable model, and that the Type 3 sums of squares give the increases from removing the variables.
Since the variable knee has the smallest Type 3 sum of squares, the model without knee has the smallest residual sum of squares of all the models with 12 explanatory variables.
Since the p-value for knee is well over 0.1 (it is the highest of the p-values), we conclude that there is no evidence that this variable is needed.
You might also note that the p-values and Type 3 sums of squares for the other variables change when a variable is removed from the model — they now describe the effect of deleting variables from the 12-variable model, not the full 13-variable model. The changes are greatest when important variables such as abdomen are taken out of the model.
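The dependence of a variable's Type 3 sum of squares on which other variables remain in the model can be demonstrated with two correlated predictors (simulated, hypothetical data; the names x1 and x2 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical correlated predictors: both depend on a shared component z.
n = 150
z = rng.normal(size=n)
x1 = z + 0.3 * rng.normal(size=n)
x2 = z + 0.3 * rng.normal(size=n)          # correlated with x1
y = 1.0 + x1 + x2 + rng.normal(size=n)

vars_ = {"x1": x1, "x2": x2}

def rss(M, y):
    """Residual sum of squares from a least-squares fit of y on M."""
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    r = y - M @ beta
    return r @ r

def marginal_ss(cols):
    """Type 3 SS of each named variable within the model using `cols`."""
    M = np.column_stack([np.ones(n)] + [vars_[c] for c in cols])
    base = rss(M, y)
    return {c: rss(np.delete(M, i + 1, axis=1), y) - base
            for i, c in enumerate(cols)}

print(marginal_ss(["x1", "x2"]))   # x1's SS with x2 still in the model
print(marginal_ss(["x1"]))         # x1's SS after the correlated x2 is removed
```

Because x1 and x2 carry overlapping information, removing x2 leaves x1 to account for the shared signal, so x1's marginal sum of squares in the one-variable model is larger than in the two-variable model.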