Confidence intervals
We showed on the previous page how the standard error for each least squares coefficient can be found. It is easy to translate these estimates and standard errors into 95% confidence intervals.
The t-value is looked up with the same number of degrees of freedom as the residual sum of squares: the number of observations, n, minus the number of parameters in the model, p.
If the degrees of freedom are high, the 95% confidence interval is approximately

bi ± 2 se(bi)
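Assuming the estimate and its standard error are already available, the approximate interval can be computed directly. The numbers below are illustrative only, not taken from any real data set:

```python
# Sketch: approximate 95% confidence interval for a least squares
# coefficient when the residual degrees of freedom are high, so the
# t multiplier is close to 2.
def approx_ci(estimate, se, t_mult=2.0):
    """Return the interval estimate +/- t_mult * se."""
    return (estimate - t_mult * se, estimate + t_mult * se)

lo, hi = approx_ci(0.90, 0.08)   # hypothetical b_i = 0.90, se(b_i) = 0.08
print(lo, hi)                    # roughly 0.74 to 1.06
```

For smaller degrees of freedom, the multiplier 2 would be replaced by the exact t-value for (n - p) degrees of freedom.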
Hypothesis tests for single parameters
In a similar way, we can perform a hypothesis test for whether individual parameters in the model are zero.
The test asks whether the corresponding explanatory variable can be dropped from the full model.
The test statistic is found by standardising the estimate:

t = bi / se(bi)
The p-value is the probability of getting a test statistic this far from zero. It is found from the tail area of the t distribution with (n - p) degrees of freedom.
The p-values are interpreted in the usual way as the strength of evidence against the null hypothesis.
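The calculation can be sketched as follows. Since the exact t-distribution tail area needs a statistical library, this sketch uses the normal approximation, which is close when the degrees of freedom (n - p) are high; the estimate and standard error are hypothetical:

```python
import math

# Sketch: t statistic and an approximate two-sided p-value for testing
# whether a single coefficient is zero. With high degrees of freedom
# the t distribution is close to the standard normal.
def t_statistic(estimate, se):
    return estimate / se

def approx_two_sided_p(t):
    # Two-sided standard normal tail area, 2 * P(Z > |t|).
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

t = t_statistic(0.90, 0.08)      # hypothetical estimate and standard error
p = approx_two_sided_p(t)
print(t, p)                      # t is about 11.25; p is essentially zero
```

A p-value near zero, as here, would be very strong evidence against the null hypothesis that the parameter is zero.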
Warning
Each of these tests only assesses whether you can drop a single explanatory variable from the full model. After dropping one variable from the full model, the p-values for the other variables will change, and variables that previously seemed unimportant may become significant.
If several explanatory variables have high p-values, this does not give evidence that you can simultaneously drop all variables from the model.
When the explanatory variables are correlated with each other (multicollinearity), they may hold much the same information about variability in the response. Dropping any one variable may not matter much, since the others hold the same information, but you should not drop all of them.
This effect was described more fully for linear models with two explanatory variables.
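The inflation of standard errors under multicollinearity can be quantified. For a model with two explanatory variables whose sample correlation is r, each coefficient's variance is multiplied by the variance inflation factor 1 / (1 - r²), so the standard error grows by the square root of that factor:

```python
import math

# Sketch: how correlation r between two explanatory variables inflates
# the standard error of each coefficient. The variance inflation factor
# for the two-variable case is VIF = 1 / (1 - r^2).
def vif(r):
    return 1.0 / (1.0 - r * r)

for r in (0.0, 0.5, 0.9, 0.99):
    print(f"r = {r:>4}: se multiplied by {math.sqrt(vif(r)):.2f}")
```

With r = 0.99 the standard errors are roughly seven times larger than with uncorrelated variables, which is why both variables can have high p-values even when at least one of them is clearly needed.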
Body fat
The table below shows the least squares estimates and their standard errors for the body fat data. Since the residual sum of squares has over 200 degrees of freedom, 95% confidence intervals for the individual parameters are approximately (estimate ± 2 s.e.) and are not displayed in the table.
The table does show the t-statistics for testing whether the individual parameters are zero, and the corresponding p-values.
Several p-values are higher than 0.1, so there is no evidence that the corresponding variables are needed in the full model; each could individually be dropped. However this does not mean that we could drop all such variables simultaneously.
Only abdomen is highly significant in the full model — all other variables have p-values of 0.01 or higher.
The diagram has a checkbox for each explanatory variable, allowing it to be dropped from the model. The p-value for knee is highest, so click its checkbox to remove it from the model. Observe that the other p-values change.
Continue deleting the variable with the highest p-value until all remaining variables have p-values below 0.05. Observe that in the model with only weight, abdomen, forearm and wrist, all variables have p-values under 0.01: there is very strong evidence that these variables are important in this model.
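The step-by-step procedure used above (backward elimination by p-value) can be sketched as a loop. The `stub_fit` function below returns made-up p-values purely to show the control flow; a real analysis would refit the least squares model and recompute the p-values at every step, since they change each time a variable is removed:

```python
# Sketch: backward elimination driven by p-values. `fit` is assumed to
# refit the model on the given variables and return their p-values.
def backward_eliminate(variables, fit, threshold=0.05):
    vars_left = list(variables)
    while vars_left:
        pvalues = fit(vars_left)
        worst = max(vars_left, key=lambda v: pvalues[v])
        if pvalues[worst] < threshold:
            break                    # all remaining variables significant
        vars_left.remove(worst)      # drop one variable, then refit
    return vars_left

def stub_fit(variables):
    # Hypothetical p-values; a real fit would recompute these each step.
    fake = {"knee": 0.9, "age": 0.3, "abdomen": 0.001, "wrist": 0.01}
    return {v: fake[v] for v in variables}

print(backward_eliminate(["knee", "age", "abdomen", "wrist"], stub_fit))
# prints ['abdomen', 'wrist']
```

As the warning above notes, this greedy one-at-a-time procedure is not guaranteed to find the single best subset of variables, especially when the explanatory variables are correlated.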