In regression data, the difference between the response and its overall mean can be split into an explained component and a residual.
The total sum of squares equals the explained sum of squares plus the residual sum of squares.
The relative sizes of the explained and residual sums of squares hold information about the strength of the relationship. The coefficient of determination describes the proportion of total variation that is explained.
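A minimal sketch of this decomposition, using numpy and a small made-up data set (the numbers are illustrative only):

```python
import numpy as np

# Made-up data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1])

# Least squares fit of y = b0 + b1*x
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

ss_total = np.sum((y - y.mean()) ** 2)           # total sum of squares
ss_explained = np.sum((fitted - y.mean()) ** 2)  # explained (regression) sum of squares
ss_residual = np.sum((y - fitted) ** 2)          # residual sum of squares

print(ss_total, ss_explained + ss_residual)      # equal, up to rounding
print("R-sqr:", ss_explained / ss_total)         # coefficient of determination
```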
For experimental data, the coefficient of determination is affected both by the strength of the relationship and the range of x-values chosen by the experimenter.
The F ratio can be used to test whether the variables are related (i.e. to test whether the model slope is zero). Since the F ratio is the square of the t statistic for this test, the conclusions for the F and t tests are identical.
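The identity behind this, in the usual notation for simple regression (S_xx and s² are not defined above; they are the standard quantities):

```latex
% F and t tests for the slope give identical conclusions because
%   SSRegn = \hat\beta_1^2 S_{xx},   MSResid = s^2,   se(\hat\beta_1) = s / \sqrt{S_{xx}},
% where S_{xx} = \sum_i (x_i - \bar{x})^2.  Hence
\[
  F = \frac{MSRegn}{MSResid}
    = \frac{\hat\beta_1^{\,2}\, S_{xx}}{s^2}
    = \left( \frac{\hat\beta_1}{s / \sqrt{S_{xx}}} \right)^{2}
    = t^{2}.
\]
```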
The difference between each response value and the overall mean can be split into a component explained by the explanatory variables and a residual.
The total, regression and residual sums of squares contain information about how well the explanatory variables explain variability in the response. The coefficient of determination is a useful summary statistic.
The ratio of the mean regression and residual sums of squares has an F distribution if the response is unrelated to the explanatory variables but tends to be larger if they are related. It can be used as a test statistic for whether there is a relationship.
A similar F test can simultaneously test whether all slope parameters in a GLM are zero.
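In the usual notation (p explanatory variables, n observations, normal errors assumed), the test statistic and its null distribution are:

```latex
% F ratio for testing whether all p slope parameters are zero:
\[
  F = \frac{MSRegn}{MSResid}
    = \frac{SSRegn / p}{SSResid / (n - p - 1)}
  \;\sim\; F_{p,\; n - p - 1} \quad \text{when all slopes are zero.}
\]
```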
The coefficient of determination, R-sqr, describes the proportion of response variation that is explained by the model. The F ratio describes the strength of evidence for there being any relationship at all. In large samples, R-sqr can be small even when F is large.
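A small simulation sketch of this point, with made-up parameter values (slope 0.1, n = 10,000) chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed
n = 10_000
x = rng.normal(size=n)
y = 0.1 * x + rng.normal(size=n)    # weak but real relationship

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

ss_total = np.sum((y - y.mean()) ** 2)
ss_regn = np.sum((fitted - y.mean()) ** 2)
ss_resid = np.sum((y - fitted) ** 2)

r_sqr = ss_regn / ss_total
f_ratio = (ss_regn / 1) / (ss_resid / (n - 2))
print(r_sqr, f_ratio)   # R-sqr around 0.01, yet F far exceeds typical critical values
```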
When the explanatory variables, X and Z, are correlated, their slope parameters can be estimated less accurately than for uncorrelated explanatory variables covering the same spreads of x- and z-values.
The variance inflation factor for each slope parameter quantifies how much the variance of its estimate (and hence its standard error) is increased because the explanatory variables are correlated.
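The standard formula, where R_j² denotes the coefficient of determination from regressing the j-th explanatory variable on the other explanatory variables (notation not used elsewhere in this summary):

```latex
% Variance inflation factor for the j-th slope:
\[
  \mathrm{VIF}_j = \frac{1}{1 - R_j^{\,2}},
  \qquad
  \mathrm{se}\bigl(\hat\beta_j\bigr)
    = \sqrt{\mathrm{VIF}_j} \times
      \bigl(\text{standard error if } x_j \text{ were uncorrelated with the others}\bigr).
\]
```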
The slope coefficient for X is the slope of a slice through the regression plane at any z-value. When X and Z are highly correlated, each such slice of the data contains only a small range of x-values and therefore holds little information about the parameter.
The position of the least squares plane is most accurately determined near the data. When X and Z are highly correlated, the LS plane can be very variable away from the data.
If X and Z are correlated, the F-test can show that the explanatory variables are related to Y but t-tests of the separate slopes may show that either one of X or Z can be dropped from the full model.
If X and Z are correlated, the t-test for X in the full model with X and Z can give a different result from the t-test in the model with only the single explanatory variable X.
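A simulation sketch of this behaviour, with an artificially high correlation between X and Z (data and seed are arbitrary, and the comments describe what typically happens in this setting):

```python
import numpy as np

rng = np.random.default_rng(5)           # arbitrary seed, made-up data
n = 30
x = rng.normal(size=n)
z = x + 0.05 * rng.normal(size=n)        # Z almost identical to X (very high correlation)
y = x + z + rng.normal(size=n)

X = np.column_stack([np.ones(n), x, z])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
s2 = resid @ resid / (n - 3)             # residual mean square

se = np.sqrt(s2 * np.diag(XtX_inv))
t_stats = beta / se                      # t statistics (intercept, X, Z)

ss_regn = np.sum((X @ beta - y.mean()) ** 2)
f_ratio = (ss_regn / 2) / s2             # overall F for the two slopes

print(t_stats[1:], f_ratio)
# With X and Z this highly correlated, the overall F ratio is typically very large
# while neither individual t statistic is significant, so either X or Z (but not
# both) could be dropped from the model.
```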
As explanatory variables are added to the model, the regression plane gets closer to the data points. The regression planes for models with only X or only Z have zero slope for the omitted variable.
Each additional variable reduces the residual sum of squares by an amount equal to the sum of squared differences between the fitted values of the two models (with and without that variable).
The explained sum of squares for X can be different, depending on whether Z is already in the model.
There are two ways to split the total sum of squares in an anova table. The F-test for the final variable added to the model gives identical results to the t-test for the coefficient in the full model.
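A sketch of the two splits, using made-up correlated data; the only library assumed is numpy:

```python
import numpy as np

rng = np.random.default_rng(1)            # arbitrary seed, made-up data
n = 50
x = rng.normal(size=n)
z = 0.8 * x + 0.6 * rng.normal(size=n)    # Z correlated with X
y = 1.0 + 0.5 * x + 0.5 * z + rng.normal(size=n)

def rss(columns):
    """Residual sum of squares after least squares on the given columns plus an intercept."""
    M = np.column_stack([np.ones(n)] + columns)
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    return np.sum((y - M @ beta) ** 2)

rss_null = np.sum((y - y.mean()) ** 2)    # intercept-only model: RSS equals the total SS

# Sequential sums of squares, adding X before Z
ss_x         = rss_null - rss([x])
ss_z_after_x = rss([x]) - rss([x, z])

# Sequential sums of squares, adding Z before X
ss_z         = rss_null - rss([z])
ss_x_after_z = rss([z]) - rss([x, z])

print(ss_x, ss_x_after_z)   # different: the SS for X depends on whether Z is already in the model
print(ss_z, ss_z_after_x)
# Dividing the SS for the last-added variable by the full model's residual mean square
# gives an F ratio equal to the square of that variable's t statistic in the full model.
```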
When the two explanatory variables are uncorrelated (orthogonal), the results are easier to interpret. The slope coefficients for X are the same, whether or not Z is in the model, and the two anova tables are identical.
Orthogonal variables usually only arise from designed experiments. They result in the most accurate parameter estimates and results that are relatively easy to interpret.
For any sequence of models with increasing complexity, component sums of squares can be defined that compare successive models in the sequence.
Linearity can be assessed by comparing the fits of a linear and quadratic model. The total sum of squares can be split into linear, quadratic and residual sums of squares.
The quadratic sum of squares compares the fit of a linear and quadratic model and therefore holds information about whether there is curvature in the data.
An F ratio comparing the quadratic and residual mean sums of squares can be used to test for linearity.
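A sketch of this test on made-up data with mild curvature; the F-distribution p-value is taken from scipy, and the data are illustrative only:

```python
import numpy as np
from scipy import stats                  # only for the F-distribution p-value

rng = np.random.default_rng(2)           # arbitrary seed, made-up data
n = 40
x = np.linspace(0, 10, n)
y = 1.0 + 0.8 * x + 0.05 * x**2 + rng.normal(scale=1.0, size=n)   # mildly curved

def fit_rss(M):
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    return np.sum((y - M @ beta) ** 2)

rss_linear    = fit_rss(np.column_stack([np.ones(n), x]))
rss_quadratic = fit_rss(np.column_stack([np.ones(n), x, x**2]))

ss_quadratic = rss_linear - rss_quadratic      # quadratic sum of squares (1 df)
ms_residual  = rss_quadratic / (n - 3)         # residual mean square, quadratic model
f_ratio      = ss_quadratic / ms_residual
p_value      = stats.f.sf(f_ratio, 1, n - 3)
print(f_ratio, p_value)                        # a small p-value suggests curvature
```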
In polynomial models, only one order of adding terms is meaningful (the linear term before the quadratic term, and so on), so only a single anova table is possible.
Analysis of variance can test the significance of the reduction in the residual sums of squares from adding quadratic terms in X and Z to a model with linear terms in both variables.
Testing whether the coefficient of XZ is zero can be done with either a t-test or analysis of variance. Both tests give the same p-value.
The marginal sums of squares in a general linear model describe the effect on the residual sum of squares of deleting single variables from the full model.
The variable with smallest marginal sum of squares is least important and its p-value indicates whether it can be dropped from the model. The marginal sums of squares can be recalculated and further variables dropped in an iterative procedure.
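A sketch of one step of this procedure on made-up data in which the third variable is unrelated to the response:

```python
import numpy as np

rng = np.random.default_rng(3)               # arbitrary seed, made-up data
n = 60
X_full = rng.normal(size=(n, 3))             # three explanatory variables
y = 2.0 + 1.0 * X_full[:, 0] + 0.5 * X_full[:, 1] + rng.normal(size=n)   # third variable unrelated

def rss(cols):
    M = np.column_stack([np.ones(n), X_full[:, cols]])
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    return np.sum((y - M @ beta) ** 2)

full = [0, 1, 2]
rss_full = rss(full)

# Marginal SS for each variable: increase in the residual SS when that variable alone is deleted
for j in full:
    reduced = [k for k in full if k != j]
    print(j, rss(reduced) - rss_full)
# The variable with the smallest marginal SS (most likely the third here) is the first
# candidate for deletion; after refitting, the marginal SS are recalculated and the
# process repeats.
```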
When the explanatory variables are uncorrelated, parameter estimates and marginal sums of squares are unaffected by removing other variables. Variance inflation factors indicate the degree of multicollinearity.
Sequential sums of squares describe changes to the residual sum of squares when the explanatory variables are added sequentially. The sums of squares depend on the order of adding the variables.
The sequential sums of squares are also the sum of squared differences between the fitted values of consecutive models. In some applications, these differences can be shown graphically to illustrate the sequential sum of squares.
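A sketch verifying this identity numerically on made-up data:

```python
import numpy as np

rng = np.random.default_rng(4)            # arbitrary seed, made-up data
n = 30
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 1.0 + 0.7 * x + 0.3 * z + rng.normal(size=n)

def fitted(columns):
    M = np.column_stack([np.ones(n)] + columns)
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    return M @ beta

fit_x  = fitted([x])                      # model with X only
fit_xz = fitted([x, z])                   # model with X and Z

seq_ss_z = np.sum((y - fit_x) ** 2) - np.sum((y - fit_xz) ** 2)   # sequential SS for Z
diff_ss  = np.sum((fit_xz - fit_x) ** 2)  # sum of squared differences between the two fits
print(seq_ss_z, diff_ss)                  # equal, up to rounding
```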
The sequential sums of squares depend on the order of adding the variables except when the explanatory variables are uncorrelated.
Individual explanatory variables can be grouped together by adding their sums of squares and degrees of freedom.
The sum of squares table can be extended with mean sums of squares and F ratios. P-values can be found for the F ratios to indicate whether each variable can be dropped from the model, but each p-value should only be interpreted if the p-values for variables added later are not significant.
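A sketch that builds such a table for made-up data, grouping two of the variables:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)                   # arbitrary seed, made-up data
n = 50
x  = rng.normal(size=n)
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
y  = 1.0 + 0.8 * x + 0.3 * z1 + 0.2 * z2 + rng.normal(size=n)

def rss(columns):
    M = np.column_stack([np.ones(n)] + columns)
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    return np.sum((y - M @ beta) ** 2)

rss0    = np.sum((y - y.mean()) ** 2)            # intercept-only model
rss_x   = rss([x])
rss_all = rss([x, z1, z2])

ss_x       = rss0 - rss_x                        # sequential SS for X (1 df)
ss_z_group = rss_x - rss_all                     # Z1 and Z2 grouped: SS and df added (2 df)
ms_resid   = rss_all / (n - 4)

for name, ss, df in [("X", ss_x, 1), ("Z1+Z2", ss_z_group, 2)]:
    f = (ss / df) / ms_resid
    p = stats.f.sf(f, df, n - 4)
    print(name, round(ss, 2), df, round(f, 2), round(p, 4))
# The p-value for X should only be interpreted if the p-value for the later Z1+Z2
# group is not significant.
```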