The square of a standard normal variable has a chi-squared distribution with 1 degree of freedom.
The sum of n squared standard normal variables has a chi-squared distribution with n d.f.
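The two results above can be checked by simulation. The sketch below (not from the original text; the sample size, seed and tolerance are illustrative assumptions) sums the squares of n standard normal values and compares the empirical mean and variance with the chi-squared theoretical values, n and 2n.

```python
import numpy as np

rng = np.random.default_rng(0)     # assumed seed for reproducibility
n, reps = 5, 100_000

# Each row holds n independent standard normals; sum the squares across rows.
sums_of_squares = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)

# For a chi-squared distribution with n d.f.: mean = n, variance = 2n.
print(sums_of_squares.mean())   # close to n = 5
print(sums_of_squares.var())    # close to 2n = 10
```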
The difference between each value and the population mean can be written as the sum of two components: the difference between the value and the sample mean, and the difference between the sample mean and the population mean. The corresponding sums of squares satisfy a similar additive relationship, which shows that the sum of squares about the sample mean is less than or equal to the sum of squares about the population mean.
The sum of squares about the population mean can be split in different ways into component sums of squares with chi-squared distributions. The sum of squares about the sample mean has (n-1) degrees of freedom and its mean sum of squares is the sample variance.
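A simulation sketch of the second result (the sample size, population standard deviation and seed are illustrative assumptions): for normal samples, the scaled sum of squares about the sample mean, (n-1)s²/σ², should behave like a chi-squared variable with (n-1) degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(1)     # assumed seed for reproducibility
n, sigma, reps = 10, 2.0, 100_000

samples = rng.normal(0.0, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)          # sample variances (divisor n - 1)
scaled = (n - 1) * s2 / sigma**2          # should be chi-squared with n-1 d.f.

# Chi-squared with (n-1) d.f. has mean n-1 and variance 2(n-1).
print(scaled.mean())  # close to n - 1 = 9
print(scaled.var())   # close to 2(n - 1) = 18
```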
The ratio of two independent sample variances (or mean sums of squares) has an F distribution whose degrees of freedom are those of the two variances.
The sum of squares about the sample mean can often be further split into component sums of squares. Comparison of the corresponding mean sums of squares can be used to test whether the model underlying the data has certain characteristics.
This page summarises the most important results from the section.
A 95% CI for the population variance can be found from the chi-squared distribution with (n-1) degrees of freedom.
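A minimal sketch of this interval, assuming the data come from a normal population (the data values here are hypothetical): the 95% CI runs from (n-1)s² divided by the upper chi-squared quantile to (n-1)s² divided by the lower one.

```python
import numpy as np
from scipy import stats

# Hypothetical data, assumed to be a random sample from a normal population.
x = np.array([4.1, 5.2, 6.3, 4.8, 5.5, 5.9, 4.4, 5.1])
n = len(x)
s2 = x.var(ddof=1)                 # sample variance

# 95% CI: ((n-1)s² / chi²_{0.975},  (n-1)s² / chi²_{0.025})
lo = (n - 1) * s2 / stats.chi2.ppf(0.975, df=n - 1)
hi = (n - 1) * s2 / stats.chi2.ppf(0.025, df=n - 1)
print(f"95% CI for variance: ({lo:.3f}, {hi:.3f})")
```

Note that the interval is not symmetric about s², because the chi-squared distribution is skew.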
As with other 95% CIs, there is 95% probability that a confidence interval for the variance will include the underlying population variance.
The confidence level for the 95% CI is only accurate if the sample comes from a normal population. The CI should therefore be avoided unless you are sure about the shape of the population distribution.
For random samples from a normal distribution, the sample mean and variance are independent.
For data that arise as samples from normal distributions in both groups, we tested earlier whether the group means were the same. Equality of the group variances can also be examined.
The ratio of the two sample variances has an F distribution whose shape depends on the sample sizes in the two groups.
To test equality of two variances, the F ratio is compared to an F distribution. The test is 2-tailed and the p-value is twice the smaller tail area.
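A sketch of this test on simulated data (the sample sizes, standard deviations and seed are illustrative assumptions): the F ratio is the ratio of the two sample variances, and the two-tailed p-value is twice the smaller tail area.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)     # assumed seed for reproducibility
x = rng.normal(0.0, 1.0, size=20)
y = rng.normal(0.0, 1.5, size=25)

f_ratio = x.var(ddof=1) / y.var(ddof=1)
df1, df2 = len(x) - 1, len(y) - 1

# Two-tailed p-value: twice the smaller tail area of the F distribution.
p = 2 * min(stats.f.cdf(f_ratio, df1, df2), stats.f.sf(f_ratio, df1, df2))
print(p)
```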
The two component sums of squares can be used to test the value of the population mean. The ratio of the mean sums of squares has an F distribution if the null hypothesis holds.
The p-value for the test is the upper tail area of the F distribution.
The F test based on the anova table results in the same p-value and conclusion as a t test for the hypotheses.
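The equivalence can be verified numerically. In this sketch (the data are simulated; the null value, sample size and seed are illustrative assumptions), the F ratio for a single sample equals the square of the t statistic, and the two tests give identical p-values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)     # assumed seed for reproducibility
x = rng.normal(10.5, 2.0, size=15)
mu0 = 10.0                          # hypothesised population mean
n = len(x)

# Two-sided one-sample t test.
t_res = stats.ttest_1samp(x, mu0)

# anova-style F ratio: explained mean SS over residual mean SS.
explained_ss = n * (x.mean() - mu0) ** 2    # 1 d.f.
residual_ms = x.var(ddof=1)                 # mean SS with n-1 d.f.
f_ratio = explained_ss / residual_ms
p_f = stats.f.sf(f_ratio, 1, n - 1)

print(np.isclose(f_ratio, t_res.statistic ** 2))  # F = t²
print(np.isclose(p_f, t_res.pvalue))              # identical p-values
```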
Components can be defined whose sums of squares hold information about the difference between the group means, the variability within group 1 and the variability within group 2.
The sums of squares of the two within-group components lead to the same F test that was described in an earlier section for whether the group variances are equal.
The residual sum of squares has a chi-squared distribution with (n - 2) d.f. The explained sum of squares only has a chi-squared distribution (1 d.f.) if Y is unrelated to X; otherwise its distribution has a higher mean.
The ratio of the mean explained and mean residual sums of squares has an F distribution with (1, n-2) d.f. if Y is unrelated to X. The F ratio is expected to be higher if the variables are related.
The F ratio can be used to test whether the variables are related (i.e. to test whether the model slope is zero).
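A sketch of the regression F test on simulated data (the intercept, slope, error standard deviation and seed are illustrative assumptions). The total sum of squares is split into explained and residual components, and the resulting p-value matches the t test for the slope.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)     # assumed seed for reproducibility
n = 30
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)

# Fit the least-squares line and split the total sum of squares.
b1, b0 = np.polyfit(x, y, 1)
fitted = b0 + b1 * x
explained_ss = ((fitted - y.mean()) ** 2).sum()   # 1 d.f.
residual_ss = ((y - fitted) ** 2).sum()           # n - 2 d.f.

f_ratio = explained_ss / (residual_ss / (n - 2))
p = stats.f.sf(f_ratio, 1, n - 2)
print(p)                                          # small: slope is non-zero

# The t test for the slope gives the same p-value.
print(np.isclose(p, stats.linregress(x, y).pvalue))
```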
This page gives general results about sums of squares, their degrees of freedom and their distributions, for any sequence of models.
In a linear model with 2 numerical explanatory variables, the residual sum of squares has a chi-squared distribution with (n-3) degrees of freedom. The explained regression sum of squares has a chi-squared distribution with 2 degrees of freedom if the response is unrelated to the explanatory variables.
In a quadratic model, the quadratic sum of squares has a chi-squared distribution if the true model is linear, but its distribution has a larger mean if there is curvature.
The between-group and within-group sums of squares have chi-squared distributions and the anova F ratio has an F distribution if both group means are equal. The p-value for the anova test is the tail area of this distribution.
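A sketch of the decomposition for two groups (the group means, common standard deviation and seed are illustrative assumptions): the between- and within-group sums of squares are computed directly and give the same F ratio and p-value as scipy's one-way anova function.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)     # assumed seed for reproducibility
g1 = rng.normal(10.0, 2.0, size=12)
g2 = rng.normal(12.0, 2.0, size=15)

# Between- and within-group sums of squares by hand.
all_vals = np.concatenate([g1, g2])
grand = all_vals.mean()
between_ss = (len(g1) * (g1.mean() - grand) ** 2
              + len(g2) * (g2.mean() - grand) ** 2)
within_ss = ((g1 - g1.mean()) ** 2).sum() + ((g2 - g2.mean()) ** 2).sum()

df_between, df_within = 1, len(all_vals) - 2
f_ratio = (between_ss / df_between) / (within_ss / df_within)
p = stats.f.sf(f_ratio, df_between, df_within)   # upper tail area

# scipy's one-way anova reproduces the same F ratio and p-value.
f2, p2 = stats.f_oneway(g1, g2)
print(np.isclose(f_ratio, f2), np.isclose(p, p2))
```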
The sums of squares again have chi-squared distributions but with different degrees of freedom.