Long page descriptions

Chapter 5   Anova Theory (Advanced)

5.1   Distribution of variance

5.1.1   Distribution of Z-squared

The square of a standard normal variable has a chi-squared distribution with 1 degree of freedom.
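In symbols (notation added here, not taken from the page itself):

\[ Z \sim N(0,1) \quad\Rightarrow\quad Z^2 \sim \chi^2_1 . \]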

5.1.2   Sums of squares

The sum of n independent squared standard normal variables has a chi-squared distribution with n d.f.
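A sketch of the result, with the independence assumption made explicit:

\[ Z_1, \dots, Z_n \ \text{i.i.d.}\ N(0,1) \quad\Rightarrow\quad \sum_{i=1}^{n} Z_i^2 \sim \chi^2_n . \]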

5.1.3   Sum of squares about sample mean

Differences between values and the population mean can be written as the sum of two components (the difference from the sample mean, and the difference between the sample mean and the population mean), and their sums of squares satisfy a corresponding additive relationship. This shows that the sum of squares about the sample mean is less than or equal to that about the population mean.
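The decomposition presumably meant here is (writing \bar{x} for the sample mean and \mu for the population mean):

\[ \sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 , \]

and since the second term on the right is non-negative, the sum of squares about \bar{x} cannot exceed that about \mu.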

5.1.4   Sums of squares tables

The sum of squares about the population mean can be split in different ways into component sums of squares with chi-squared distributions. The sum of squares about the sample mean has (n-1) degrees of freedom and its mean sum of squares is the sample variance.
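In symbols (notation added here):

\[ \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1} , \qquad s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} . \]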

5.1.5   Ratio of variances and F distribution

The ratio of two independent sample variances (or mean sums of squares) has an F distribution whose degrees of freedom are those of the two variances.
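If the two mean sums of squares are independent estimates of the same variance \sigma^2, the result can be sketched as (the symbols SSQ and df are illustrative, not from the page):

\[ F = \frac{SSQ_1 / df_1}{SSQ_2 / df_2} \sim F_{df_1,\, df_2} . \]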

5.1.6   Overview of analysis of variance

The sum of squares about the sample mean can often be further split into component sums of squares. Comparison of the corresponding mean sums of squares can be used to test whether the model underlying the data has certain characteristics.

5.1.7   Summary of anova distributions

This page summarises the most important results from the section.

5.2   Inference for variances (optional)

5.2.1   Confidence interval for the variance

A 95% CI for the population variance can be found from the chi-squared distribution with (n-1) degrees of freedom.
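A sketch of the interval, assuming the usual quantile notation for the chi-squared distribution:

\[ \left( \frac{(n-1)s^2}{\chi^2_{n-1,\,0.975}} ,\; \frac{(n-1)s^2}{\chi^2_{n-1,\,0.025}} \right) , \]

where \chi^2_{n-1,\,p} denotes the p-quantile of the chi-squared distribution with (n-1) degrees of freedom.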

5.2.2   Properties of the confidence interval

As with other 95% CIs, there is 95% probability that a confidence interval for the variance will include the underlying population variance.

5.2.3   Warning about CI for variance

The confidence level for the 95% CI is only accurate if the sample comes from a normal population. The CI should therefore be avoided unless you are sure about the shape of the population distribution.

5.2.4   Independence of mean and variance (optional)

For random samples from a normal distribution, the sample mean and variance are independent.
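Stated more fully (a standard result, with notation added here): for X_1, \dots, X_n i.i.d. N(\mu, \sigma^2),

\[ \bar{X} \sim N\!\left(\mu, \tfrac{\sigma^2}{n}\right) \qquad \text{and} \qquad \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1} , \]

with \bar{X} and s^2 independent of each other.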

5.2.5   Model and hypotheses

For data that arise as samples from normal distributions in both groups, we tested earlier whether the group means were the same. Equality of the group variances can also be examined.

5.2.6   Test statistic

The ratio of the two sample variances has an F distribution whose degrees of freedom are determined by the sample sizes in the two groups.
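Under the hypothesis of equal variances, the result can be written as (notation assumed):

\[ F = \frac{s_1^2}{s_2^2} \sim F_{n_1-1,\; n_2-1} \quad \text{when } \sigma_1^2 = \sigma_2^2 , \]

where n_1 and n_2 are the two sample sizes.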

5.2.7   F test

To test equality of two variances, the F ratio is compared to an F distribution. The test is 2-tailed and the p-value is twice the smaller tail area.
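With f denoting the observed ratio, the p-value can be sketched as:

\[ p \;=\; 2 \min\!\left\{ P\!\left(F_{n_1-1,\,n_2-1} \ge f\right) ,\; P\!\left(F_{n_1-1,\,n_2-1} \le f\right) \right\} . \]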

5.3   Anova in simple settings (optional)

5.3.1   Different approach to testing mean

The two component sums of squares can be used to test the value of the population mean. The ratio of the mean sums of squares has an F distribution if the null hypothesis holds.
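A sketch of the statistic, assuming the split of Section 5.1.3 and the null hypothesis \mu = \mu_0:

\[ F = \frac{n(\bar{x} - \mu_0)^2}{s^2} \sim F_{1,\, n-1} \quad \text{under } H_0 . \]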

5.3.2   P-value for F test

The p-value for the test is the upper tail area of the F distribution.

5.3.3   Equivalence of F and t tests

The F test based on the anova table results in the same p-value and conclusion as a t test for the hypotheses.
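The equivalence rests on a standard identity between the two distributions (notation added here):

\[ T \sim t_k \quad\Rightarrow\quad T^2 \sim F_{1,\,k} , \]

so the upper tail area of the F distribution beyond t^2 equals the two-tailed t p-value.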

5.3.4   Component sums of squares for two groups

Components can be defined whose sums of squares hold information about the difference between the group means, the variability within group 1 and the variability within group 2.
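A sketch of the decomposition, writing \bar{x} for the overall mean and \bar{x}_1, \bar{x}_2 for the group means (notation assumed):

\[ \sum_{i,j} (x_{ij} - \bar{x})^2 \;=\; \sum_{i=1}^{2} n_i (\bar{x}_i - \bar{x})^2 \;+\; \sum_{j} (x_{1j} - \bar{x}_1)^2 \;+\; \sum_{j} (x_{2j} - \bar{x}_2)^2 . \]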

5.3.5   Testing for equal group variance

The sums of squares of the two within-group components lead to the same F test that was described in an earlier section for whether the group variances are equal.

5.4   Simple linear models

5.4.1   Distributions of sums of squares

The residual sum of squares has a chi-squared distribution with (n - 2) d.f. The explained sum of squares only has a chi-squared distribution (1 d.f.) if Y is unrelated to X; otherwise its distribution has a higher mean.
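In symbols (with \sigma^2 the error variance; notation added here):

\[ \frac{SS_{\text{residual}}}{\sigma^2} \sim \chi^2_{n-2} , \qquad \frac{SS_{\text{explained}}}{\sigma^2} \sim \chi^2_1 \ \text{ if the slope } \beta_1 = 0 . \]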

5.4.2   F ratio

The ratio of the mean explained and mean residual sums of squares has an F distribution with (1, n-2) d.f. if Y is unrelated to X. The F ratio is expected to be higher if the variables are related.
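A sketch of the ratio under the null hypothesis of zero slope:

\[ F = \frac{SS_{\text{explained}} / 1}{SS_{\text{residual}} / (n-2)} \sim F_{1,\, n-2} \quad \text{if } \beta_1 = 0 . \]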

5.4.3   Analysis of variance test

The F ratio can be used to test whether the variables are related (i.e. to test whether the model slope is zero).

5.5   Sums of squares for other models

5.5.1   Sums of squares and degrees of freedom

This page gives general results about sums of squares, their degrees of freedom and their distributions, for any sequence of models.

5.5.2   Distributions of sums of squares

In a linear model with 2 numerical explanatory variables, the residual sum of squares has a chi-squared distribution with (n-3) degrees of freedom. The explained (regression) sum of squares has a chi-squared distribution with 2 degrees of freedom if the response is unrelated to the explanatory variables.
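In symbols (notation added here), with \beta_1 and \beta_2 the coefficients of the two explanatory variables:

\[ \frac{SS_{\text{residual}}}{\sigma^2} \sim \chi^2_{n-3} , \qquad \frac{SS_{\text{regression}}}{\sigma^2} \sim \chi^2_2 \ \text{ if } \beta_1 = \beta_2 = 0 . \]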

5.5.3   Distribution of quadratic sum of squares

In a quadratic model, the quadratic sum of squares has a chi-squared distribution if the true model is linear, but its distribution has a larger mean if there is curvature.
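A sketch, assuming the quadratic term contributes a single degree of freedom:

\[ \frac{SS_{\text{quadratic}}}{\sigma^2} \sim \chi^2_1 \quad \text{if the true model is linear.} \]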

5.5.4   Theory behind anova test

The between-group and within-group sums of squares have chi-squared distributions and the anova F ratio has an F distribution if both group means are equal. The p-value for the anova test is the tail area of this distribution.
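For the two-group case described here (notation assumed, with n = n_1 + n_2 observations in total):

\[ F = \frac{SS_{\text{between}} / 1}{SS_{\text{within}} / (n-2)} \sim F_{1,\, n-2} \quad \text{if the group means are equal.} \]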

5.5.5   Theory behind anova test (advanced)

The sums of squares again have chi-squared distributions but with different degrees of freedom.