Chapter 11   Comparing Groups

11.1   Models for two groups

11.1.1   Interest in underlying population

Data from two groups

When data are collected from two groups, we are usually interested in differences between the groups in general. The specific individuals are of less interest. Questions are therefore about the characteristics of the populations or processes that we assume underlie the data.

Example

The questions do not refer to the 16 specific subjects — they ask about whether anticipation of hypnosis affects the ventilation rate in general. We would like to use the answers to predict what will happen to other people.

11.1.2   Model for two groups

Data and model

Data from two groups can be displayed with two histograms:

The diagram below illustrates a possible model for the data above.

11.1.3   Parameters of the normal model

Parameters

A normal model for two groups has four unknown parameters (the mean and standard deviation for each normal distribution). These parameters give considerable flexibility and allow the model to be used for a variety of different data sets.

(The number of parameters can be reduced to three if it is assumed that the two standard deviations are the same, but we will not consider this type of model here.)

11.1.4   Parameter estimates

Parameter estimates

A normal model for 2-group data involves 4 unknown parameters, µ1, µ2, σ1 and σ2. The means and standard deviations in the two samples provide objective estimates of the four parameters.

11.1.5   Difference between means

Comparing the populations

Although standard deviations in the two populations may also differ, we are usually most interested in the difference between the population means. Differences between the means can be expressed in terms of the model parameters with the following questions.

Randomness of sample difference

These questions are about µ2 - µ1, and the best estimate of it is the difference between the sample means, x̄2 - x̄1. However, x̄2 - x̄1 cannot give definitive answers since it is random: it varies from sample to sample.

Without an understanding of the distribution of x̄2 - x̄1, it is impossible to properly interpret what the sample difference, 0.104 kg, tells you about the difference between the underlying population means.

11.2   Distn of sums and differences

11.2.1   Means and sums of samples

Sample mean and sum

The mean of a random sample, x̄, has a distribution that is approximately normal if the sample size, n, is large. It always has a mean and standard deviation that depend on the population mean, µ, and standard deviation, σ,

mean(x̄)  =  µ
sd(x̄)  =  σ / √n

Occasionally the sum of the values in a random sample, Σx = n x̄, is more useful than the mean,

mean(Σx)  =  n µ
sd(Σx)  =  √n σ

Its distribution is a scaled version of the distribution of the mean: the same shape but a different mean and standard deviation.

Mean vs Sum

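As the sample size increases, the standard deviation of the sample mean, σ / √n, decreases, whereas the standard deviation of the sample sum, √n σ, increases.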

11.2.2   Sum and difference

Sum and difference of two variables

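Applying the result about the sum of a random sample to a sample of size n = 2, X1 and X2,

mean(X1 + X2)  =  2µ
sd(X1 + X2)  =  √2 σ

If we generalise by allowing X1 and X2 to have different means, µ1 and µ2, but the same σ,

mean(X1 + X2)  =  µ1 + µ2
sd(X1 + X2)  =  √2 σ

A similar result holds for the difference between X1 and X2:

mean(X1 - X2)  =  µ1 - µ2
sd(X1 - X2)  =  √2 σ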

If X1 and X2 are independent and have normal distributions, their sum and difference are also normally distributed.

11.2.3   Sum and difference (cont)

General result

The results generalise further to independent variables that may have different means and standard deviations.

mean(X1 ± X2)  =  µ1 ± µ2
sd(X1 ± X2)  =  √(σ1² + σ2²)

The formulae for the standard deviations are more easily remembered in terms of the variances of the quantities. For example,

var(X1 - X2)  =  var(X1) + var(X2)  =  σ1² + σ2²

11.2.4   Probabilities for sums and differences

Finding probabilities

To find the probability that a sum or difference satisfies an inequality, the inequality should be translated into one about a z-score, using the mean and standard deviation of the quantity,

z  =  (value - mean) / sd

The standard normal distribution can then be used to find the probabilities. The examples below illustrate the method.
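The method can be sketched in a few lines of Python. This is only an illustration: the numbers are hypothetical (they are not the examples from the original page), the function names are invented here, and the variables are assumed to be independent and normal (or the sample large enough for the sum to be close to normal).

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution: mean 0, sd 1

def prob_sum_exceeds(limit, n, mu, sigma):
    """P(total of n independent values > limit), each with mean mu and sd sigma."""
    z = (limit - n * mu) / (sqrt(n) * sigma)  # z-score of the limit
    return 1 - std_normal.cdf(z)

def prob_x1_exceeds_x2(mu1, sigma1, mu2, sigma2):
    """P(X1 - X2 > 0) for independent normal X1 and X2 with different sds."""
    mean_diff = mu1 - mu2
    sd_diff = sqrt(sigma1**2 + sigma2**2)
    z = (0 - mean_diff) / sd_diff  # z-score of zero
    return 1 - std_normal.cdf(z)

# Hypothetical: 10 weights, each with mean 80 kg and sd 12 kg; limit of 900 kg.
p_total = prob_sum_exceeds(900, n=10, mu=80, sigma=12)

# Hypothetical: X1 ~ normal (50, 3) and X2 ~ normal (46, 4).
p_diff = prob_x1_exceeds_x2(50, 3, 46, 4)

print(round(p_total, 4), round(p_diff, 4))
```

In the second call the difference has mean 4 and standard deviation √(3² + 4²) = 5, so the z-score of zero is -0.8.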

Example (total of several variables)

Example (sum of two variables with different sd)

11.3   Comparing means in two groups

11.3.1   Distn of difference between means

Difference between means

The difference between any two independent quantities X1 and X2 has a distribution with

mean(X2 - X1)  =  µ2 - µ1
sd(X2 - X1)  =  √(σ1² + σ2²)

Applying this to the difference between the means of two random samples,

mean(x̄2 - x̄1)  =  µ2 - µ1
sd(x̄2 - x̄1)  =  √(σ1²/n1 + σ2²/n2)

If the distributions are normal in each group, ...
... the sample means are normal, so their difference also has a normal distribution.
Otherwise, ...
... the two sample means are approximately normal if the sample sizes are large, so their difference is also close to normal.

Irrespective of the distributions within the two groups, the mean and standard deviation of x̄2 - x̄1 are those given by the formulae for the two sample means.

11.3.2   SE of difference between means

Estimation error

The difference between the sample means, x̄2 - x̄1, is a point estimate of the difference between the means of the underlying populations, µ2 - µ1. In order to properly interpret it, we must understand the distribution of the estimation error, which is approximately

error  =  (x̄2 - x̄1) - (µ2 - µ1)  ~  normal (0, √(σ1²/n1 + σ2²/n2))

Replacing σ1² and σ2² by s1² and s2² gives an approximate error distribution,

error  ~  normal (0, √(s1²/n1 + s2²/n2))

The standard deviation of these errors is the standard error of the estimator.

Examples

Our best estimate is that anticipation of hypnosis results in a mean ventilation rate that is 0.491 higher than the control group. From the error distribution, the error in this estimate is unlikely to be more than about 0.6.

11.3.3   CI for difference between means

If σ1 and σ2 were known...

Prob ( x̄2 - x̄1  is within   ±  1.96 √(σ1²/n1 + σ2²/n2)   of   µ2 - µ1 )   =   0.95

so a 95% confidence interval for µ2 - µ1 would be

x̄2 - x̄1   ±   1.96 √(σ1²/n1 + σ2²/n2)

When σ1 and σ2 are unknown...

We must replace σ1 and σ2 by s1 and s2 in the confidence interval, and the constant '1.96' must be replaced by a slightly larger value from t-tables,

x̄2 - x̄1   ±   tν √(s1²/n1 + s2²/n2)

where the degrees of freedom for the t-value are

ν   =   min (n1−1,  n2−1)

(A more complex formula is available that gives a higher value for ν. It is slightly better but the difference is usually negligible.)
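The calculation can be sketched as follows. The summary statistics are hypothetical, the function names are invented for this illustration, and the t-value is looked up in t-tables rather than computed. The 'more complex formula' is assumed here to be the Welch-Satterthwaite approximation; the text does not name it.

```python
from math import sqrt

def diff_means_ci(mean1, s1, n1, mean2, s2, n2, t_value):
    """Confidence interval for mu2 - mu1 from summary statistics.

    t_value is read from t-tables for the chosen degrees of freedom.
    """
    se = sqrt(s1**2 / n1 + s2**2 / n2)  # standard error of the difference
    diff = mean2 - mean1
    return diff - t_value * se, diff + t_value * se

def df_simple(n1, n2):
    """Degrees of freedom used in the text: min(n1 - 1, n2 - 1)."""
    return min(n1 - 1, n2 - 1)

def df_welch(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom (assumed 'more complex formula')."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Hypothetical samples: group 1 has mean 5.0, sd 1.2; group 2 has mean 5.6, sd 0.9.
nu = df_simple(8, 8)                  # 7 degrees of freedom
nu_welch = df_welch(1.2, 8, 0.9, 8)   # never smaller than nu
low, high = diff_means_ci(5.0, 1.2, 8, 5.6, 0.9, 8, t_value=2.365)  # t-table value for 7 d.f.

print(nu, round(nu_welch, 2), round(low, 3), round(high, 3))
```

Because the Welch-Satterthwaite value is at least min(n1 - 1, n2 - 1), using the simpler formula gives a slightly wider, more cautious interval.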

Example

11.3.4   Testing a hypothesis

Testing for a difference between two means

The difference between two groups that is of most practical importance is a difference between their means.

H0 :   µ2 - µ1  =  0
HA :   µ2 - µ1  ≠  0

The summary statistic that throws most light on these hypotheses is the difference between the sample means, x̄2 - x̄1. Testing therefore involves assessment of whether this difference is unusually far from zero.

As with all other hypothesis tests, a p-value near zero gives evidence that the null hypothesis does not hold — evidence of a difference between the group means.
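A rough sketch of the calculation is given below. It assumes the usual test statistic, t = (x̄2 - x̄1) / se(x̄2 - x̄1), with the same degrees of freedom as the confidence interval, min(n1 - 1, n2 - 1); the formula itself is not shown in this summary. Python's standard library has no t distribution, so the tail probability is found here by numerical integration of the t density. The summary statistics and function names are hypothetical.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > |t|): 0.5 minus the area from 0 to |t| (Simpson's rule, even steps)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """t statistic, degrees of freedom and two-tailed p-value for H0: mu2 - mu1 = 0."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    t = (mean2 - mean1) / se
    df = min(n1 - 1, n2 - 1)
    return t, df, 2 * t_upper_tail(t, df)

# Hypothetical summary statistics for two samples of 8 values each.
t_stat, df, p_value = two_sample_t(5.0, 1.2, 8, 5.6, 0.9, 8)
print(round(t_stat, 3), df, round(p_value, 3))
```

In practice statistical software would report the p-value directly; the integration is only there to keep the sketch self-contained, and it is intended for small to moderate degrees of freedom.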

Example

General properties of p-values

A statistical hypothesis test cannot provide a definitive answer about whether two groups have different means. The randomness of sample data means that p-values are also random quantities.

It is possible to get a small p-value (supporting HA) when H0 is true, and it is possible to get a large p-value (consistent with H0) when HA is true.

There is some chance of being misled by an 'unlucky' sample.

If H0 is true
All p-values between 0 and 1 are equally likely. For example, there is a 5% probability of getting a p-value less than 0.05.
If HA is true
The p-value is more likely to be near zero, though there is still some chance of a larger p-value.

Effect of increasing the sample size

If H0 is true
The p-values remain equally likely between 0 and 1.
If HA is true
The distribution of p-values becomes more concentrated near zero, so you are more likely to conclude that the population means are really different.
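These properties can be checked with a small simulation. The sketch below is a simplification rather than the test described above: it assumes that both groups are normal with a known, common σ, so that a z-test can be used and only the standard library is needed. All parameter values and function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def simulate_p_values(mu1, mu2, sigma, n, n_sims, seed=1):
    """Two-tailed p-values for H0: mu2 - mu1 = 0 from repeated pairs of samples."""
    rng = random.Random(seed)
    se = sigma * sqrt(2 / n)  # sd of the difference between the two sample means
    cdf = NormalDist().cdf
    p_values = []
    for _ in range(n_sims):
        x1 = [rng.gauss(mu1, sigma) for _ in range(n)]
        x2 = [rng.gauss(mu2, sigma) for _ in range(n)]
        z = (mean(x2) - mean(x1)) / se
        p_values.append(2 * (1 - cdf(abs(z))))
    return p_values

def prop_below(p_values, cutoff=0.05):
    """Proportion of simulated p-values below the cutoff."""
    return sum(p < cutoff for p in p_values) / len(p_values)

h0_true = simulate_p_values(mu1=10, mu2=10, sigma=2, n=20, n_sims=2000)
ha_small_n = simulate_p_values(mu1=10, mu2=11, sigma=2, n=20, n_sims=2000)
ha_large_n = simulate_p_values(mu1=10, mu2=11, sigma=2, n=80, n_sims=2000)

print(prop_below(h0_true), prop_below(ha_small_n), prop_below(ha_large_n))
```

The first proportion should be close to 0.05 whatever the sample size, whereas the other two show p-values concentrating near zero as n increases when HA is true.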

11.3.5   One-tailed tests for differences

One- and two-tailed tests for differences

In a two-tailed test, the alternative hypothesis is that the two population means are different. A one-tailed test arises when we want to test whether one mean is higher than the other (or lower than the other).

Test statistic, p-value and conclusion

Consider a test for the hypotheses,

H0 :   μ1  =  μ2
HA :   μ1  >  μ2

The alternative hypothesis is only supported by very small (large negative) values of x̄2 - x̄1. This also corresponds to small values of the test statistic t, so the p-value is the lower tail probability of the t distribution.

A small p-value is interpreted as giving evidence that H0 is false, in a similar way to all other kinds of hypothesis test.

Examples

Properties of p-values

We again stress that a statistical hypothesis test cannot provide a definitive answer. The randomness of sample data means that p-values are also random quantities, so there is some chance of us being misled by an 'unlucky' sample.

11.4   Comparing two proportions

11.4.1   Modelling two proportions

Two groups of successes and failures

We now consider data that are obtained as random samples from two populations, with the sampled individuals being categorised into successes and failures.

Since our model involves only two parameters, π1 and π2, the two groups are the same only if π2 - π1 = 0. The value of π2 - π1 is usually unknown but can be estimated by p2 - p1. However p2 - p1 is a random quantity so its variability must be taken into account when interpreting its value.

Example

Note that the questions do not refer to the specific 141 births in the study. They ask about differences between winter and summer births 'in general'.

We are interested in π2 - π1 rather than p2 - p1, so we need to understand the accuracy of our point estimate.

11.4.2   Distribution of difference in proportions

Difference between two proportions

Within each group, the sample proportion of successes, p, has a distribution that is approximately normal in large samples and has mean and standard deviation

mean(p)  =  π
sd(p)  =  √(π(1 - π) / n)

Applying the general results about the difference between two independent random quantities:

mean(p2 - p1)  =  π2 - π1
sd(p2 - p1)  =  √(π1(1 - π1)/n1 + π2(1 - π2)/n2)

Since the individual proportions are approximately normal (in large samples), their difference is also approximately normal.

11.4.3   CI for difference in proportions

Standard error of p2 - p1

The standard deviation of p2 - p1 is also its standard error when it is used to estimate π2 - π1,

se(p2 - p1)  =  √(π1(1 - π1)/n1 + π2(1 - π2)/n2)

In practice, π1 and π2 must be replaced by their sample equivalents, p1 and p2, to estimate the standard error.

Confidence interval for difference

Most 95% confidence intervals are of the form

estimate   ±   1.96 × se(estimate)

perhaps with a refinement of using a slightly higher value than 1.96 (e.g. a t-value) if the standard error is estimated. Applying this to our estimate of π2 - π1 and using 2 instead of 1.96 gives the approximate 95% confidence interval

p2 - p1   ±   2 √(p1(1 - p1)/n1 + p2(1 - p2)/n2)
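As a minimal sketch, the interval can be computed from the counts of successes in the two groups. The counts below are hypothetical and the function name is invented for this illustration.

```python
from math import sqrt

def diff_proportions_ci(x1, n1, x2, n2, multiplier=2.0):
    """Approximate 95% CI for pi2 - pi1 from x1 of n1 and x2 of n2 successes."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # estimated standard error
    diff = p2 - p1
    return diff - multiplier * se, diff + multiplier * se

# Hypothetical counts: 30 successes out of 100 in group 1, 45 out of 100 in group 2.
low, high = diff_proportions_ci(30, 100, 45, 100)
print(round(low, 4), round(high, 4))
```

An interval that does not include zero suggests that the two population probabilities differ, which links the interval to the test in the next section.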

Example

11.4.4   Testing for difference in probabilities

Two-tailed test

H0 :   π1  =  π2
HA :   π1  ≠  π2

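For this test, a p-value is obtained by dividing the difference between the sample proportions by its standard error to give a test statistic, z = (p2 - p1) / se(p2 - p1), and then finding the total probability in the two tails of the standard normal distribution that are further from zero than z.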

The p-value is interpreted in the same way as for all previous tests. A p-value close to zero is unlikely when H0 is true, but is more likely when HA holds. Small p-values therefore provide evidence of a difference between the population probabilities.

One-tailed test

In a 1-tailed test, the alternative hypothesis is

HA :   π1  −  π2  >  0    or    HA :   π1  −  π2  <  0

The test statistic is identical to that for a 2-tailed test and the p-value is obtained in a similar way, but it is found from only a single tail of the standard normal distribution.

Alternative test statistic

Since π1 and π2 are equal if H0 is true, the overall proportion of successes, p, can be used in the formula for the standard error of p2 - p1,

se(p2 - p1)  =  √(p(1 - p)(1/n1 + 1/n2))

This refinement makes little difference in practice, so the examples below use the 'simpler' formula that we gave earlier.
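The sketch below carries out the test with either standard error, so that the two versions can be compared. The counts are hypothetical and the function name and its `pooled` and `tail` arguments are invented for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_test(x1, n1, x2, n2, pooled=False, tail="two"):
    """z statistic and p-value for H0: pi1 = pi2, based on p2 - p1.

    tail="two"   tests HA: pi1 != pi2
    tail="upper" tests HA: pi2 - pi1 > 0  (that is, pi1 - pi2 < 0)
    tail="lower" tests HA: pi2 - pi1 < 0  (that is, pi1 - pi2 > 0)
    """
    p1, p2 = x1 / n1, x2 / n2
    if pooled:
        p = (x1 + x2) / (n1 + n2)  # overall proportion of successes
        se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p2 - p1) / se
    cdf = NormalDist().cdf
    if tail == "two":
        p_value = 2 * (1 - cdf(abs(z)))
    elif tail == "upper":
        p_value = 1 - cdf(z)
    else:
        p_value = cdf(z)
    return z, p_value

# Hypothetical counts: 30 of 100 successes in group 1 and 45 of 100 in group 2.
z_simple, p_simple = two_proportion_test(30, 100, 45, 100)
z_pooled, p_pooled = two_proportion_test(30, 100, 45, 100, pooled=True)
z_upper, p_upper = two_proportion_test(30, 100, 45, 100, tail="upper")
print(round(z_simple, 3), round(p_simple, 4), round(z_pooled, 3), round(p_pooled, 4))
```

With these counts the two standard errors give very similar z statistics, which illustrates why the refinement matters little in practice.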

Two-tailed example

One-tailed example

11.5   Paired t test

11.5.1   Paired data

Paired data

When two types of measurement, X and Y, are made from each individual (or other unit), the data are called bivariate. Sometimes the two measurements are of closely related quantities and may even describe the same quantity at different times.

When the sum or difference of X and Y is a meaningful quantity, the data are called paired data.

Hypotheses of interest

For paired data, we often want to test whether the means of the two variables are equal,

H0 :   µX = µY
HA :   µX ≠ µY

Sometimes a one-tailed test is required, such as

H0 :   μX = μY
HA :   μX > μY

Examples

Pre-test, post-test data
This arises when a measurement is made from each individual, then a second measurement of the same type is made after some kind of intervention (e.g. training or medication). Has the intervention "improved" the measurement?
Twin studies
Some experiments or other studies are conducted with identical twins, either human or animal. The members of each pair experience different environments — either two different experimental treatments or two other differences. Are there differences between the two treatments?
Other types of pairing
For example, damaged cars may each be taken to two garages for estimates of the cost of repair. The two estimates for each car are paired data. Does one garage overcharge?

11.5.2   Analysis of differences

Differences

Information about the difference between the means of X and Y is contained in the values D = (Y - X) for each individual. The hypotheses

H0 :   μX = μY
HA :   µX ≠ µY

can then be expressed as

H0 :   μD = 0
HA :   μD ≠ 0

This reduces the paired data set to a univariate data set of differences, D, and reduces questions about (µY - µX) to questions about the mean of D.

Analysis of paired data

By taking differences between Y and X, much of the variability between the individuals is eliminated, making it easier to see whether their means are different. The example below shows paired data on the left with blue lines joining the x- and y-values in each pair. The differences on the right make it clearer that the y-values are usually higher than the corresponding x-values.

11.5.3   Paired t-test

Approach (paired t-test)

Testing whether two paired measurements, X and Y, have equal means is done in terms of the differences

D = Y - X

The test is then expressed as

H0:   µD = 0

HA:   µD ≠ 0

or a one-tailed variant. The hypotheses are therefore assessed with a standard univariate t-test using test statistic

t  =  d̄ / (sD / √n)

where d̄ and sD are the mean and standard deviation of the n differences.

This is compared to a t distribution with n - 1 degrees of freedom to find the p-value.
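A minimal sketch of the calculation, using hypothetical pre-test and post-test values (not data from the text) and an invented function name:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """t statistic and degrees of freedom for H0: mu_D = 0, where D = Y - X."""
    d = [yi - xi for xi, yi in zip(x, y)]  # one difference per individual
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))
    return t, n - 1

# Hypothetical pre-test (x) and post-test (y) values for 6 individuals.
x = [12.0, 15.0, 11.0, 18.0, 14.0, 16.0]
y = [14.0, 16.0, 13.0, 19.0, 17.0, 17.0]
t, df = paired_t(x, y)
print(round(t, 3), df)
```

Here the differences are 2, 1, 2, 1, 3 and 1, so d̄ = 1.667 and sD / √n = 0.333, giving t = 5.0 on 5 degrees of freedom. This is well beyond the t-table value of 2.571 for a two-tailed test at the 5% level.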

Example

The diagram below illustrates a 2-tailed test for equal means, based on n = 15 paired observations.

From the p-value, we conclude that there is very strong evidence that the means for Y and X are different.

11.5.4   Pairing and experimental design

Choice between paired data or two independent samples

It is sometimes possible to answer questions about the difference between two means by collecting two alternative types of data.

Two independent samples
Measurements are made from two samples of individuals from the groups whose means are to be compared. A 2-sample t-test can be used to compare the means.
One paired sample
The 'individuals' can be re-defined as pairs of related values from the two groups and a single sample of these pairs can be collected. A paired t-test can be performed on the differences to compare the means.

If the individuals in the 2 groups can be paired so that the pairs are relatively similar, a paired design gives more accurate results.

Matched pairs in experiments

In experiments to compare two treatments, it may be possible to group the experimental units into pairs that are similar in some way. These are called matched pairs. If the two experimental units in each pair are randomly assigned to the two treatments, the data can be analysed as described in this section.

The difference between the treatments is estimated more accurately than in a completely randomised experiment.

11.6   Comparing several means

11.6.1   Model

Data

In this section, we examine data that may arise as:

We will model the data in terms of g groups. The data often arise from completely randomised experiments with g treatments.

Model

The model that was used for 2 groups can be easily extended to g > 2 groups, allowing different means and standard deviations in all groups.

Group i:   Y   ~   normal (µi, σi)

However, to develop a test for equal group means with g > 2 groups, we must make an extra assumption that the standard deviations in all groups are the same.

Group i:   Y   ~   normal (µi, σ)

If there are g groups, this model has g + 1 unknown parameters — the g group means and the common standard deviation, σ. It is flexible enough to be useful for many data sets.

If the assumptions of a normal distribution and constant variance do not hold, a nonlinear transformation of the response may result in data for which the model is appropriate.

11.6.2   Parameter estimates

Estimating the group means

We now assume a normal model with the same standard deviation in each group,

Group i:   Y   ~   normal (µi, σ)

The sample means provide estimates of the {µi}:

µ̂i  =  ȳi

Estimating σ²

The sample standard deviation in any single group, si, is a valid estimate of σ, but we need to combine these g separate estimates in some way.

It is easier to describe estimation of σ² rather than σ. If the sample sizes are the same in all groups, a pooled estimate of σ² is the average of the group variances,

s²  =  (s1² + s2² + ... + sg²) / g

If the sample sizes are not equal in all groups, this is generalised by adding the numerators and denominators of the formulae for the g separate group variances,

s²  =  ((n1 - 1) s1² + ... + (ng - 1) sg²) / (n - g)

where n = n1 + ... + ng is the total number of values. More mathematically, yij denotes the j'th of the ni values in group i, for i = 1 to g. The pooled estimate of σ² can then be written as

s²  =  Σi Σj (yij - ȳi)² / (n - g)

The pooled variance is influenced most by the sample variances in the groups with biggest sample sizes.
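The pooled estimate can be sketched in Python as below. The data are hypothetical and the function name is invented for this illustration; the first data set has equal group sizes, so the pooled variance reduces to the average of the group variances.

```python
from statistics import mean, variance

def pooled_variance(groups):
    """Pooled estimate of sigma^2: within-group sum of squares over (n - g)."""
    n = sum(len(grp) for grp in groups)
    g = len(groups)
    ss_within = sum((len(grp) - 1) * variance(grp) for grp in groups)
    return ss_within / (n - g)

# Hypothetical data with equal group sizes: group variances are 1, 4 and 25.
equal_sizes = [[1, 2, 3], [2, 4, 6], [0, 5, 10]]
pooled_equal = pooled_variance(equal_sizes)
average_of_variances = mean(variance(grp) for grp in equal_sizes)

# Hypothetical data with unequal group sizes.
unequal_sizes = [[4.1, 5.0, 4.6], [6.2, 5.8, 6.9, 6.4, 6.0], [5.1, 4.7, 5.5, 5.2]]
pooled_unequal = pooled_variance(unequal_sizes)

print(pooled_equal, average_of_variances, round(pooled_unequal, 4))
```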

11.6.3   Revisiting two groups (optional)

Revisiting the difference between two group means

In an earlier section, we described confidence intervals and tests about the difference between two group means, µ2 - µ1. They can be improved if we can assume that

σ1 = σ2 = σ

Inference is still based on x̄2 - x̄1, but the equation for its standard deviation can be simplified,

sd(x̄2 - x̄1)  =  σ √(1/n1 + 1/n2)

where σ is estimated by the pooled standard deviation, s.

Confidence interval

A 95% confidence interval for µ2 - µ1 has the same general form as before,

x̄2 - x̄1   ±   tν × se(x̄2 - x̄1)

but the standard deviation and the degrees of freedom for the t-value, ν, are different.

                            degrees of freedom, ν
Allowing σ1 ≠ σ2            min(n1 - 1, n2 - 1)
Assuming σ1 = σ2 = σ        n1 + n2 - 2

If it can be assumed that σ1 = σ2, the confidence interval is usually narrower.

Example

The diagram below shows 95% confidence intervals obtained by the two methods.

A test of H0: µ1 = µ2 can be modified in the same way. The p-value for this test is found from the tail area of the t distribution with (n1 + n2 - 2) degrees of freedom.

11.6.4   Variation between and within groups

Comparing several groups

A new approach is needed to compare the means of three or more groups — the methods for two groups cannot be extended. We again assume a normal model with equal standard deviations,

Group i:   Y   ~   normal (µi, σ)

Testing whether there are differences between the groups involves the hypotheses,

H0 :   µi  =  µj        for all i and j
HA:   µi  ≠  µj        for at least some i, j

Variation between and within groups

Testing whether the model means, {µi}, are equal is done by assessing the variation between the group means in the data. However, because of randomness in sample data, the sample means are unlikely to be the same, even if H0 is true.

In the example on the left below, the group means vary so much that the {µi} are almost certainly not equal. However the group means on the right are relatively similar and their differences may simply be randomness.

To assess whether the means are 'unusually different', we must also take account of the variation within the groups. The data set on the left below gives much stronger evidence of group differences than that on the right, even though the group means are the same in both data sets.

The evidence against H0 depends on the relative size of the variation within groups and between groups.

11.6.5   Sums of squares

Notation

In the formulae in this page, the values in the i'th group are denoted by yi1, yi2, ... . More generally, the j'th value in the i'th group is called yij, the mean of the values in the i'th group is ȳi, and the overall mean of all n values is ȳ.

Total variation

The total sum of squares reflects the total variability of the response.

SSTotal  =  Σi Σj (yij - ȳ)²

The overall variance of all values (ignoring groups) is the total sum of squares divided by (n - 1).

The sum of squares between groups measures the variability of the group means.

SSBetween  =  Σi Σj (ȳi - ȳ)²

Variation between groups is summarised by the differences between the group means and the overall mean. Note that the summation is over all observations in the data set.

The sum of squares within groups quantifies the spread of values within each group.

SSWithin  =  Σi Σj (yij - ȳi)²

This is also called the residual sum of squares since it describes variability that is unexplained by differences between the groups. Note that the pooled estimate of the common variance, σ², is the sum of squares within groups divided by (n - g).
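The three sums of squares can be computed directly from their definitions, as in the sketch below. The data are hypothetical and the function name is invented for this illustration.

```python
from statistics import mean

def sums_of_squares(groups):
    """Total, between-group and within-group sums of squares for a list of groups."""
    all_values = [y for grp in groups for y in grp]
    overall = mean(all_values)
    ss_total = sum((y - overall) ** 2 for y in all_values)
    # Summing over all observations: each group mean is counted once per value.
    ss_between = sum(len(grp) * (mean(grp) - overall) ** 2 for grp in groups)
    ss_within = sum((y - mean(grp)) ** 2 for grp in groups for y in grp)
    return ss_total, ss_between, ss_within

# Hypothetical data: g = 3 groups of 3 values each.
groups = [[1, 2, 3], [2, 4, 6], [0, 5, 10]]
ss_total, ss_between, ss_within = sums_of_squares(groups)
print(round(ss_total, 6), round(ss_between, 6), round(ss_within, 6))
```

For these values the between-group and within-group sums of squares are 14 and 60, and they add to the total sum of squares, 74.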

11.6.6   Coefficient of determination

Sums of squares

Sum of squares   Interpretation
SSTotal          Overall variability of Y, taking no account of the groups.
SSWithin         Variability that cannot be explained by the model.
SSBetween        Variability that is explained by the model.

Coefficient of determination

The proportion of the total sum of squares that is explained by the model is called the coefficient of determination,

R²  =  SSBetween / SSTotal

Example

11.6.7   Test for differences between groups

Hypothesis test

The following hypotheses are used to test whether the group means are all equal:

H0 :   µi  =  µj        for all i and j
HA:   µi  ≠  µj        for at least some i, j

We will describe some of the steps for this test, but cannot justify them here.

Mean sums of squares

The three sums of squares are first divided by values called their degrees of freedom:

MSTotal  =  SSTotal / (n - 1)
MSWithin  =  SSWithin / (n - g)
MSBetween  =  SSBetween / (g - 1)

The mean total sum of squares is the sample variance of the response (ignoring groups).
The mean within-group sum of squares is the pooled estimate of the variance within groups.
The mean between-group sum of squares is harder to directly interpret.

The numerators in these ratios add up:

SSTotal  =  SSBetween  +  SSWithin

and the same relationship holds for their denominators (degrees of freedom):

dfTotal  =  dfBetween  +  dfWithin

F ratio and p-value

The test statistic is an F-ratio,

F  =  MSBetween / MSWithin

This test statistic compares between- and within-group variation. The further apart the group means, the larger SSBetween and the larger the F-ratio.

Large values of F suggest that H0 does not hold — that the group means are not the same.

The p-value for the test is the probability of such a high F ratio if H0 is true (all group means are the same). It is based on a standard distribution called an F distribution and is interpreted in the same way as other p-values.

The closer the p-value to zero, the stronger the evidence that H0 does not hold.
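A short sketch of the F-ratio calculation from the sums of squares is given below. The input values are hypothetical and the function name is invented here. The p-value itself needs the F distribution, which is not in Python's standard library, so it is left to F-tables or statistical software.

```python
def f_ratio(ss_between, ss_within, n, g):
    """F-ratio and its degrees of freedom for g groups and n values in total."""
    df_between, df_within = g - 1, n - g
    ms_between = ss_between / df_between  # mean between-group sum of squares
    ms_within = ss_within / df_within     # pooled estimate of the variance
    return ms_between / ms_within, df_between, df_within

# Hypothetical sums of squares for g = 3 groups and n = 9 values in total.
f, df1, df2 = f_ratio(ss_between=14.0, ss_within=60.0, n=9, g=3)
print(round(f, 3), df1, df2)
```

Here the mean sums of squares are 14 / 2 = 7 and 60 / 6 = 10, so F = 0.7. An F-ratio below 1 means the group means vary less than would be expected from the within-group variation, so it gives no evidence against H0.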

Analysis of variance table

An analysis of variance table (anova table) describes some of the calculations above:

11.6.8   Examples

11.7   Randomised blocks

11.7.1   Generalising the idea of paired data

In paired data, two related measurements, X and Y, are made from each sampled individual and we are interested in testing whether their means are equal.

Groups of 3 or more values

The idea of paired data can be extended to situations in which 3 or more related measurements are made from each 'individual'. Two important situations that give rise to this type of data are:

Experiment with blocks
Paired data can arise when the experimental units are grouped into blocks of size 2 (e.g. matched pairs) and two treatments are used. This can be extended to g treatments with blocks of g experimental units and the treatments randomised within each block.
Repeated measure data
Several comparable measurements may be made from each individual, often measurements of the same quantity at different times.

Example (randomised blocks)

In an experiment to assess the effect of codeine and acupuncture for relieving dental pain, 32 subjects were grouped into blocks of 4 according to an initial assessment of their tolerance to pain. Four treatments were randomly given to the four subjects in each block and pain relief scores were recorded.

                        Pain relief score
Tolerance               Codeine    Acupuncture    Codeine +
group        Control    only       only           Acupuncture
1            0.0        0.6        0.5            1.2
2            0.3        0.7        0.6            1.3
3            0.4        0.8        0.8            1.6
4            0.4        0.9        0.7            1.5
5            0.6        1.5        1.0            1.9
6            0.9        1.6        1.4            2.3
7            1.0        1.7        1.8            2.1
8            1.2        1.6        1.7            2.4

Example (repeated measures)

An experiment investigated the use of nicotine to control tics in patients with Tourette's syndrome. For each patient, the number of tics was recorded before a nicotine gum was chewed and at different times afterwards.

           Number of tics during 30-min period
Patient    Baseline    Chewing gum    0-30 min after    30-60 min after
1          249         108            93                59
2          1095        593            600               861
3          83          27             32                61
4          569         363            342               312
5          368         141            167               180
6          326         134            144               158
7          324         126            312               260
8          95          41             63                71
9          413         365            282               321
10         332         293            525               455

11.7.2   Example with baseline treatment

We start with a simple example in which one of the g treatments is a standard or 'baseline' treatment. The other (g − 1) treatments can be compared to it using standard confidence intervals for paired data. These confidence intervals are usually narrower than the corresponding confidence intervals that would be found for independent samples.

Example

In a randomised experiment about pain relief treatments in dental patients, 32 subjects were grouped into blocks of four according to an initial assessment of their tolerance to pain. One treatment was a placebo (dummy treatment) that the others could be compared to.

If the initial grouping of patients into blocks is ignored, 95% confidence intervals for the improvement in pain relief over the placebo are wide. Taking account of the initial grouping, differences are far more accurately estimated.
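The comparison can be sketched with the control and codeine-only scores from the pain relief table in section 11.7.1, assuming that the 'Control' column is the placebo. The function names are invented for this illustration, and the t-value for 7 degrees of freedom is taken from t-tables. Both intervals use 7 degrees of freedom: n - 1 for the paired differences and min(n1 - 1, n2 - 1) for independent samples.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Pain relief scores from the table in the text (one value per tolerance group).
control = [0.0, 0.3, 0.4, 0.4, 0.6, 0.9, 1.0, 1.2]
codeine = [0.6, 0.7, 0.8, 0.9, 1.5, 1.6, 1.7, 1.6]

T_7DF = 2.365  # t-table value for a 95% interval with 7 degrees of freedom

def paired_ci(baseline, treated, t_value):
    """CI for the mean improvement, using the within-block differences."""
    d = [t - b for b, t in zip(baseline, treated)]
    half_width = t_value * stdev(d) / sqrt(len(d))
    return mean(d) - half_width, mean(d) + half_width

def independent_ci(baseline, treated, t_value):
    """CI for the same difference if the blocks are (wrongly) ignored."""
    se = sqrt(variance(baseline) / len(baseline) + variance(treated) / len(treated))
    diff = mean(treated) - mean(baseline)
    return diff - t_value * se, diff + t_value * se

paired = paired_ci(control, codeine, T_7DF)
independent = independent_ci(control, codeine, T_7DF)
print([round(v, 3) for v in paired], [round(v, 3) for v in independent])
```

Both intervals are centred on the same estimate, 0.575, but the paired interval is much narrower because the differences remove most of the variation between tolerance groups.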

11.7.3   Use of blocking information

Testing for equal treatment means

If there is no baseline treatment, analysis should start with a single hypothesis test for whether all treatment means are equal. The standard multi-group analysis of variance test for equal means in a completely randomised experiment (ignoring the blocks) should not be used for experiments with blocks.

Ignoring the existence of blocks makes it much harder to detect differences between treatments.

Example

Five different observers each watched the same group of 10 cattle and reported how long each animal spent grazing.

Wrong analysis

Ignoring the fact that the same animals were observed by all five observers, the data would be analysed with the anova table below. From the large p-value, we would conclude that there were no differences between the observers.

Correct analysis

Much of the variability in the data is due to differences between the animals (blocks), and an analysis that ignores this is much less sensitive to differences between the observers. We will not explain the correct test for blocked data until later in this section, but it gives a p-value that is interpreted in the same way as the p-value above. It is shown below and shows that there are almost certainly differences between the observers.

11.7.4   Randomised block designs

In paired data, each of the two treatments is used once within each block (pair). The previous pages generalised this to more than two treatments, but each treatment was still used once in each block. We now generalise further to allow the block size to be any multiple of the number of treatments.

Reducing unexplained variability

To assess the significance of differences between experimental treatments, variation in the treatment means is compared to the amount of unexplained (random) variation. With less unexplained variation, there is less chance of the differences between treatment means having arisen by chance. There are two ways to reduce unexplained variation:

Use experimental units that are as similar as possible.
Identical experimental units cannot usually be found but similar experimental units give best accuracy.
Group the experimental units into blocks of similar units.
If the experimental units vary but can be grouped into blocks of similar units, then these blocks 'explain' some of the variability, and unexplained variability is reduced.

The simplest way to use blocks in an experiment is with a randomised block design. In this, the block size is a multiple of the number of treatments. Each treatment is used for the same number of experimental units within each block, and the treatments are randomly allocated to units within the blocks.

Example

An experiment was conducted in which the experimental units were intestinal preparations from fish, but each fish would only give six preparations. The six preparations from each fish constitute a block of units. Two treatments were used, with the six preparations from each of four fish randomly split into three preparations for each treatment (a randomised block experiment).

Wrong analysis

Ignoring possible differences between the four fish and treating the data as a completely randomised experiment with 24 experimental units, we would conclude that there is moderately strong evidence of a difference between the two treatments.

Correct analysis

There are considerable differences between the four fish (blocks), with much lower variability within any single fish. The correct analysis is explained later in this section and the resulting p-value gives much stronger evidence of a difference between the two treatments.

11.7.5   Model for randomised blocks (optional)

Three-dimensional scatterplot of data

Data from a randomised block experiment can be displayed in a three-dimensional scatterplot:

Model

Both blocks and treatments explain some variability in the response measurement, Y, but...

Blocks and treatments are modelled in the same way:

y = (overall mean) + (effect depending on block)
         + (effect depending on treatment) + error

The error is again assumed to have a normal distribution with mean zero and constant standard deviation. Within any block, changing the treatment simply adds or subtracts a constant to the response.

11.7.6   Removing block effects

Making all block means equal

Our model for randomised block data explains the effect of the blocks on Y as the addition of a "block effect" to all values within each block. This suggests eliminating differences between the blocks by adjusting the values in all blocks to have the same block means.

Example

The diagram below shows results from an experiment with blocks of size five and five treatments. Different colours are used for the different blocks and the block means are shown with vertical lines.

A lot of the variability in the response, Y, is caused by differences between the blocks. The diagram below adjusts the values by adding a constant to all values in each block, giving all blocks the same mean response.

Since there is now much less 'unexplained' variation and there are now no differences between blocks, applying the standard anova test for equal treatment means to the adjusted data seems reasonable and is much more sensitive to treatment differences:

The treatment and residual sums of squares shown here are the basis for testing whether the treatment means are equal, but the analysis is not completely correct.

The residual degrees of freedom are too high.

The correct analysis of variance table for testing equal treatment means is a little more complex for randomised block data; it will be explained in the following pages.

11.7.7   Sums of squares

Notation

For randomised block data, we use the following notation:

ybgr   the r'th of the observations in block b that get treatment g
ȳb     mean response in block b
ȳg     mean response for all observations getting treatment g
ȳ      overall mean response for all observations

In many examples, there is only a single observation for each combination of block and treatment, but our notation allows for two or more.

Sums of squares

For randomised block data, we again split the total sum of squares into components, but now need to use three components,

SSTotal  =  SSBlocks  +  SSTreatments  +  SSResidual

The block and treatment sums of squares describe variation that is explained by the randomised block model whereas the residual sum of squares is unexplained.

SSTotal  =  Σ (ybgr - ȳ)²
The total sum of squares reflects the total variability of the response.
SSBlocks  =  Σ (ȳb - ȳ)²
The sum of squares between blocks measures the variability of the block means.
SSTreatments  =  Σ (ȳg - ȳ)²
The sum of squares between treatments measures the variability of the treatment means.
SSResidual  =  Σ (ybgr - ȳb - ȳg + ȳ)²
The residual sum of squares describes the variation that is unexplained by blocks or treatments.

Note that all summations are over all observations in the data set.

Residuals and residual variation

As in regression, we define residuals to be the differences between the recorded response values and the closest we can get from our model. For a randomised block model, the best estimate is:

ŷbgr  =  ȳb + ȳg - ȳ

This can be interpreted as:

ŷbgr  =  (overall mean) + (estimated block effect) + (estimated treatment effect)
      =  ȳ + (ȳb - ȳ) + (ȳg - ȳ)

11.7.8   Anova table and examples

Anova table

The three components that add to total sum of squares are usually laid out in an analysis of variance table (or simply anova table).

The anova table adds a few extra columns:

Degrees of freedom
They also add up to the value in the Total row, (n − 1).
Mean sums of squares
These are the sums of squares divided by their degrees of freedom.
F-ratios
For the blocks and treatments, the F-ratio divides the mean sum of squares by the mean residual sum of squares.
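As an illustration, the sketch below computes the entries of the anova table for the pain relief data in section 11.7.1, which has one observation for each combination of block and treatment. The function and key names are invented for this illustration, and the p-values are omitted because they need the F distribution.

```python
from statistics import mean

# Pain relief scores from the text: rows are the 8 tolerance groups (blocks);
# columns are control, codeine only, acupuncture only, codeine + acupuncture.
scores = [
    [0.0, 0.6, 0.5, 1.2],
    [0.3, 0.7, 0.6, 1.3],
    [0.4, 0.8, 0.8, 1.6],
    [0.4, 0.9, 0.7, 1.5],
    [0.6, 1.5, 1.0, 1.9],
    [0.9, 1.6, 1.4, 2.3],
    [1.0, 1.7, 1.8, 2.1],
    [1.2, 1.6, 1.7, 2.4],
]

def block_anova(rows):
    """Anova quantities for one observation per block-treatment combination."""
    n_blocks, n_treat = len(rows), len(rows[0])
    overall = mean(y for row in rows for y in row)
    block_means = [mean(row) for row in rows]
    treat_means = [mean(col) for col in zip(*rows)]
    ss_total = sum((y - overall) ** 2 for row in rows for y in row)
    ss_blocks = n_treat * sum((m - overall) ** 2 for m in block_means)
    ss_treat = n_blocks * sum((m - overall) ** 2 for m in treat_means)
    ss_resid = sum(
        (y - block_means[b] - treat_means[g] + overall) ** 2
        for b, row in enumerate(rows)
        for g, y in enumerate(row)
    )
    df_blocks, df_treat = n_blocks - 1, n_treat - 1
    df_resid = n_blocks * n_treat - 1 - df_blocks - df_treat
    ms_resid = ss_resid / df_resid
    return {
        "ss": (ss_total, ss_blocks, ss_treat, ss_resid),
        "df": (n_blocks * n_treat - 1, df_blocks, df_treat, df_resid),
        "f_blocks": (ss_blocks / df_blocks) / ms_resid,
        "f_treat": (ss_treat / df_treat) / ms_resid,
    }

table = block_anova(scores)
print(table["df"], round(table["f_treat"], 1))
```

The degrees of freedom for blocks, treatments and residual are 7, 3 and 21, which add up to the total of n - 1 = 31.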

Tests

The F-ratio for differences between the treatments compares the variability explained by the treatments to the residual (unexplained) variation. The larger the F-ratio, the stronger the evidence for a difference between treatments. A formal hypothesis test is based on the F-ratio and its p-value is the probability of getting as big an F-ratio as that recorded if all treatment means were equal. It is interpreted in the same way as all other p-values.

p-value > 0.1
No evidence of a difference between treatments
0.05 < p-value < 0.1
Very mild evidence of a difference between treatments
0.01 < p-value < 0.05
Moderately strong evidence of a difference between treatments
p-value < 0.01
Strong evidence of a difference between treatments

A p-value can also be found to test whether there are differences between the blocks, but this is usually of less interest.

In practice, computer software will produce the anova table for you, so you only need to interpret the p-value associated with the treatments.

Examples