
Chapter 11   Comparing Groups

11.1   Models for two groups

11.1.1   Interest in underlying population

Data from two groups

When data are collected from two groups, we are usually interested in differences between the groups in general. The specific individuals are of less interest. Questions are therefore about the characteristics of the populations or processes that we assume underlie the data.

Example

The questions do not refer to the 16 specific subjects — they ask about whether anticipation of hypnosis affects the ventilation rate in general. We would like to use the answers to predict what will happen to other people.

11.1.2   Model for two groups

Data and model

Data from two groups can be displayed with two histograms:

The diagram below illustrates a possible model for the data above.

11.1.3   Parameters of the normal model

Parameters

A normal model for two groups has four unknown parameters (the mean and standard deviation for each normal distribution). These parameters give considerable flexibility and allow the model to be used for a variety of different data sets.

(The number of parameters can be reduced to three if it is assumed that the two standard deviations are the same, but we will not consider this type of model here.)

11.1.4   Parameter estimates

Parameter estimates

A normal model for 2-group data involves 4 unknown parameters, µ1, µ2, σ1 and σ2. The means and standard deviations in the two samples provide objective estimates of the four parameters.

11.1.5   Difference between means

Comparing the populations

Although the standard deviations in the two populations may also differ, we are usually most interested in the difference between the population means. Questions about this difference can be expressed in terms of the model parameters, as in the questions below.

Randomness of sample difference

These questions are about µ2 − µ1, and the best estimate of it is the difference between the sample means, x̄2 − x̄1. However, x̄2 − x̄1 cannot give definitive answers since it is random — it varies from sample to sample.

Without an understanding of the distribution of x̄2 − x̄1, it is impossible to properly interpret what the sample difference, 9.5 pieces, tells you about the difference between the underlying population means.

11.2   Distn of sums and differences

11.2.1   Means and sums of samples

Sample mean and sum

The mean of a random sample, x̄, has a distribution that is approximately normal if the sample size, n, is large, and it always has a mean and standard deviation that depend on the population mean, µ, and standard deviation, σ:

mean(x̄)   =   μ
sd(x̄)   =   σ / √n

Occasionally the sum of the values in a random sample, ΣX = X1 + X2 + … + Xn, is more useful than the mean.

Its distribution is a scaled version of the distribution of the mean — the same shape but different mean and standard deviation:

mean(ΣX)   =   n μ
sd(ΣX)   =   √n σ

Mean vs Sum

As the sample size increases, the standard deviation of the sample mean, σ / √n, decreases towards zero, whereas the standard deviation of the sample sum, √n σ, increases.
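As a rough illustration (not from the original text), the following Python sketch simulates repeated samples from a made-up normal population and compares the standard deviations of the sample mean and the sample sum with the formulae above.

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma = 10.0, 4.0            # made-up population mean and sd

    for n in (10, 40, 160):
        # simulate many samples of size n and record their means and sums
        samples = rng.normal(mu, sigma, size=(10000, n))
        means = samples.mean(axis=1)
        sums = samples.sum(axis=1)
        # simulated sd of the mean vs sigma/sqrt(n), and of the sum vs sqrt(n)*sigma
        print(n, means.std(), sigma / np.sqrt(n), sums.std(), np.sqrt(n) * sigma)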

11.2.2   Sum and difference

Sum and difference of two variables

Applying the result about the sum of a random sample to a sample of size n = 2, X1 and X2,

mean(X1 + X2)   =   2 μ,       sd(X1 + X2)   =   √2 σ

If we generalise by allowing X1 and X2 to have different means, µ1 and µ2, but the same σ,

mean(X1 + X2)   =   μ1 + μ2,       sd(X1 + X2)   =   √2 σ

A similar result holds for the difference between X1 and X2:

mean(X1 − X2)   =   μ1 − μ2,       sd(X1 − X2)   =   √2 σ

If X1 and X2 are independent and have normal distributions, their sum and difference are also normally distributed.

11.2.3   Sum and difference (cont)

General result

The results generalise further to independent variables that may have different means and standard deviations. If X1 and X2 are independent with means µ1 and µ2 and standard deviations σ1 and σ2,

mean(X1 + X2)   =   μ1 + μ2,       sd(X1 + X2)   =   √( σ1² + σ2² )
mean(X1 − X2)   =   μ1 − μ2,       sd(X1 − X2)   =   √( σ1² + σ2² )

The formulae for the standard deviations are more easily remembered in terms of the variances of the quantities. For example,

var(X1 + X2)   =   var(X1) + var(X2)   =   σ1² + σ2²
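A minimal numerical check of these formulae, using simulated values for two made-up independent variables:

    import numpy as np

    rng = np.random.default_rng(2)
    mu1, sigma1 = 5.0, 2.0           # made-up parameters for X1
    mu2, sigma2 = 8.0, 3.0           # made-up parameters for X2

    x1 = rng.normal(mu1, sigma1, 100000)
    x2 = rng.normal(mu2, sigma2, 100000)

    # the simulated variances of the sum and the difference should both be
    # close to sigma1**2 + sigma2**2
    print((x1 + x2).var(), (x1 - x2).var(), sigma1**2 + sigma2**2)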

11.2.4   Probabilities for sums and differences

Finding probabilities

To find the probability that a sum or difference satisfies an inequality, the inequality should first be translated into one about a z-score, using the mean and standard deviation of the quantity,

z   =   ( value − mean )  /  sd

The standard normal distribution can then be used to find the probabilities. The examples below illustrate the method.

Example (total of several variables)

Example (sum of two variables with different sd)
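As a rough sketch of this kind of calculation, using made-up means and standard deviations rather than the values in these examples:

    import numpy as np
    from scipy import stats

    # made-up values: X1 and X2 are independent
    mu1, sigma1 = 20.0, 3.0
    mu2, sigma2 = 15.0, 4.0

    # distribution of the sum X1 + X2
    mean_sum = mu1 + mu2
    sd_sum = np.sqrt(sigma1**2 + sigma2**2)

    # P(X1 + X2 > 40), via the z-score and the standard normal distribution
    z = (40 - mean_sum) / sd_sum
    print(stats.norm.sf(z))                     # upper-tail probability
    print(stats.norm.sf(40, mean_sum, sd_sum))  # same thing, without standardising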

11.3   Comparing means in two groups

11.3.1   Distn of difference between means

Difference between means

The difference between any two independent quantities, X2 − X1, has a distribution with

mean(X2 − X1)   =   μ2 − μ1,       sd(X2 − X1)   =   √( σ1² + σ2² )

Applying this to the difference between the means of two random samples, x̄2 − x̄1,

mean(x̄2 − x̄1)   =   μ2 − μ1,       sd(x̄2 − x̄1)   =   √( σ1²/n1  +  σ2²/n2 )

If the distributions are normal in each group, ...
... the sample means are normal, so their difference also has a normal distribution.
Otherwise, ...
... the two sample means are approximately normal if the sample sizes are large, so their difference is also close to normal.

Irrespective of the distributions within the two groups, x̄2 − x̄1 has mean μ2 − μ1 and standard deviation √( σ1²/n1 + σ2²/n2 ).

11.3.2   SE of difference between means

Estimation error

The difference between the sample means, x̄2 − x̄1, is a point estimate of the difference between the means of the underlying populations, µ2 − µ1. In order to properly interpret it, we must understand the distribution of the estimation error.

x̄2 − x̄1   is a point estimate of   µ2 − µ1

error   =   ( x̄2 − x̄1 ) − ( µ2 − µ1 )   ~   normal ( 0,  √( σ1²/n1 + σ2²/n2 ) )

Replacing σ1² and σ2² by s1² and s2² gives an approximate error distribution,

error   ~   normal ( 0,  √( s1²/n1 + s2²/n2 ) )

The standard deviation of these errors is the standard error of the estimator.
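As a minimal sketch, using made-up group summaries rather than the data in the example below, the standard error can be computed directly from the two sample standard deviations and sample sizes:

    import numpy as np

    # made-up group summaries: sample size, mean and standard deviation
    n1, mean1, s1 = 30, 52.1, 8.4
    n2, mean2, s2 = 25, 61.6, 7.9

    estimate = mean2 - mean1
    se = np.sqrt(s1**2 / n1 + s2**2 / n2)
    print(estimate, se)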

Examples

Our best estimate is that healthy companies have a mean assets-to-liabilities ratio that is 0.902 higher than that of failed companies. From the error distribution, the error in this estimate is unlikely to be more than about 0.3.

11.3.3   CI for difference between means

If σ1 and σ2 were known...

Prob (  x̄2 − x̄1  is within  ± 1.96 √( σ1²/n1 + σ2²/n2 )  of  μ2 − μ1  )   =   0.95

so a 95% confidence interval for µ2 - µ1 would be

( x̄2 − x̄1 )   ±   1.96 √( σ1²/n1  +  σ2²/n2 )

When σ1 and σ2 are unknown...

We must replace σ1 and σ2 by s1 and s2 in the confidence interval, and the constant '1.96' must be replaced by a slightly larger value from t-tables,

( x̄2 − x̄1 )   ±   t √( s1²/n1  +  s2²/n2 )

where the degrees of freedom for the t-value are

ν   =   min (n1−1,  n2−1)

(A more complex formula is available that gives a higher value for ν. It is slightly better but the difference is usually negligible.)
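A short Python sketch of this confidence interval, using made-up data for the two groups and the conservative degrees of freedom, ν = min(n1 − 1, n2 − 1), described above:

    import numpy as np
    from scipy import stats

    # made-up samples from the two groups
    group1 = np.array([23.1, 25.4, 22.8, 26.0, 24.3, 25.1, 23.7])
    group2 = np.array([27.2, 28.9, 26.4, 29.5, 27.8, 28.1])

    n1, n2 = len(group1), len(group2)
    diff = group2.mean() - group1.mean()
    se = np.sqrt(group1.var(ddof=1) / n1 + group2.var(ddof=1) / n2)

    nu = min(n1 - 1, n2 - 1)                 # conservative degrees of freedom
    t = stats.t.ppf(0.975, nu)               # replaces 1.96
    print(diff - t * se, diff + t * se)      # 95% confidence interval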

Example

11.3.4   Testing a hypothesis

Testing for a difference between two means

The difference between two groups that is of most practical importance is a difference between their means.

H0 :   μ2μ1  =  0
HA :   μ2μ1  ≠  0

The summary statistic that throws most light on these hypotheses is the difference between the sample means, x̄2 − x̄1. Testing therefore involves assessment of whether this difference is unusually far from zero.

As with all other hypothesis tests, a p-value near zero gives evidence that the null hypothesis does not hold — evidence of a difference between the group means.
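As a rough sketch with made-up data, the test statistic and p-value can be computed directly; scipy's two-sample t-test gives an equivalent statistic, although its equal_var=False option uses the more complex degrees-of-freedom formula mentioned earlier rather than min(n1 − 1, n2 − 1):

    import numpy as np
    from scipy import stats

    group1 = np.array([23.1, 25.4, 22.8, 26.0, 24.3, 25.1, 23.7])   # made-up data
    group2 = np.array([27.2, 28.9, 26.4, 29.5, 27.8, 28.1])

    # test statistic computed from the formula in the text
    n1, n2 = len(group1), len(group2)
    se = np.sqrt(group1.var(ddof=1) / n1 + group2.var(ddof=1) / n2)
    t_stat = (group2.mean() - group1.mean()) / se
    p_value = 2 * stats.t.sf(abs(t_stat), df=min(n1 - 1, n2 - 1))
    print(t_stat, p_value)

    # scipy equivalent (Welch degrees of freedom)
    print(stats.ttest_ind(group2, group1, equal_var=False))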

Example

General properties of p-values

A statistical hypothesis test cannot provide a definitive answer about whether two groups have different means. The randomness of sample data means that p-values are also random quantities.

It is possible to get a small p-value (supporting HA) when H0 is true, and it is possible to get a large p-value (consistent with H0) when HA is true.

There is some chance of being misled by an 'unlucky' sample.

If H0 is true
All p-values between 0 and 1 are equally likely. For example, there is a 5% probability of getting a p-value less than 0.05.
If HA is true
The p-value is more likely to be near zero, though there is still some chance of a larger p-value.

Effect of increasing the sample size

If H0 is true
The p-values remain equally likely between 0 and 1.
If HA is true
The distribution of p-values becomes more concentrated near zero, so you are more likely to conclude that the population means are really different.
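These properties can be illustrated by simulation. The sketch below (with made-up population means and standard deviation) repeatedly draws two samples, either from identical or from different normal populations, and records the proportion of p-values below 0.05:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)

    def simulate_p_values(mu1, mu2, n, reps=2000):
        """Simulate p-values for a two-sample t-test with the given means."""
        p = np.empty(reps)
        for i in range(reps):
            x1 = rng.normal(mu1, 5.0, n)
            x2 = rng.normal(mu2, 5.0, n)
            p[i] = stats.ttest_ind(x2, x1, equal_var=False).pvalue
        return p

    # H0 true: about 5% of p-values fall below 0.05, whatever the sample size
    print((simulate_p_values(20, 20, n=10) < 0.05).mean())

    # HA true: p-values concentrate near zero, more so for larger samples
    print((simulate_p_values(20, 23, n=10) < 0.05).mean())
    print((simulate_p_values(20, 23, n=40) < 0.05).mean())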

11.3.5   One-tailed tests for differences

One- and two-tailed tests for differences

In a two-tailed test, the alternative hypothesis is that the two population means are different. A one-tailed test arises when we want to test whether one mean is higher than the other (or lower than the other).

Test statistic, p-value and conclusion

Consider a test for the hypotheses,

H0 :   μ1  =  μ2
HA :   μ1  >  μ2

The alternative hypothesis is only supported by very small values of x̄2 − x̄1. This also corresponds to small values of the test statistic, t, so the p-value is the lower tail probability of the t distribution.

A small p-value is interpreted as giving evidence that H0 is false, in a similar way to all other kinds of hypothesis test.
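A minimal sketch of the one-tailed calculation, again with made-up data and the conservative degrees of freedom:

    import numpy as np
    from scipy import stats

    group1 = np.array([23.1, 25.4, 22.8, 26.0, 24.3, 25.1, 23.7])   # made-up data
    group2 = np.array([21.2, 20.9, 22.4, 19.5, 21.8, 22.1])

    n1, n2 = len(group1), len(group2)
    se = np.sqrt(group1.var(ddof=1) / n1 + group2.var(ddof=1) / n2)
    t_stat = (group2.mean() - group1.mean()) / se

    # HA: mu1 > mu2 is supported by small values of t, so use the lower tail
    p_value = stats.t.cdf(t_stat, df=min(n1 - 1, n2 - 1))
    print(t_stat, p_value)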

Examples

Properties of p-values

We again stress that a statistical hypothesis test cannot provide a definitive answer. The randomness of sample data means that p-values are also random quantities, so there is some chance of us being misled by an 'unlucky' sample:

11.4   Comparing two proportions

11.4.1   Modelling two proportions

Two groups of successes and failures

We now consider data that are obtained as random samples from two populations, with the sampled individuals being categorised into successes and failures.

Since our model involves only two parameters, π1 and π2, the two groups are the same only if π2 − π1 = 0. The value of π2 − π1 is usually unknown but can be estimated by p2 − p1. However, p2 − p1 is a random quantity, so its variability must be taken into account when interpreting its value.

Example

Note that the questions do not refer to the specific 100 managers in the study. They ask about differences between male and female managers 'in general'.

We are interested in π2 - π1 rather than p2 - p1, so we need to understand the accuracy of our point estimate.

11.4.2   Distribution of difference in proportions

Difference between two proportions

Within each group, the sample proportion of successes, p, has a distribution that is approximately normal in large samples and has mean and standard deviation

mean(p)   =   π,       sd(p)   =   √( π (1 − π) / n )

Applying the general results about the difference between two independent random quantities:

mean(p2 − p1)   =   π2 − π1,       sd(p2 − p1)   =   √( π1(1 − π1)/n1  +  π2(1 − π2)/n2 )

Since the individual proportions are approximately normal (in large samples), their difference is also approximately normal:

p2 − p1   ~   normal ( π2 − π1,  √( π1(1 − π1)/n1  +  π2(1 − π2)/n2 ) )

11.4.3   CI for difference in proportions

Standard error of p2 - p1

The standard deviation of p2 − p1 is also its standard error when it is used to estimate π2 − π1,

se(p2 − p1)   =   √( π1(1 − π1)/n1  +  π2(1 − π2)/n2 )

In practice, π1 and π2 must be replaced by their sample equivalents to estimate the standard error.

Confidence interval for difference

Most 95% confidence intervals are of the form

estimate   ±   1.96 × se(estimate)

perhaps with a refinement of using a slightly higher value than 1.96 (e.g. a t-value) if the standard error is estimated. Applying this to our estimate of π2 − π1, and using 2 instead of 1.96, gives the approximate 95% confidence interval

( p2 − p1 )   ±   2 √( p1(1 − p1)/n1  +  p2(1 − p2)/n2 )
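A minimal sketch of this interval, using made-up counts of successes in the two groups:

    import numpy as np

    # made-up counts: x successes out of n in each group
    x1, n1 = 34, 80
    x2, n2 = 52, 90

    p1, p2 = x1 / n1, x2 / n2
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

    diff = p2 - p1
    print(diff - 2 * se, diff + 2 * se)   # approximate 95% CI for pi2 - pi1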

Example

11.4.4   Testing for difference in probabilities

Two-tailed test

H0 :   π1  =  π2
HA :   π1  ≠  π2

For this test, the steps involved in obtaining a p-value are:

Estimate π1 and π2 with the sample proportions, p1 and p2.
Evaluate the standard error of p2 − p1.
Calculate the test statistic, z  =  (p2 − p1) / se(p2 − p1).
The p-value is the probability of a standard normal value at least as far from zero as z (adding both tails).

The p-value is interpreted in the same way as for all previous tests. A p-value close to zero is unlikely when H0 is true, but is more likely when HA holds. Small p-values therefore provide evidence of a difference between the population probabilities.

One-tailed test

In a 1-tailed test, the alternative hypothesis is

HA :   π1  −  π2  >  0    or    HA :   π1  −  π2  <  0

The test statistic is identical to that for a 2-tailed test and the p-value is obtained in a similar way, but it is found from only a single tail of the standard normal distribution.

Alternative test statistic

Since π1 and π2 are equal if H0 is true, the overall proportion of successes, p, can be used in the formula for the standard error of p2 − p1,

se(p2 − p1)   =   √( p (1 − p) ( 1/n1  +  1/n2 ) )

This refinement makes little difference in practice, so the examples below use the 'simpler' formula that we gave earlier.
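As a rough sketch with made-up counts, and using the 'simpler' standard error formula, the two- and one-tailed p-values might be computed as:

    import numpy as np
    from scipy import stats

    # made-up counts: x successes out of n in each group
    x1, n1 = 34, 80
    x2, n2 = 52, 90

    p1, p2 = x1 / n1, x2 / n2
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

    z = (p2 - p1) / se
    p_two_tailed = 2 * stats.norm.sf(abs(z))
    p_one_tailed = stats.norm.sf(z)          # for HA: pi2 - pi1 > 0
    print(z, p_two_tailed, p_one_tailed)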

Two-tailed example

One-tailed example

11.5   Paired t test

11.5.1   Paired data

Paired data

When two types of measurement, X and Y, are made from each individual (or other unit), the data are called bivariate. Sometimes the two measurements are of closely related quantities and may even describe the same quantity at different times.

When the sum or difference of X and Y is a meaningful quantity, the data are called paired data.

Hypotheses of interest

For paired data, we often want to test whether the means of the two variables are equal,

H0 :   μX = μY
HA :   μXμY

Sometimes a one-tailed test is required, such as

H0 :   μX = μY
HA :   μX > μY

Examples

Pre-test, post-test data
This arises when a measurement is made from each individual, then a second measurement of the same type is made after some kind of intervention (e.g. training or medication). Has the intervention "improved" the measurement?
Twin studies
Some experiments or other studies are conducted with identical twins, either human or animal. The members of each pair experience different environments — either two different experimental treatments or some other difference in conditions. Are there differences between the two treatments?
Other types of pairing
For example, damaged cars may each be taken to two garages for estimates of the cost of repair. The two estimates for each car are paired data. Does one garage overcharge?

11.5.2   Analysis of differences

Differences

Information about the difference between the means of X and Y is contained in the values D = (Y - X) for each individual. The hypotheses

H0 :   μX = μY
HA :   μXμY

can then be expressed as

H0 :   μD = 0
HA :   μD ≠ 0

This reduces the paired data set to a univariate data set of differences, D, and reduces questions about (µY - µX) to questions about the mean of D.

Analysis of paired data

By taking differences between Y and X, much of the variability between the individuals is eliminated, making it easier to see whether their means are different. The example below shows paired data on the left with blue lines joining the x- and y-values in each pair. The differences on the right make it clearer that the y-values are usually higher than the corresponding x-values.

11.5.3   Paired t-test

Approach (paired t-test)

Testing whether two paired measurements, X and Y, have equal means is done in terms of the differences

D = Y - X

The test is then expressed as

H0:   µD = 0

HA:   µD ≠ 0

or a one-tailed variant. The hypotheses are therefore assessed with a standard univariate t-test, using the test statistic

t   =   d̄  /  ( sD / √n )

where d̄ and sD are the mean and standard deviation of the n differences. This is compared to a t distribution with n − 1 degrees of freedom to find the p-value.
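A minimal sketch with made-up paired measurements; scipy's ttest_rel performs the same calculation directly from the paired values:

    import numpy as np
    from scipy import stats

    # made-up paired measurements (e.g. before and after an intervention)
    x = np.array([12.1, 14.3, 11.8, 13.5, 12.9, 15.0, 13.2, 12.4])
    y = np.array([13.0, 15.1, 12.2, 14.8, 13.1, 16.2, 14.0, 13.1])

    d = y - x
    n = len(d)
    t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
    print(t_stat, p_value)

    # equivalent, using scipy's paired t-test
    print(stats.ttest_rel(y, x))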

Example

The diagram below illustrates a 2-tailed test for equal means, based on n = 15 paired observations.

From the p-value, we conclude that there is very strong evidence that the means for Y and X are different.

11.5.4   Pairing and experimental design

Choice between paired data or two independent samples

It is sometimes possible to answer questions about the difference between two means by collecting two alternative types of data.

Two independent samples
Measurements are made from two samples of individuals from the groups whose means are to be compared. A 2-sample t-test can be used to compare the means.
One paired sample
The 'individuals' can be re-defined as pairs of related values from the two groups and a single sample of these pairs can be collected. A paired t-test can be performed on the differences to compare the means.

If the individuals in the 2 groups can be paired so that the pairs are relatively similar, a paired design gives more accurate results.

Matched pairs in experiments

In experiments to compare two treatments, it may be possible to group the experimental units into pairs that are similar in some way. These are called matched pairs. If the two experimental units in each pair are randomly assigned to the two treatments, the data can be analysed as described in this section.

The difference between the treatments is estimated more accurately than in a completely randomised experiment.

11.6   Comparing several means

11.6.1   Model

Data

In this section, we examine data that may arise as random samples from several different populations, or from an experiment in which each individual receives one of several treatments.

We will model the data in terms of g groups. The data often arise from completely randomised experiments with g treatments.

Model

The model that was used for 2 groups can be easily extended to g > 2 groups, allowing different means and standard deviations in all groups.

Group i:   Y   ~   normal ( µi ,  σi )

However, to develop a test for equal group means with g > 2 groups, we must make the extra assumption that the standard deviations in all groups are the same.

Group i:   Y   ~   normal ( µi ,  σ )

If there are g groups, this model has g + 1 unknown parameters — the g group means and the common standard deviation, σ. It is flexible enough to be useful for many data sets.

If the assumptions of a normal distribution and constant variance do not hold, a nonlinear transformation of the response may result in data for which the model is appropriate.

11.6.2   Parameter estimates

Estimating the group means

We now assume a normal model with the same standard deviation in each group,

Group i:   Y   ~   normal ( µi ,  σ )

The sample means provide estimates of the {µi}:

Estimating σ²

The sample standard deviation in any single group, si, is a valid estimate of σ, but we need to combine these g separate estimates in some way.

It is easier to describe estimation of σ² rather than σ. If the sample sizes are the same in all groups, a pooled estimate of σ² is the average of the group variances,

( s1²  +  s2²  +  …  +  sg² )  /  g

If the sample sizes are not equal in all groups, this is generalised by adding the numerators and denominators of the formulae for the g separate group variances,

( (n1 − 1) s1²  +  …  +  (ng − 1) sg² )  /  ( (n1 − 1)  +  …  +  (ng − 1) )

More mathematically, yij denotes the j'th of the ni values in group i, for i = 1 to g. The pooled estimate of σ² can then be written as

Σ Σ ( yij − ȳi )²  /  ( n − g )

where n = n1 + … + ng is the total number of values and the double summation is over all values in all groups.

The pooled variance is influenced most by the sample variances in the groups with the biggest sample sizes.
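A minimal sketch of the pooled estimate, using made-up data for three groups with unequal sample sizes:

    import numpy as np

    # made-up data for g = 3 groups
    groups = [
        np.array([14.2, 15.1, 13.8, 16.0, 14.9]),
        np.array([17.3, 18.1, 16.8, 17.5]),
        np.array([12.9, 13.4, 12.2, 14.0, 13.1, 12.8]),
    ]

    num = sum((len(y) - 1) * y.var(ddof=1) for y in groups)
    den = sum(len(y) - 1 for y in groups)          # equals n - g
    pooled_variance = num / den
    print(pooled_variance, np.sqrt(pooled_variance))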

11.6.3   Revisiting two groups (optional)

Revisiting the difference between two group means

In an earlier section, we described confidence intervals and tests about the difference between two group means, µ2 − µ1. They can be improved if we can assume that

σ1 = σ2 = σ

Inference is still based on x̄2 − x̄1, but the equation for its standard deviation can be simplified:

sd(x̄2 − x̄1)   =   σ √( 1/n1  +  1/n2 )

Confidence interval

A 95% confidence interval for µ2 − µ1 has the same general form as before,

but the standard deviation and the degrees of freedom for the t-value, ν, are different.

Degrees of freedom for the t-value, ν:
Allowing σ1 ≠ σ2:         ν  =  min( n1 − 1,  n2 − 1 )
Assuming σ1 = σ2 = σ:     ν  =  n1 + n2 − 2

If it can be assumed that σ1 = σ2, the confidence interval is usually narrower.

Example

The diagram below shows 95% confidence intervals obtained by the two methods.

The p-value for this test is found from the tail area of the t distribution with (n1 + n2 - 2) degrees of freedom.

11.6.4   Variation between and within groups

Comparing several groups

A new approach is needed to compare the means of three or more groups — the methods for two groups cannot be extended. We again assume a normal model with equal standard deviations,

Group i:   Y   ~   normal ( µi ,  σ )

Testing whether there are differences between the groups involves the hypotheses,

H0 :   µi  =  µj        for all i and j
HA:   µi  ≠  µj        for at least some i, j

Variation between and within groups

Testing whether the model means, {µi}, are equal is done by assessing the variation between the group means in the data. However, because of randomness in sample data, the sample means are unlikely to be the same, even if H0 is true.

In the example on the left below, the group means vary so much that the {µi} are almost certainly not equal. However, the group means on the right are relatively similar and their differences may simply be due to randomness.

To assess whether the means are 'unusually different', we must also take account of the variation within the groups. The data set on the left below gives much stronger evidence of group differences than that on the right, even though the group means are the same in both data sets.

The evidence against H0 depends on the relative size of the variation within groups and between groups.

11.6.5   Sums of squares

Notation

In the formulae on this page, the values in the i'th group are denoted by yi1, yi2, .... More generally, the j'th value in the i'th group is called yij, and the mean of the values in the i'th group is ȳi.

Total variation

The total sum of squares reflects the total variability of the response,

SSTotal   =   Σ Σ ( yij − ȳ )²

where ȳ is the overall mean of all n values.

The overall variance of all values (ignoring groups) is the total sum of squares divided by (n − 1).

The sum of squares between groups measures the variability of the group means,

SSBetween   =   Σ Σ ( ȳi − ȳ )²

Variation between groups is summarised by the differences between the group means and the overall mean. Note that the summation is over all observations in the data set.

The sum of squares within groups quantifies the spread of values within each group,

SSWithin   =   Σ Σ ( yij − ȳi )²

This is also called the residual sum of squares since it describes variability that is unexplained by differences between the groups. Note that the pooled estimate of the common variance, σ², is the sum of squares within groups divided by (n − g).
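A minimal sketch of these calculations, using made-up data for three groups, confirms that the between- and within-group sums of squares add to the total:

    import numpy as np

    # made-up data for g = 3 groups
    groups = [
        np.array([14.2, 15.1, 13.8, 16.0, 14.9]),
        np.array([17.3, 18.1, 16.8, 17.5]),
        np.array([12.9, 13.4, 12.2, 14.0, 13.1, 12.8]),
    ]

    all_values = np.concatenate(groups)
    overall_mean = all_values.mean()

    ss_total = ((all_values - overall_mean) ** 2).sum()
    ss_within = sum(((y - y.mean()) ** 2).sum() for y in groups)
    ss_between = sum(len(y) * (y.mean() - overall_mean) ** 2 for y in groups)

    # SSTotal = SSBetween + SSWithin
    print(ss_total, ss_between + ss_within)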

11.6.6   Coefficient of determination

Sums of squares

Sum of squares      Interpretation
SSTotal             Overall variability of Y, taking no account of the groups.
SSWithin            Variability that cannot be explained by the model.
SSBetween           Variability that is explained by the model.

Coefficient of determination

The proportion of the total sum of squares that is explained by the model is called the coefficient of determination,

R²   =   SSBetween / SSTotal
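A minimal sketch, using hypothetical sums of squares rather than values from a real data set:

    # hypothetical sums of squares (e.g. from the calculation sketched earlier)
    ss_total, ss_between = 34.5, 21.2
    r_squared = ss_between / ss_total
    print(r_squared)      # proportion of variability explained by the groups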

Example

11.6.7   Test for differences between groups

Hypothesis test

The following hypotheses are used to test whether the group means are all equal:

H0 :   µi  =  µj        for all i and j
HA:   µi  ≠  µj        for at least some i, j

We will describe some of the steps for this test, but cannot justify them here.

Mean sums of squares

The three sums of squares are first divided by values called their degrees of freedom:

The mean total sum of squares, SSTotal / (n − 1), is the sample variance of the response (ignoring groups).
The mean within-group sum of squares, SSWithin / (n − g), is the pooled estimate of the variance within groups.
The mean between-group sum of squares, SSBetween / (g − 1), is harder to interpret directly.

The numerators in these ratios add up:

SSTotal  =  SSBetween  +  SSWithin

and the same relationship holds for their denominators (degrees of freedom):

dfTotal  =  dfBetween  +  dfWithin

F ratio and p-value

The test statistic is an F-ratio,

F   =   ( SSBetween / (g − 1) )  /  ( SSWithin / (n − g) )

This test statistic compares between- and within-group variation. The further apart the group means, the larger SSBetween and the larger the F-ratio.

Large values of F suggest that H0 does not hold — that the group means are not the same.

The p-value for the test is the probability of such a high F ratio if H0 is true (all group means are the same). It is based on a standard distribution called an F distribution and is interpreted in the same way as other p-values.

The closer the p-value is to zero, the stronger the evidence that H0 does not hold.
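As a rough sketch with made-up data for three groups, the F-ratio and p-value can be computed directly, or with scipy's one-way analysis of variance function:

    import numpy as np
    from scipy import stats

    # made-up data for g = 3 groups
    groups = [
        np.array([14.2, 15.1, 13.8, 16.0, 14.9]),
        np.array([17.3, 18.1, 16.8, 17.5]),
        np.array([12.9, 13.4, 12.2, 14.0, 13.1, 12.8]),
    ]
    g = len(groups)
    n = sum(len(y) for y in groups)
    overall_mean = np.concatenate(groups).mean()

    ss_between = sum(len(y) * (y.mean() - overall_mean) ** 2 for y in groups)
    ss_within = sum(((y - y.mean()) ** 2).sum() for y in groups)

    f_ratio = (ss_between / (g - 1)) / (ss_within / (n - g))
    p_value = stats.f.sf(f_ratio, g - 1, n - g)
    print(f_ratio, p_value)

    # equivalent, using scipy's one-way analysis of variance
    print(stats.f_oneway(*groups))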

Analysis of variance table

An analysis of variance table (anova table) describes some of the calculations above:

Source of variation     df         Sum of squares     Mean sum of squares       F-ratio
Between groups          g − 1      SSBetween          SSBetween / (g − 1)       F
Within groups           n − g      SSWithin           SSWithin / (n − g)
Total                   n − 1      SSTotal

11.6.8   Examples