Alternative parameterisations

There are several alternative ways to write a general linear model that all allow an arbitrary mean for each of g groups. We have already seen some of these parameterisations.

Many other parameterisations of the same basic model are possible, and a few of them are described on this page.

Different parameterisations can allow the explained sum of squares for the factor (with g - 1 degrees of freedom) to be split into components that can be used in an anova table to test meaningful hypotheses.

Parameterisation for testing whether some group means are equal

We first consider how to define indicator variables for testing whether some group means are equal. The indicator variables are defined in two stages:

  1. One or more indicator variables specify a model with the required group means equal.
  2. Additional indicator variables are defined to give the additional flexibility of allowing all group means to be different.

This is most easily explained with an example.

Comparisons with a control group

Consider experimental data that includes a Control group and three other treatment groups, AAA, BBB and CCC. We might be interested in testing whether AAA, BBB and CCC have equal group means. We might define the indicator variables as follows:

  1. The first indicator variable contrasts the Control group with all others.
  2. Two further indicator variables allow for differences between the means of the three treatment groups.

If there were 3 response measurements (replicates) at each factor level, the X matrix would be as shown below:
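As a rough sketch of such an X matrix, the Python below builds one possible coding. The coding itself is an assumption made for illustration (an intercept, an indicator x1 for "any treatment", and indicators x2 and x3 for BBB and CCC); the names CODING, design_matrix, group_mean and params_for_means are hypothetical.

```python
# Assumed coding (not taken from the original page): Control is the baseline,
# x1 marks any treatment group, x2 marks BBB and x3 marks CCC.
GROUPS = ["Control", "AAA", "BBB", "CCC"]
CODING = {
    "Control": (0, 0, 0),
    "AAA": (1, 0, 0),
    "BBB": (1, 1, 0),
    "CCC": (1, 0, 1),
}
REPLICATES = 3


def design_matrix(groups=GROUPS, replicates=REPLICATES):
    """Return rows [1, x1, x2, x3], one row per response measurement."""
    rows = []
    for group in groups:
        for _ in range(replicates):
            rows.append([1, *CODING[group]])
    return rows


def group_mean(group, params):
    """Response mean implied by the parameters (b0, b1, b2, b3)."""
    row = [1, *CODING[group]]
    return sum(x * b for x, b in zip(row, params))


def params_for_means(means):
    """Parameters that reproduce any chosen set of four group means."""
    b0 = means["Control"]
    b1 = means["AAA"] - means["Control"]
    b2 = means["BBB"] - means["AAA"]
    b3 = means["CCC"] - means["AAA"]
    return (b0, b1, b2, b3)


X = design_matrix()
labels = [g for g in GROUPS for _ in range(REPLICATES)]
for label, row in zip(labels, X):
    print(f"{label:8s}", row)
```

Under this assumed coding, params_for_means shows that any four group means can be reproduced, and setting the last two parameters to zero forces AAA, BBB and CCC to share one mean.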

In each row of the X matrix, the indicator variables select the parameters that are added together to form the response mean. The next diagram shows the meanings of the parameters graphically. (The response values were simply chosen to illustrate the model.)

By adjusting the parameters, any set of group means can be obtained, so the model still allows arbitrary means for all groups.

Testing equality of some group means

For the above example, if the last two parameters were zero (i.e. if the last two indicator variables were deleted from the model), the response means for AAA, BBB and CCC would be constrained to be equal. The sequential sum of squares for these two indicator variables therefore leads to a test of the hypothesis that these three response means are equal.
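The calculation behind this test can be sketched in a few lines of Python. The response values are hypothetical and chosen only for illustration. Because every model here is a "group means" model, each residual sum of squares is found by pooling the groups that share a mean, so no matrix algebra is needed. The helper name residual_ssq is an assumption of this sketch.

```python
# Hypothetical responses, three replicates per group (illustration only).
data = {
    "Control": [10.0, 12.0, 11.0],
    "AAA": [14.0, 15.0, 16.0],
    "BBB": [15.0, 17.0, 16.0],
    "CCC": [13.0, 15.0, 14.0],
}


def mean(values):
    return sum(values) / len(values)


def residual_ssq(partition):
    """Residual sum of squares when each block of pooled groups shares one mean."""
    total = 0.0
    for block in partition:
        pooled = [y for group in block for y in data[group]]
        m = mean(pooled)
        total += sum((y - m) ** 2 for y in pooled)
    return total


# Model 0: intercept only, so all four groups share one mean.
# Model 1: adds the Control-vs-treatments indicator.
# Model 2: adds the last two indicators, so every group has its own mean.
rss0 = residual_ssq([["Control", "AAA", "BBB", "CCC"]])
rss1 = residual_ssq([["Control"], ["AAA", "BBB", "CCC"]])
rss2 = residual_ssq([["Control"], ["AAA"], ["BBB"], ["CCC"]])

ssq_control_vs_rest = rss0 - rss1      # sequential ssq, 1 d.f.
ssq_between_treatments = rss1 - rss2   # sequential ssq, 2 d.f.

n = sum(len(ys) for ys in data.values())
residual_df = n - 4                    # four mean parameters in the full model
f_ratio = (ssq_between_treatments / 2) / (rss2 / residual_df)

print("Control vs rest ssq:", ssq_control_vs_rest)
print("Between treatments ssq:", ssq_between_treatments)
print("F ratio for equal treatment means:", f_ratio)
```

The F ratio would be compared with an F distribution on 2 and n - 4 degrees of freedom to obtain the p-value for the hypothesis that AAA, BBB and CCC have equal means.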

The following numerical example illustrates this.


Pain tolerance and hair colour

Studies conducted at the University of Melbourne indicate that there may be a difference between the pain thresholds of blonds and brunettes. Men and women of various ages were divided into four categories according to hair colour: light blond, dark blond, light brunette, and dark brunette. The purpose of the experiment was to determine whether hair colour is related to the amount of pain produced by common types of mishaps and assorted types of trauma. Each person in the experiment was given a pain threshold score based on his or her performance in a pain sensitivity test (the higher the score, the higher the person's pain tolerance).

In the parameterisation below, the first indicator variable is a contrast between the blonds and brunettes. The other two indicator variables distinguish light from dark blonds and light from dark brunettes.

Analysis of variance table

The analysis of variance table below shows the sequential sums of squares, split into a component for the first indicator variable contrasting blonds and brunettes, then a component for the other two indicator variables comparing the sub-colours (light vs dark blonds and light vs dark brunettes).

From the p-value associated with the sequential sum of squares between sub-colours, we conclude that there is no evidence in the data of differences in pain threshold between light and dark blonds or between light and dark brunettes.

Since this test is not significant, we can go on to interpret the p-value for the first indicator variable, which gives strong evidence of a difference between blonds and brunettes.

Adding together these sequential sums of squares gives a combined sum of squares for hair colour. The resulting sum of squares (3 d.f.) is identical to the explained sum of squares that would be obtained from other parameterisations of the model.

Illustration of the sequential sums of squares

The explained sums of squares in an anova table (whether sequential or not) are always the sum of squares of differences between the fitted values from two models. The diagram below illustrates the explained sums of squares for the pain tolerance data.

The differences that are summed for each of the explained sums of squares are:

  1. Between hair colours: differences between group means and overall mean
  2. Blond vs brunette: differences between main hair colour mean and overall mean
  3. Between sub-colours: differences between sub-colour mean and main hair colour mean
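The three explained sums of squares described above can be sketched numerically as sums of squared differences between fitted values. The scores below are hypothetical and are not the study's data. The function names fitted and ssq_between are assumptions of this sketch.

```python
# Hypothetical pain threshold scores (NOT the data from the study).
scores = {
    ("blond", "light"): [60.0, 64.0, 62.0],
    ("blond", "dark"): [55.0, 59.0, 54.0],
    ("brunette", "light"): [44.0, 40.0, 42.0],
    ("brunette", "dark"): [38.0, 36.0, 40.0],
}


def mean(values):
    return sum(values) / len(values)


def fitted(level):
    """Fitted value for every observation under one of three nested models."""
    out = []
    for (main, _sub), ys in scores.items():
        if level == "overall":    # one mean for everybody
            pool = [y for v in scores.values() for y in v]
        elif level == "main":     # one mean for blonds, one for brunettes
            pool = [y for (m, _), v in scores.items() if m == main for y in v]
        else:                     # "sub": a separate mean for each hair colour
            pool = ys
        out.extend([mean(pool)] * len(ys))
    return out


def ssq_between(fit_a, fit_b):
    """Sum of squared differences between two sets of fitted values."""
    return sum((a - b) ** 2 for a, b in zip(fit_a, fit_b))


overall, main, sub = fitted("overall"), fitted("main"), fitted("sub")

between_hair_colours = ssq_between(sub, overall)   # 3 d.f.
blond_vs_brunette = ssq_between(main, overall)     # 1 d.f.
between_sub_colours = ssq_between(sub, main)       # 2 d.f.

print(between_hair_colours, blond_vs_brunette + between_sub_colours)
```

The two printed values agree (up to rounding), showing that the 1 d.f. and 2 d.f. components add up to the 3 d.f. sum of squares between hair colours.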