Contingency tables from univariate data in several groups

Contingency tables often arise from bivariate categorical data. However, they can also arise from univariate categorical data that is recorded separately for each of several groups.

'Group membership' can be treated as a second categorical variable.

Typhoid vaccines

The table below shows results from a trial of two vaccines for typhoid fever that were given to 6,907 people in Nepal. Of these, 3,457 were given the vaccine Vi capsular polysaccharide (Vi) and 3,450 were given the vaccine pneumococcus polysaccharide (Pneumo).

                 Vaccine Vi   Vaccine Pneumo   Total
Typhoid fever            14               57      71
No typhoid             3443             3393    6836
Total                  3457             3450    6907

In this example, two different groups of people were used in the experiment. The column variable (distinguishing between the two vaccines) is not a random variable, as it is controlled by the experimenter. A single categorical measurement (whether or not typhoid was contracted) was made on each person.
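As a small illustration of treating group membership as the second variable, the sketch below computes the proportion contracting typhoid within each vaccine group from the counts in the table. The names `groups` and `typhoid_rate` are illustrative only and are not part of the original text.

```python
# Counts from the typhoid vaccine table: (typhoid cases, group size).
groups = {"Vi": (14, 3457), "Pneumo": (57, 3450)}


def typhoid_rate(cases, size):
    """Proportion of a group that contracted typhoid."""
    return cases / size


# One proportion per group: the single categorical measurement
# (typhoid or not) is summarised separately within each group.
rates = {name: typhoid_rate(cases, size) for name, (cases, size) in groups.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.2%} contracted typhoid")
```

The two proportions are the quantities that the test described in the following sections compares.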

Comparing groups

Although the chi-squared test was motivated as a test of independence of two categorical variables, the same test can be used when each row (or column) of a contingency table corresponds to a separate group of individuals.

Null hypothesis (corresponding to independence)
The category probabilities are the same within each group.
Alternative hypothesis (corresponding to association)
The different groups have different probabilities.

The χ² test statistic and p-value are identical to those given earlier for testing independence.
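The calculation can be sketched as follows, assuming the statistic defined earlier is the usual Pearson form, the sum of (observed - expected)² / expected over all cells, with expected counts estimated from the margins. The function name `chi_squared_statistic` is hypothetical, and only the statistic and degrees of freedom are computed here, not the p-value.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a table of counts (list of rows).

    Returns (statistic, degrees_of_freedom, expected_counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count in each cell if the category probabilities are the
    # same in every group: row total * column total / overall total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Typhoid vaccine data: rows are outcomes, columns are the Vi and Pneumo groups.
typhoid = [[14, 57], [3443, 3393]]
statistic, df, expected = chi_squared_statistic(typhoid)
print(f"chi-squared = {statistic:.2f} on {df} degree(s) of freedom")
```

Nothing in the code depends on whether the columns are values of a random variable or groups fixed by the design, which is why the same computation serves both situations.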

Examples

In the following examples, we test whether the 'response' proportions are the same in several groups.

Note again that when we conclude that the groups differ, a visual comparison of the observed counts with the counts estimated from the margins under independence helps to explain the nature of the difference.

Two groups and two categories

In the special case where there are two groups and the categorical measurement has two categories (that we will call 'success' and 'failure'), the chi-squared test is testing whether the probability of success is the same in both groups. For example, in the Typhoid Vaccine data set, we are testing whether the probability of getting typhoid is the same for the groups getting the two vaccines.

This hypothesis can also be tested with a 2-sample test of equality of two proportions.

Fortunately, although the two tests have been motivated in different ways, it can be proved that:

Provided that the 2-sample test for equality of two proportions is two-sided and uses the pooled estimate of the common proportion, it and the chi-squared test both result in the same p-value and conclusion.
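The equivalence can be checked numerically on the typhoid data. The sketch below assumes a two-sided z-test that uses the pooled proportion in its standard error, and a chi-squared test without continuity correction; under those assumptions the squared z statistic equals the chi-squared statistic. The function names are illustrative. Both p-values are obtained from `math.erfc`, using the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of equal proportions using the pooled estimate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided normal tail probability: P(|Z| > |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Shortcut form of the Pearson statistic for a 2x2 table.
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Typhoid cases and group sizes for the Vi and Pneumo vaccines.
z, p_z = two_proportion_z_test(14, 3457, 57, 3450)
chi2, p_chi2 = chi_squared_2x2(14, 57, 3443, 3393)

print(f"z^2 = {z ** 2:.4f}, chi-squared = {chi2:.4f}")
print(f"p-value (z-test) = {p_z:.3g}, p-value (chi-squared) = {p_chi2:.3g}")
```

A one-sided z-test, or a chi-squared test with a continuity correction, would not in general give matching p-values.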