Contingency tables from univariate data in several groups

Contingency tables often arise from bivariate categorical data. However, they can also arise from univariate categorical data that is recorded separately from several groups.

'Group membership' can be treated as a second categorical variable.

Effect of false claims in adverts

In a study to assess how false or misleading adverts affect consumers, one group of 100 experimental subjects was exposed to a series of adverts falsely claiming that a new brand of coffee contained 'no bitterness'. These subjects and another control group of 100 people who had not seen the adverts were given a sample of coffee that had been prepared to be intentionally bitter. The contingency table below shows whether the subjects reported the coffee as 'having bitterness'.

                                  False advert   No advert   Total
Coffee described as bitter                  68          89     157
Coffee described as not bitter              32          11      43
Total                                      100         100     200

In this example, two different groups of people were used in the experiment. The column variable (distinguishing between the subjects who saw the false adverts and those who saw no adverts) is not a random variable because it is fixed by the experimental design. A single categorical measurement (whether or not the subject described the coffee as bitter) was made from each person.

Comparing groups

Although the chi-squared test was motivated as a test of independence of two categorical variables, the same test can be used when each row (or column) of a contingency table corresponds to a separate group of individuals.

Null hypothesis (corresponding to independence)
The category probabilities are the same within each group.
Alternative hypothesis (corresponding to association)
The different groups have different probabilities.

The χ² test statistic and p-value are identical to those given earlier for testing independence.
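As an illustration, the calculation can be sketched for the Bitter Coffee table using only the Python standard library. This is a minimal sketch, not a definitive implementation: the function name chi_squared_statistic is hypothetical, and the p-value shortcut via math.erfc is an assumption that holds only when the test has 1 degree of freedom, as it does for a 2 × 2 table.

```python
import math

# Observed counts from the Bitter Coffee table.
# Rows: described as bitter / not bitter. Columns: false advert / no advert.
observed = [[68, 89],
            [32, 11]]


def chi_squared_statistic(table):
    """Return the chi-squared statistic and the table of expected counts.

    Expected counts are estimated from the margins under the null hypothesis:
    (row total * column total) / overall total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return stat, expected


chi2, expected = chi_squared_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: df == 1, so the upper tail probability of the chi-squared
# distribution equals erfc(sqrt(x / 2)). Other df need a statistics library.
p_value = math.erfc(math.sqrt(chi2 / 2))

print("expected counts:", expected)
print("chi-squared:", round(chi2, 2), "df:", df, "p-value:", p_value)
```

The expected counts are 78.5 in each 'bitter' cell and 21.5 in each 'not bitter' cell, the statistic is about 13.06, and the p-value is well below 0.001, so the data give strong evidence that the two groups differ in their probability of describing the coffee as bitter.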

Examples

In the following examples, we test whether the 'response' proportions are the same in several groups.

Note again that visually comparing the observed counts with the counts estimated from the margins under independence helps to explain the nature of the relationship in examples where we conclude that there is some difference between the groups.

Two groups and two categories

In the special case where there are two groups and the categorical measurement has two categories (that we will call 'success' and 'failure'), the chi-squared test is testing whether the probability of success is the same in both groups. For example, in the Bitter Coffee data set, we are testing whether the probability of reporting that the coffee was bitter is the same for the group that saw the false adverts and the group that did not.

This hypothesis can also be tested with a 2-sample test of equality of two proportions.

Fortunately, although the two tests have been motivated differently, it can be proved that:

The 2-sample test for equality of two proportions (two-sided, with the standard error based on the pooled proportion) and the chi-squared test both result in the same p-value and conclusion.
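The equivalence can be checked numerically for the Bitter Coffee data. The sketch below is illustrative only: the function names are hypothetical, the z test is assumed to be two-sided with the pooled proportion in its standard error, and the chi-squared test is assumed to use no continuity correction.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of equal proportions using the pooled estimate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def chi_squared_2x2(table):
    """Chi-squared statistic and p-value for a 2 x 2 table (1 df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    # With 1 df the chi-squared upper tail equals erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# 68 of 100 subjects who saw the false adverts, and 89 of 100 controls,
# described the coffee as bitter.
z, p_z = two_proportion_z_test(68, 100, 89, 100)
chi2, p_chi2 = chi_squared_2x2([[68, 89], [32, 11]])

print("z:", round(z, 3), "z squared:", round(z ** 2, 3))
print("chi-squared:", round(chi2, 3))
print("p-values:", p_z, p_chi2)
```

The squared z statistic equals the chi-squared statistic (about 13.06), and the two p-values agree up to floating-point rounding.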