Assessing independence from a sample

Independence is an important concept, but it is defined in terms of the joint population probabilities, and in most practical situations these are unknown. We must therefore assess independence from a sample of individuals, summarised in a contingency table.

Example

The contingency table below categorises a sample of 214 individuals by gender and some other characteristic (possibly weight group or grade in a test).

Sample Data
         Male   Female   Total
A          20       60      80
B           9       84      93
C           2       39      41
Total      31      183     214

Is this consistent with a model of independence of the characteristic and gender? (Are the probabilities of A, B and C grades the same for males and females?)
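A first informal look at this question is to compare the sample proportions of A, B and C within each gender. The following is a minimal Python sketch using the counts from the table; the dictionary layout and the function name `column_proportions` are illustrative choices, not part of the original text.

```python
# Observed counts from the sample table: rows are categories A, B, C.
observed = {
    "A": {"Male": 20, "Female": 60},
    "B": {"Male": 9, "Female": 84},
    "C": {"Male": 2, "Female": 39},
}


def column_proportions(table):
    """For each column, return each row's share of that column's total."""
    columns = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {c: {r: table[r][c] / totals[c] for r in table} for c in columns}


proportions = column_proportions(observed)
for gender, props in proportions.items():
    print(gender, {cat: round(p, 3) for cat, p in props.items()})
# Male {'A': 0.645, 'B': 0.29, 'C': 0.065}
# Female {'A': 0.328, 'B': 0.459, 'C': 0.213}
```

The sample proportions differ between males and females, but sample proportions vary by chance, so this comparison alone does not settle whether the underlying probabilities differ.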

Estimated cell counts under independence

To assess independence, we first find the pattern of cell counts that is most consistent with independence in a contingency table with the observed marginal totals.

         Male   Female   Total
A           ?        ?      80
B           ?        ?      93
C           ?        ?      41
Total      31      183     214

The pattern that is most consistent with independence has the following estimated cell counts:

\[ e_{xy} = \frac{n_x \, n_y}{n} \]

where $n$ denotes the total for the whole table and $n_x$ and $n_y$ denote the marginal totals for row $x$ and column $y$.
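One way to see where these estimated counts come from is sketched below. The notation is an assumption introduced here, not taken from the original: $p_x$ and $p_y$ are the marginal probabilities of row $x$ and column $y$, and $e_{xy}$ is the estimated count for the cell. Under independence the joint probability is the product of the marginal probabilities, and each marginal probability can be estimated by the corresponding sample proportion.

```latex
\begin{aligned}
p_{xy} &= p_x \, p_y && \text{(independence)} \\
\hat{p}_x &= \frac{n_x}{n}, \quad \hat{p}_y = \frac{n_y}{n} && \text{(sample proportions)} \\
e_{xy} &= n \, \hat{p}_x \, \hat{p}_y = \frac{n_x \, n_y}{n}
\end{aligned}
```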

Applying this to our example gives the following table:

         Male   Female   Total
A       11.59    68.41      80
B       13.47    79.53      93
C        5.94    35.06      41
Total      31      183     214

Estimated cell counts are shown to two decimal places.
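The estimated counts for this example can be checked directly from the marginal totals. The following is a minimal Python sketch, assuming each estimated count is the row total times the column total divided by the overall total; the function name `expected_counts` is illustrative.

```python
# Observed counts: one row per category, columns are (Male, Female).
observed = [
    [20, 60],  # A
    [9, 84],   # B
    [2, 39],   # C
]


def expected_counts(table):
    """Estimated cell counts under independence, given the observed margins."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Each cell: (row total * column total) / overall total.
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for label, row in zip("ABC", expected):
    print(label, [round(e, 2) for e in row])
# A [11.59, 68.41]
# B [13.47, 79.53]
# C [5.94, 35.06]
```

The estimated counts need not be whole numbers, and their row and column sums reproduce the observed marginal totals.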