Assessing independence from a sample
Independence is an important concept, but it is defined in terms of the joint population probabilities, and in most practical situations these are unknown. We must instead assess independence from a sample of individuals, summarised as a contingency table.
Example
The contingency table below categorises a sample of 214 individuals by gender and some other characteristic (possibly weight group or grade in a test).
| | Male | Female | Total |
|---|---|---|---|
| A | 20 | 60 | 80 |
| B | 9 | 84 | 93 |
| C | 2 | 39 | 41 |
| Total | 31 | 183 | 214 |
Is this consistent with a model of independence of the characteristic and gender? (Are the probabilities of A, B and C grades the same for males and females?)
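As a first informal look at the question in parentheses, the sample proportions of A, B and C within each gender can be compared. The sketch below is only an illustration; the names `observed` and `column_proportions` are hypothetical and do not come from the original text.

```python
# Observed counts from the example table (rows A, B, C; columns Male, Female).
observed = {
    "A": {"Male": 20, "Female": 60},
    "B": {"Male": 9, "Female": 84},
    "C": {"Male": 2, "Female": 39},
}


def column_proportions(table):
    """Return, for each column, the proportion of its total falling in each row."""
    col_totals = {}
    for cols in table.values():
        for col, count in cols.items():
            col_totals[col] = col_totals.get(col, 0) + count
    return {
        col: {row: table[row][col] / total for row in table}
        for col, total in col_totals.items()
    }


proportions = column_proportions(observed)
for gender, props in proportions.items():
    print(gender, {row: round(p, 3) for row, p in props.items()})
```

In this sample about 65% of the males are in category A, compared with about 33% of the females. A difference between sample proportions does not by itself rule out independence, because samples vary.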
Estimated cell counts under independence
To assess independence, we first find the pattern of cell counts that is most consistent with independence in a contingency table with the observed marginal totals.
| | Male | Female | Total |
|---|---|---|---|
| A | ? | ? | 80 |
| B | ? | ? | 93 |
| C | ? | ? | 41 |
| Total | 31 | 183 | 214 |
The pattern that is most consistent with independence has the following estimated cell counts:

$$e_{xy} = \frac{n_x \, n_y}{n}$$

where $n$ denotes the total for the whole table and $n_x$ and $n_y$ denote the marginal totals for row $x$ and column $y$.
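A short sketch of where this comes from may help. The symbols $p_{xy}$, $p_x$ and $p_y$ for the joint and marginal population probabilities, and $e_{xy}$ for the estimated count in row $x$ and column $y$, are notation assumed here for illustration. Under independence the joint probability factorises, and each marginal probability is estimated by the corresponding sample proportion:

$$p_{xy} = p_x \, p_y, \qquad \hat{p}_x = \frac{n_x}{n}, \qquad \hat{p}_y = \frac{n_y}{n}$$

Multiplying the estimated joint probability by the sample size gives the estimated count:

$$e_{xy} = n \, \hat{p}_x \, \hat{p}_y = n \cdot \frac{n_x}{n} \cdot \frac{n_y}{n} = \frac{n_x \, n_y}{n}$$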
Applying this to our example gives the following table:
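| | Male | Female | Total |
|---|---|---|---|
| A | 80 × 31 / 214 = 11.59 | 80 × 183 / 214 = 68.41 | 80 |
| B | 93 × 31 / 214 = 13.47 | 93 × 183 / 214 = 79.53 | 93 |
| C | 41 × 31 / 214 = 5.94 | 41 × 183 / 214 = 35.06 | 41 |
| Total | 31 | 183 | 214 |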
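The calculation can be sketched in a few lines of Python: each estimated count is the row total times the column total, divided by the overall total. The names `observed` and `expected_counts` are hypothetical, chosen only for this illustration.

```python
# Observed counts from the example table (rows A, B, C; columns Male, Female).
observed = {
    "A": {"Male": 20, "Female": 60},
    "B": {"Male": 9, "Female": 84},
    "C": {"Male": 2, "Female": 39},
}


def expected_counts(table):
    """Estimated cell counts under independence: row total * column total / n."""
    row_totals = {row: sum(cols.values()) for row, cols in table.items()}
    col_totals = {}
    for cols in table.values():
        for col, count in cols.items():
            col_totals[col] = col_totals.get(col, 0) + count
    n = sum(row_totals.values())
    return {
        row: {col: row_totals[row] * col_totals[col] / n for col in col_totals}
        for row in table
    }


expected = expected_counts(observed)
for row, cols in expected.items():
    print(row, {col: round(value, 2) for col, value in cols.items()})
```

The estimated counts keep the observed marginal totals: each row and each column of `expected` sums to the same total as in the original table.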