Chapter 13   Independence

13.1   Probability and applications

13.1.1   Joint probabilities

Bivariate categorical data are modelled as a sample from a population that consists of pairs of categorical values. The joint probability for any pair of categories is their population proportion.
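As a minimal sketch of this definition, the Python below turns population counts into joint probabilities. The category labels and counts are made up for illustration and do not come from any real data set.

```python
# Hypothetical population counts for pairs (X, Y); the numbers are made up.
counts = {
    ("a", "low"): 30, ("a", "high"): 10,
    ("b", "low"): 20, ("b", "high"): 40,
}

total = sum(counts.values())  # population size

# Joint probability of each pair = its proportion in the population.
joint = {pair: n / total for pair, n in counts.items()}

for pair, p in joint.items():
    print(pair, p)
```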

13.1.2   Marginal probabilities

The marginal probabilities for a variable are the population proportions for its possible values. They can be found by summing joint probabilities.
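The summation can be sketched as follows, assuming the joint probabilities are stored in a dictionary keyed by (x, y) pairs. The values are hypothetical.

```python
from collections import defaultdict

# Hypothetical joint probabilities for (X, Y); the values are made up.
joint = {
    ("a", "low"): 0.3, ("a", "high"): 0.1,
    ("b", "low"): 0.2, ("b", "high"): 0.4,
}

marginal_x = defaultdict(float)
marginal_y = defaultdict(float)
for (x, y), p in joint.items():
    marginal_x[x] += p  # sum over all values of Y
    marginal_y[y] += p  # sum over all values of X

print(dict(marginal_x))
print(dict(marginal_y))
```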

13.1.3   Conditional probabilities

Conditional probabilities for a variable are proportions in a sub-population containing a specific value for the other variable. They are found by scaling the joint probabilities in that sub-population so that they sum to one.
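A small sketch of the scaling step, using hypothetical joint probabilities: the joint probabilities in the sub-population X = x are divided by their sum, which is the marginal probability of x. The function name is an assumption made for this illustration.

```python
# Hypothetical joint probabilities for (X, Y); the values are made up.
joint = {
    ("a", "low"): 0.3, ("a", "high"): 0.1,
    ("b", "low"): 0.2, ("b", "high"): 0.4,
}

def conditional_y_given_x(joint, x):
    """Conditional distribution of Y in the sub-population with X == x."""
    row = {y: p for (xi, y), p in joint.items() if xi == x}
    total = sum(row.values())  # marginal probability of X == x
    return {y: p / total for y, p in row.items()}

cond_a = conditional_y_given_x(joint, "a")
print(cond_a)
```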

13.1.4   Graphical display of probabilities

Joint, marginal and conditional probabilities can be displayed graphically.

13.1.5   Calculations with probabilities

The model can be equivalently described by (a) joint probabilities, (b) marginal probabilities for X and conditional probabilities for Y, or (c) marginal probabilities for Y and conditional probabilities for X. Any of these sets of probabilities can be found from any other set.
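To illustrate moving between these descriptions, the sketch below starts from description (b), rebuilds the joint probabilities (a), and then derives description (c). All numbers are hypothetical.

```python
# Description (b): marginal for X and conditional for Y given X (made-up values).
marginal_x = {"a": 0.4, "b": 0.6}
cond_y_given_x = {
    "a": {"low": 0.75, "high": 0.25},
    "b": {"low": 1 / 3, "high": 2 / 3},
}

# Description (a): joint = marginal of X times conditional of Y given X.
joint = {
    (x, y): marginal_x[x] * p
    for x, row in cond_y_given_x.items()
    for y, p in row.items()
}

# Description (c): marginal for Y, then conditional for X given Y.
marginal_y = {}
for (x, y), p in joint.items():
    marginal_y[y] = marginal_y.get(y, 0.0) + p
cond_x_given_y = {
    y: {x: joint[(x, y)] / marginal_y[y] for x in marginal_x}
    for y in marginal_y
}

print(joint)
print(marginal_y)
print(cond_x_given_y)
```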

13.2   Independence

13.2.1   Association

Two categorical variables, X and Y, are associated (related) when the conditional distribution of Y given X=x is different for different values of x. Knowing the value of X therefore tells you something about Y.

13.2.2   Independence

When the conditional distribution of Y is the same for all values of X, the variables are called independent. This special case is of practical importance.
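The definition can be checked directly for a table of joint probabilities. The sketch below compares the conditional distribution of Y across all values of X; it assumes every value of X has a positive marginal probability, and both tables are hypothetical.

```python
import math

def conditional_y_given_x(joint, x):
    """Conditional distribution of Y in the sub-population with X == x."""
    row = {y: p for (xi, y), p in joint.items() if xi == x}
    total = sum(row.values())
    return {y: p / total for y, p in row.items()}

def is_independent(joint, tol=1e-9):
    """True when the conditional distribution of Y is the same for every x."""
    xs = sorted({x for x, _ in joint})
    reference = conditional_y_given_x(joint, xs[0])
    return all(
        math.isclose(conditional_y_given_x(joint, x).get(y, 0.0), p, abs_tol=tol)
        for x in xs[1:]
        for y, p in reference.items()
    )

# Made-up example tables.
independent = {("a", "low"): 0.2, ("a", "high"): 0.2,
               ("b", "low"): 0.3, ("b", "high"): 0.3}
associated = {("a", "low"): 0.3, ("a", "high"): 0.1,
              ("b", "low"): 0.2, ("b", "high"): 0.4}

print(is_independent(independent))
print(is_independent(associated))
```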

13.2.3   Independence from samples

Independence is a population property. To assess independence from a sample contingency table, the observed cell counts are compared to those estimated from a model with independence.
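As a sketch of the comparison, the usual estimate of a cell count under independence is (row total × column total) / overall total. The observed table below is made up for illustration.

```python
# Hypothetical sample contingency table (rows: values of X, columns: values of Y).
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Estimated cell counts from a model with independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

print(expected)  # [[20.0, 20.0], [30.0, 30.0]]
```

The estimated table keeps the observed row and column totals, but every row has the same proportions across the columns.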

13.2.4   Testing for independence

The raw sum of squared differences between observed and estimated cell counts is not a good test statistic.
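One way to see a difficulty with the raw sum, sketched under the assumption that estimated counts are (row total × column total) / overall total: multiplying every count in a hypothetical table by 10 leaves the pattern of proportions unchanged but multiplies the raw sum by 100, so its size cannot be judged against a single reference distribution.

```python
def expected_counts(observed):
    """Estimated cell counts under independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def raw_sum_of_squares(observed):
    """Unscaled sum of squared (observed - estimated) differences."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Made-up table and the same table with every count multiplied by 10.
small = [[30, 10], [20, 40]]
large = [[10 * o for o in row] for row in small]

print(raw_sum_of_squares(small))  # 400.0
print(raw_sum_of_squares(large))  # 40000.0
```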

13.2.5   Chi-squared test statistic

The 'chi-squared' statistic is a modified sum of squared differences that has a standard distribution (a chi-squared distribution) when there is independence.
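A minimal sketch of the statistic in its usual form, where each squared difference is divided by the estimated count before summing. The degrees of freedom, (rows - 1) × (columns - 1), identify which chi-squared distribution applies. The table is hypothetical.

```python
def chi_squared(observed):
    """Return (chi-squared statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # estimated count under independence
            statistic += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Made-up 2 x 2 table.
statistic, df = chi_squared([[30, 10], [20, 40]])
print(round(statistic, 4), df)  # 16.6667 1
```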

13.2.6   P-value for chi-squared test

The chi-squared statistic can be used to find a p-value for testing independence. The p-value is interpreted in the same way, and has the same properties, as the p-values for other hypothesis tests.
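For a 2 × 2 table the statistic has 1 degree of freedom, and the upper-tail probability of that chi-squared distribution can be written with the complementary error function. The sketch below covers only this special case; the function name and the statistic value (from a made-up table) are assumptions for illustration. Tables with more rows or columns need the general chi-squared distribution from a statistics library.

```python
import math

def p_value_df1(statistic):
    """Upper-tail probability of a chi-squared distribution with 1 degree of freedom.

    A chi-squared variable with 1 df is the square of a standard normal Z, so
    P(Z**2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))

# Hypothetical statistic from a 2 x 2 table.
statistic = 50 / 3
p = p_value_df1(statistic)
print(p)
```

A small p-value means the observed table would be unlikely if the variables were independent.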

13.2.7   Examples

The chi-squared test is applied to a few real data sets. When the variables are found to be associated, the nature of the relationship is described from a comparison of observed and estimated cell counts.

13.2.8   Comparing groups

The chi-squared test assesses independence of two categorical variables. It is also used to test whether a single categorical variable has the same distribution in several groups.