
Chapter 13   Independence

13.1   Probability and applications

13.1.1   Joint probabilities

Data sets with two categorical variables

Bivariate categorical data sets are usually summarised with a contingency table.

For example, a study examined 62 patients who had been given a prescription medicine for some condition. Each patient was classified by whether they had complied with the treatment prescribed and by racial group:

Race         Compliers    Non-compliers    Total
White               13               10       23
Non-white           13               26       39
Total               26               36       62

Joint probabilities

Bivariate categorical data can be modelled as a random sample from an underlying population of pairs of categorical values. The population proportion for each pair (x, y) is denoted by p_{xy} and is called the joint probability for (x, y).

In games of chance, we can often work out the joint probabilities. For example, if a gambler draws a card from a shuffled deck and also tosses a coin, there are eight possible combinations of card suit and coin side, each with joint probability (1/4) \times (1/2) = 1/8.
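With sample data, the joint probabilities are unknown but can be estimated by the corresponding sample proportions. A minimal sketch of this in Python, using the compliance study above (the category labels are illustrative names, not from the source):

    # Estimate the joint probabilities from the compliance study:
    # each cell count is divided by the overall total of 62 patients.
    counts = {
        ("white", "complier"): 13,
        ("white", "non-complier"): 10,
        ("non-white", "complier"): 13,
        ("non-white", "non-complier"): 26,
    }
    n = sum(counts.values())
    joint = {pair: count / n for pair, count in counts.items()}
    for pair, p in joint.items():
        print(pair, round(p, 3))   # e.g. ('non-white', 'non-complier') 0.419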

13.1.2   Marginal probabilities

Probabilities for a single variable

A model for two categorical variables is characterised by the joint probabilities p_{xy}.

The marginal probability, p_x, for a variable X is the proportion of (x, y) pairs in the population with X = x. It can be found by adding the joint probabilities of all pairs with this x-value:

    p_x = \sum_y p_{xy}

There is a similar formula for the marginal probabilities of the other variable,

    p_y = \sum_x p_{xy}

Example

In the following example, the marginal probabilities for X are the row of totals under the table, and the marginal probabilities for Y are the column of totals on the right.

Joint probabilities

                        Variable X
Variable Y      X = A     X = B     X = C     Total
Y = 1           0.2576    0.1364    0.1212    0.5152
Y = 2           0.0909    0.0758    0.0152    0.1818
Y = 3           0.0455    0.0758    0.0606    0.1818
Y = 4           0.0152    0.0303    0.0758    0.1212
Total           0.4091    0.3182    0.2727    1.0000
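The row and column totals in this table can be reproduced by summing the joint probabilities. A short Python sketch (the array simply copies the table's values):

    import numpy as np

    # Joint probabilities p_{xy}: rows are Y = 1..4, columns are X = A, B, C.
    p = np.array([
        [0.2576, 0.1364, 0.1212],
        [0.0909, 0.0758, 0.0152],
        [0.0455, 0.0758, 0.0606],
        [0.0152, 0.0303, 0.0758],
    ])
    p_x = p.sum(axis=0)   # marginal probabilities for X (the bottom row of totals)
    p_y = p.sum(axis=1)   # marginal probabilities for Y (the right column of totals)
    print(p_x)   # approximately [0.4091, 0.3182, 0.2727]
    print(p_y)   # approximately [0.5152, 0.1818, 0.1818, 0.1212]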

13.1.3   Conditional probabilities

Probabilities in a sub-population

Complete population
The joint probabilities p_{xy} and the marginal probabilities p_x and p_y all describe proportions in the complete population of (x, y) pairs.
Sub-population
In contrast, it is sometimes meaningful to restrict attention to a subset of the (x, y) pairs. For example, we may be interested only in pairs for which the first variable, X, has some particular value. Probabilities that relate to a sub-population are called conditional probabilities.

Conditional probabilities for Y, given X = x

The general definition of the conditional probabilities for Y, given that the value of X is x, is

    p_{y|x} = p_{xy} / p_x

They can be found by rescaling the joint probabilities for that value of x (dividing by p_x) so that they sum to 1.0.
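Continuing the example of the previous section (where the values of X run across the columns), a minimal Python sketch of this rescaling:

    import numpy as np

    # Joint probabilities from section 13.1.2: rows are Y = 1..4, columns are X = A, B, C.
    p = np.array([
        [0.2576, 0.1364, 0.1212],
        [0.0909, 0.0758, 0.0152],
        [0.0455, 0.0758, 0.0606],
        [0.0152, 0.0303, 0.0758],
    ])
    p_x = p.sum(axis=0)
    # Conditional probabilities for Y given X: divide the joint probabilities
    # for each value of X by the marginal probability p_x.
    p_y_given_x = p / p_x
    print(p_y_given_x.sum(axis=0))   # each set of conditional probabilities sums to 1.0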

Two sets of conditional probabilities

Conditional probabilities for X given that Y has the value y are defined in a similar way:

    p_{x|y} = p_{xy} / p_y

You should be careful to distinguish between p_{x|y} and p_{y|x}.

The probability of being pregnant, given that a randomly selected person is female, would be fairly small. The probability of being female, given that a person is pregnant, is 1.0!

13.1.4   Graphical display of probabilities

Proportional Venn diagrams

A proportional Venn diagram is drawn from the marginal probabilities of one variable and the conditional probabilities for the other variable.

Rewriting the definition of conditional probabilities,

    p_{xy} = p_x \times p_{y|x}

The area of any rectangle in the diagram therefore equals the joint probability of the categories it represents.

An alternative proportional Venn diagram can be drawn from the marginal probabilities of Y and the conditional probabilities of X given Y, using p_{xy} = p_y \times p_{x|y}. The area of the rectangle corresponding to any (x, y) is again its joint probability, p_{xy}.

Example

The table below is based on the world population in 2002, categorised by region and by age group. It shows the joint probabilities for a randomly chosen person being in each age/region category.

Joint probabilities

                                       Age
Region                         0-19     20-64    65+
Africa and Near East          0.085    0.073    0.006
Asia                          0.215    0.315    0.035
America, Europe and Oceania   0.084    0.158    0.030

The two proportional Venn diagrams are shown below.

Note that the areas are the same in both diagrams — they are simply rearranged.
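The claim about the areas can be checked numerically. A minimal Python sketch using the joint probabilities in the table above:

    import numpy as np

    # Joint probabilities: rows are the three regions, columns are the age groups.
    p = np.array([
        [0.085, 0.073, 0.006],
        [0.215, 0.315, 0.035],
        [0.084, 0.158, 0.030],
    ])
    p_region = p.sum(axis=1)                      # marginal probabilities of region
    p_age_given_region = p / p_region[:, None]    # conditional probabilities of age, given region
    # The rectangles in the first diagram have width p_region and height p_{age|region},
    # so their areas recover the joint probabilities exactly.
    print(np.allclose(p_region[:, None] * p_age_given_region, p))   # True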

13.1.5   Calculations with probabilities

Marginal and conditional probabilities can be found from joint probabilities (and vice versa)

We have used three types of probability to describe a model for two categorical variables: the joint probabilities, the marginal probabilities for the two variables, and the conditional probabilities for each variable given the value of the other. These sets of probabilities are closely related. Indeed, the model can be equivalently described by any of the following:

the joint probabilities, p_{xy}
the marginal probabilities for X and the conditional probabilities for Y given X, p_x and p_{y|x}
the marginal probabilities for Y and the conditional probabilities for X given Y, p_y and p_{x|y}

Each can be found from the others, using

    p_{xy} = p_x \times p_{y|x} = p_y \times p_{x|y}

Bayes theorem

In particular, note that it is possible to obtain the conditional probabilities for X given Y, p_{x|y}, from the marginal probabilities of X, p_x, and the conditional probabilities for Y given X, p_{y|x}. This can be expressed in a single formula, called Bayes' theorem:

    p_{x|y} = p_x p_{y|x} / \sum_x p_x p_{y|x}

In practice, however, it is easier to do the calculations in two steps, obtaining the joint probabilities, p_{xy} = p_x \times p_{y|x}, in the first step. There are several important applications of Bayes' theorem.

Accuracy of medical diagnostic tests

There are two types of error in a test for a medical condition: a false negative, where the test is negative for someone who has the disease, and a false positive, where the test is positive for someone who does not have the disease.

Consider a diagnostic test with

    p_{negative | disease} = 0.05           p_{positive | no disease} = 0.10

From these, we can also write

    p_{positive | disease} = 0.95           p_{negative | no disease} = 0.90

We will also assume that 10% of people who are given the test have the disease,

    p_{disease} = 0.10

From this information, we can find the probabilities of having the disease, given the result of the diagnostic test. For example,

    p_{disease | positive} = p_{disease} p_{positive | disease} / (p_{disease} p_{positive | disease} + p_{no disease} p_{positive | no disease})
                           = (0.10 \times 0.95) / (0.10 \times 0.95 + 0.90 \times 0.10) \approx 0.51
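A minimal sketch of the two-step calculation in Python, using the assumed error rates above:

    # Step 1: joint probabilities of disease status and test result.
    p_disease = 0.10
    p_pos_given_disease = 0.95        # 1 minus the false-negative probability
    p_pos_given_no_disease = 0.10     # the false-positive probability

    p_disease_and_pos = p_disease * p_pos_given_disease                # 0.095
    p_no_disease_and_pos = (1 - p_disease) * p_pos_given_no_disease    # 0.090

    # Step 2: condition on the observed test result (here, a positive test).
    p_pos = p_disease_and_pos + p_no_disease_and_pos                   # 0.185
    p_disease_given_pos = p_disease_and_pos / p_pos
    print(round(p_disease_given_pos, 3))   # 0.514

Even with a positive test, the probability of having the disease is only about a half, because the disease is rare among those tested.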

13.2   Independence

13.2.1   Association

Relationships

The relationship between two numerical variables can be summarised by a correlation coefficient and least squares line. Two categorical variables may also be related.

We say that two categorical variables are associated if knowledge of the value of one tells you something about the likely value of the other.

If the conditional distribution of Y given X = x depends on the value of x, we say that X and Y are associated.

Example

We illustrate the idea of association with an artificial example relating athletic performance of high school children to their weight. The table below shows the joint probabilities for these children.

Joint Probabilities

                          Athletic performance
Weight          Poor      Satisfactory   Above average   Marginal
Underweight     0.0450    0.0900         0.0150          0.1500
Normal          0.0825    0.3025         0.1650          0.5500
Overweight      0.0500    0.1200         0.0300          0.2000
Obese           0.0300    0.0650         0.0050          0.1000
Marginal        0.2075    0.5775         0.2150          1.0000

A proportional Venn diagram displays the conditional probabilities for performance, given weight category, graphically.

If we know that a child has normal weight, the probability of above-average athletic performance is higher than for an overweight child. Since the conditional probabilities for performance, given weight, differ between weight categories, the two variables are associated.
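A short Python sketch that computes these conditional probabilities from the joint probabilities in the table:

    import numpy as np

    # Rows: underweight, normal, overweight, obese.
    # Columns: poor, satisfactory, above average.
    p = np.array([
        [0.0450, 0.0900, 0.0150],
        [0.0825, 0.3025, 0.1650],
        [0.0500, 0.1200, 0.0300],
        [0.0300, 0.0650, 0.0050],
    ])
    p_weight = p.sum(axis=1)
    cond = p / p_weight[:, None]   # conditional probabilities of performance, given weight
    print(cond.round(2))
    # The rows differ (above average: 0.30 for normal, 0.15 for overweight),
    # so performance and weight are associated.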

13.2.2   Independence

Independence

If the conditional probabilities for Y are the same for all values of X, then Y is said to be independent of X.

If X and Y are independent, knowing the value of X does not give us any information about the likely value for Y.

Example

An example of independence is given by the following table of joint probabilities for the weight category and mathematical ability of high school children.


Joint Probabilities

                          Mathematical performance
Weight          Poor      Satisfactory   Above average   Marginal
Underweight     0.0225    0.1125         0.0150          0.1500
Normal          0.0825    0.4125         0.0550          0.5500
Overweight      0.0300    0.1500         0.0200          0.2000
Obese           0.0150    0.0750         0.0100          0.1000
Marginal        0.1500    0.7500         0.1000          1.0000

The proportional Venn diagram for this model is shown below.

The conditional probability of above average maths performance is the same for all weight categories — knowing a child's weight would not help you to predict maths performance. The two variables are therefore independent.

Mathematical definition of independence

If Y is independent of X, then the conditional probabilities p_{y|x} equal the marginal probabilities p_y, so, for all x and y,

    p_{xy} = p_x \times p_y
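A quick numerical check of this definition against the maths-performance table above (a Python sketch):

    import numpy as np

    # Joint probabilities of weight (rows) and mathematical performance (columns).
    p = np.array([
        [0.0225, 0.1125, 0.0150],
        [0.0825, 0.4125, 0.0550],
        [0.0300, 0.1500, 0.0200],
        [0.0150, 0.0750, 0.0100],
    ])
    p_weight = p.sum(axis=1)
    p_maths = p.sum(axis=0)
    # Under independence, every joint probability is the product of its marginals.
    print(np.allclose(p, np.outer(p_weight, p_maths)))   # True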

13.2.3   Independence from samples

Assessing independence from a sample

Independence is an important concept, but it is defined in terms of the joint population probabilities, and in most practical situations these are unknown. We must assess independence from a sample of individuals, summarised in a contingency table.

Example

The contingency table below categorises a sample of 214 individuals by gender and some other characteristic (possibly weight group or grade in a test).

Sample Data

Grade     Male    Female    Total
A           20        60       80
B            9        84       93
C            2        39       41
Total       31       183      214

Is this consistent with a model of independence of the characteristic and gender? (Are the probabilities of A, B and C grades the same for males and females?)

Estimated cell counts under independence

To assess independence, we first find the pattern of cell counts that is most consistent with independence in a contingency table with the observed marginal totals.

Grade     Male    Female    Total
A            ?         ?       80
B            ?         ?       93
C            ?         ?       41
Total       31       183      214

The pattern that is most consistent with independence has the following estimated cell counts:

    e_{xy} = n_x n_y / n

where n denotes the total for the whole table and n_x and n_y denote the marginal totals for row x and column y.

Applying this to our example gives the following table:

Grade      Male      Female    Total
A         11.59       68.41       80
B         13.47       79.53       93
C          5.94       35.06       41
Total     31         183         214
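A minimal sketch of this calculation in Python, using numpy's outer product to form all the products n_x n_y at once:

    import numpy as np

    observed = np.array([
        [20, 60],
        [ 9, 84],
        [ 2, 39],
    ])
    n = observed.sum()                   # 214
    row_totals = observed.sum(axis=1)    # [80, 93, 41]
    col_totals = observed.sum(axis=0)    # [31, 183]
    estimated = np.outer(row_totals, col_totals) / n
    print(estimated.round(2))
    # [[11.59 68.41]
    #  [13.47 79.53]
    #  [ 5.94 35.06]]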

13.2.4   Testing for independence

Comparison of observed and estimated cell counts

We test for independence with the hypotheses:

H0 :  X and Y are independent
HA :  X and Y are dependent  

The test asks whether the observed and estimated cell counts are 'sufficiently close' — are the observed counts consistent with the counts estimated under independence?

Observed and estimated cell counts

Grade      Male           Female         Total
A          20  (11.59)    60  (68.41)      80
B           9  (13.47)    84  (79.53)      93
C           2   (5.94)    39  (35.06)      41
Total      31            183              214

Possible test statistic?

A simple summary of how close the observed counts, n_{xy}, are to the estimated cell counts, e_{xy}, is the sum of the squared differences,

    \sum (n_{xy} - e_{xy})^2

Unfortunately this would be a bad test statistic: its distribution depends not only on the numbers of rows and columns in the table, but also on the number of individuals classified (the overall total for the table). A better test statistic is presented on the next page.

13.2.5   Chi-squared test statistic

A better test statistic

The following χ² (pronounced chi-squared) statistic has much better properties than the raw sum of squares on the previous page:

    \chi^2 = \sum (n_{xy} - e_{xy})^2 / e_{xy}

Its distribution only depends on the number of rows and columns in the contingency table.
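Applying the statistic to the grades-by-gender example of the previous pages, a short Python sketch:

    import numpy as np

    observed = np.array([
        [20, 60],
        [ 9, 84],
        [ 2, 39],
    ])
    estimated = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    chi2 = ((observed - estimated) ** 2 / estimated).sum()
    print(round(chi2, 2))   # approximately 11.93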

Distribution of chi-squared statistic

When there is independence, the χ2 statistic for a contingency table with r rows and c columns has approximately a standard distribution called a chi-squared distribution with (r - 1)(c - 1) degrees of freedom.

The mean of a chi-squared distribution equals its degrees of freedom and it is skew. Some examples are given below for contingency tables of different sizes:

13.2.6   P-value for chi-squared test

Testing for independence

H0 :  X and Y are independent
HA :  X and Y are dependent  

The following test statistic is used:

    \chi^2 = \sum (n_{xy} - e_{xy})^2 / e_{xy}

If X and Y are independent
χ² has (approximately) a chi-squared distribution with no unknown parameters.
If X and Y are associated
The pattern of observed counts, n_{xy}, is expected to differ from that of the estimated counts, e_{xy}, so χ² is expected to be larger.

P-value

The p-value is interpreted in the same way as for other hypothesis tests. It describes the strength of evidence against the null hypothesis:

p-value                   Interpretation
over 0.1                  no evidence against the null hypothesis (independence)
between 0.05 and 0.1      very weak evidence of dependence between the row and column variables
between 0.01 and 0.05     moderately strong evidence of dependence between the row and column variables
under 0.01                strong evidence of dependence between the row and column variables

Warning about low estimated cell counts

The χ² test statistic has only approximately a chi-squared distribution. The p-value found from it can be relied on if:

all estimated cell counts, e_{xy}, are at least 1, and
at least 80% of the estimated cell counts are 5 or more.

If the cell counts are small enough that these conditions do not hold, the p-value is less reliable. (But advanced statistical methods are required to do better!)
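In practice, the whole test can be done in one step. A sketch using scipy's standard routine chi2_contingency, which returns the χ² statistic, the p-value, the degrees of freedom and the estimated cell counts:

    import numpy as np
    from scipy.stats import chi2_contingency

    observed = np.array([
        [20, 60],
        [ 9, 84],
        [ 2, 39],
    ])
    # correction=False gives the plain chi-squared statistic defined earlier;
    # the continuity correction applied by default only affects 2x2 tables anyway.
    chi2, p_value, df, estimated = chi2_contingency(observed, correction=False)
    print(round(chi2, 2), df, round(p_value, 4))   # about 11.93 on 2 degrees of freedom, p about 0.0026

A p-value of about 0.003 would be strong evidence of dependence between the grades and gender, by the interpretation table above.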

13.2.7   Examples

Examples

13.2.8   Comparing groups

Contingency tables and groups

Contingency tables can arise either from bivariate categorical data or from univariate categorical data that is recorded separately for several groups.

The chi-squared test assesses independence in bivariate data. The same test can also be used with grouped data to compare the groups.

Null hypothesis (corresponding to independence)
The category probabilities are the same within each group.
Alternative hypothesis (corresponding to association)
The different groups have different probabilities.

Example