Modelling two-group data
For data that consist of independent samples from two groups, we use a separate univariate model for each group.
The data in each group are separately modelled as a random sample from a univariate distribution.
The details depend on the type of measurement from each group.
Are the two groups the same?
We are often interested in differences between two groups. The model for two-group 'success/failure' data involves only two parameters, π1 and π2, so we will assess the difference between the probabilities, π2 - π1. If this difference is zero, the probability of success is the same in both groups.
The value of π2 - π1 concisely describes any difference between the two groups.
In practice, the value of π2 - π1 is unknown, but it can be estimated from sample data. The difference between the sample proportions, p2 - p1, is an estimate of it. However, p2 - p1 is a random quantity that varies from sample to sample, so its variability must be taken into account when interpreting its value.
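As a minimal sketch of this estimate, the Python snippet below computes the two sample proportions and their difference from the counts in a contingency table. The counts and the function name difference_in_proportions are hypothetical, chosen only for illustration.

```python
def difference_in_proportions(successes1, n1, successes2, n2):
    """Return (p1, p2, p2 - p1) for two independent groups."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("each group needs at least one observation")
    p1 = successes1 / n1  # sample proportion of successes in group 1
    p2 = successes2 / n2  # sample proportion of successes in group 2
    return p1, p2, p2 - p1


# Hypothetical counts: 27 successes out of 100 in group 1,
# 44 successes out of 100 in group 2.
p1, p2, diff = difference_in_proportions(27, 100, 44, 100)
print(f"p1 = {p1:.2f}, p2 = {p2:.2f}, p2 - p1 = {diff:.2f}")
```

A different pair of samples from the same two groups would usually give a different value of p2 - p1, which is why the estimate alone is not enough.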
Typical data sets
The diagram below shows a few data sets in which either 'success' or 'failure' is recorded from each individual in two groups.
Each data set is summarised by a contingency table.
Note that the red questions do not refer to the specific individuals in the study. They ask about differences between the groups 'in general'.
We are interested in the population difference π2 - π1 rather than the sample difference p2 - p1.
We need to understand the accuracy of our point estimate.
Simulation of sample-to-sample variability
The diagram below selects samples of size 100 from each of two categorical populations.
Initially the probability of a success in Group A is π1 = 0.30, so we expect 30 successes and 70 failures from a sample of 100 values. In Group B, π2 = 0.40, so we expect 40 successes. The table above shows these expected counts and a random sample from the model.
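The expected counts quoted here are simply the sample size multiplied by each probability. The sketch below reproduces that arithmetic, assuming samples of 100 from each group as in the text; the helper name expected_counts is hypothetical.

```python
def expected_counts(pi, n):
    """Expected numbers of successes and failures in a sample of size n."""
    return {"success": n * pi, "failure": n * (1 - pi)}


n = 100  # sample size in each group, as in the text
for group, pi in [("A", 0.30), ("B", 0.40)]:
    counts = expected_counts(pi, n)
    print(
        f"Group {group}: {counts['success']:.0f} successes and "
        f"{counts['failure']:.0f} failures expected"
    )
```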
Click Take sample a few times to observe the variability of the sample counts of successes and failures. The sample proportions and their difference are shown on the right.
The difference, p2 - p1, varies from sample to sample and is often not equal to the population difference, π2 - π1.
Finally, use the two sliders to adjust the values of the population probabilities, π1 and π2. Observe that:
If π1 and π2 are the same, positive and negative values of p2 - p1 occur about equally often, so its distribution is centred on zero.
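The behaviour described above can also be explored without the interactive diagram. The following is a rough sketch, not the diagram's own code: it repeatedly draws samples of 100 from each group and records p2 - p1, first with π1 = 0.30 and π2 = 0.40 and then with both equal to 0.30. The function names, the number of repetitions and the random seed are assumptions made for illustration.

```python
import random


def sample_difference(pi1, pi2, n, rng):
    """Draw n success/failure values from each group and return p2 - p1."""
    successes1 = sum(rng.random() < pi1 for _ in range(n))
    successes2 = sum(rng.random() < pi2 for _ in range(n))
    return successes2 / n - successes1 / n


def simulate(pi1, pi2, n=100, repeats=2000, seed=1):
    """Return a list of p2 - p1 values from repeated pairs of samples."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    return [sample_difference(pi1, pi2, n, rng) for _ in range(repeats)]


for pi1, pi2 in [(0.30, 0.40), (0.30, 0.30)]:
    diffs = simulate(pi1, pi2)
    mean = sum(diffs) / len(diffs)
    positive = sum(d > 0 for d in diffs)
    negative = sum(d < 0 for d in diffs)
    print(
        f"pi1 = {pi1:.2f}, pi2 = {pi2:.2f}: mean of p2 - p1 = {mean:.3f}, "
        f"{positive} positive and {negative} negative differences"
    )
```

With equal probabilities the positive and negative counts should be roughly balanced, whereas with π2 - π1 = 0.10 the differences should cluster around 0.10 while still varying from sample to sample.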
We will examine the distribution of p2 - p1 more carefully on the next page.