Testing whether a finals series changes the probability of topping the league

After running 100 simulations of a league in which Team A has a probability 0.55 of winning each match, we might obtain the following contingency table describing the results.

Position of Team A

                       Top after finals   Not top after finals   Total
Top of league                 42                   15              57
Not top of league              7                   36              43
Total                         49                   51             100

A 2-sample test of whether the marginal proportions (57/100 and 49/100) come from binomial distributions with the same π should not be used to test whether the probability of Team A being top has changed. We do not have two independent samples; we really have 100 paired categorical measurements.

Note that the two diagonal cell counts (42 and 36) correspond to runs of the simulation in which the position of Team A did not change after the finals series. They therefore hold no information about whether Team A's probability of being top has changed after the finals series, so we base our test only on the two off-diagonal cell counts (15 and 7).

If the probability of being top is the same before and after the finals series, the table of expected cell counts will be symmetric: both off-diagonal cell counts will have the same expected value. Each run of the simulation in which the position of Team A changes is therefore equally likely to be in the top right or bottom left cell of the table. As a result, the count in the top right cell, 15, should be a random value from a binomial distribution with n = 15 + 7 = 22 and π = 0.5.

To test whether the probability of Team A being top has changed after the finals series, we can therefore refer to this binomial distribution to find the probability of 15 or more in the top right cell. Since we are performing a 2-tailed test, the p-value is double this tail probability.
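The calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name exact_paired_p_value is an assumption made for this example and is not part of the original material. It sums the binomial probabilities for n = 22 and π = 0.5 from the larger off-diagonal count upwards, then doubles the result.

```python
from math import comb


def exact_paired_p_value(top_right: int, bottom_left: int) -> float:
    """Two-tailed exact p-value from the two off-diagonal cell counts.

    Hypothetical helper for illustration: under the null hypothesis, each
    run in which Team A's position changes is equally likely to fall in
    either off-diagonal cell, so the count is Binomial(n, 0.5).
    """
    n = top_right + bottom_left
    larger = max(top_right, bottom_left)
    # One-tail probability P(X >= larger) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, k) for k in range(larger, n + 1)) / 2 ** n
    # Double the tail for a 2-tailed test; a probability cannot exceed 1
    return min(1.0, 2 * tail)


# Off-diagonal counts from the contingency table above
p_value = exact_paired_p_value(15, 7)
print(round(p_value, 3))  # prints 0.134
```

For the table above, the p-value is about 0.134, so these 100 simulations alone do not give strong evidence of a change.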

The diagram below illustrates.

Click Run League. The ranks of Team A in the ordinary league and after the finals series are shown on the top left. This simulation contributes a '1' to a single cell of the contingency table on the right.

Click Accumulate then perform another 10 or 20 simulations of the league. The grey cells of the contingency table do not contribute to our test. The barchart under the table shows a binomial distribution with π = 0.5. The red bars are for counts as extreme as that in the top right cell of the contingency table. Doubling this tail probability gives the p-value for the test.

Hold down the Run League button until about 200 simulations have been performed. You should observe that the p-value is close to zero, so there is strong evidence that the probability of Team A being top has changed. (The two off-diagonal cells would be unlikely to be so different if the probability stayed the same.)

If enough simulations were performed, the p-value would become very close to zero, allowing us to state definitely that these probabilities are different.
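To see why more simulations drive the p-value towards zero, the sketch below scales up the two off-diagonal counts while keeping them in the same 15:7 ratio. The scaled counts are hypothetical (real simulations would vary randomly around this ratio), and the helper exact_paired_p_value is a name assumed for this illustration.

```python
from math import comb


def exact_paired_p_value(top_right: int, bottom_left: int) -> float:
    """Two-tailed exact binomial p-value from the off-diagonal counts."""
    n = top_right + bottom_left
    larger = max(top_right, bottom_left)
    # One-tail probability P(X >= larger) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, k) for k in range(larger, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical scenario: the off-diagonal counts keep the 15:7 ratio
# as the number of simulated leagues grows.
scale_factors = [1, 2, 5, 10]
p_values = [exact_paired_p_value(15 * s, 7 * s) for s in scale_factors]

for s, p in zip(scale_factors, p_values):
    print(f"{22 * s:4d} runs with a change of position: p-value = {p:.2g}")
```

With the imbalance between the two cells held fixed, the p-value shrinks steadily as the number of runs in which Team A's position changes grows.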

Types of hypothesis test based on simulations

Simulations are used to perform two distinct types of hypothesis test.

Tests about the model itself
If our probability theory were good enough, we could give a definite answer to the question of whether the finals series affected the probability of Team A being top (not just a p-value). By performing enough simulations, we would be certain to detect any difference between the probabilities of being top. However, from a finite number of simulations, we get only a p-value that summarises the evidence from the simulations so far.
Tests involving sample data
In contrast, the test of whether the English Premier Soccer League in 1999/2000 was consistent with all teams being evenly matched was based on the standard deviation of points in a single league table, 16.1. From our model, it would be possible in theory to evaluate the p-value for this test — the probability of getting a standard deviation of 16.1 or higher if all teams were the same. Since our probability theory is not good enough, we estimated this p-value with a simulation. Performing simulations repeatedly estimates this p-value increasingly accurately, but cannot provide a definitive answer about whether all teams are equally matched.
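The second type of test can be sketched as a simulation that estimates the p-value as the proportion of simulated league tables whose standard deviation of points is at least 16.1. The league model here is an assumption made purely for illustration and may differ from the model used earlier: 20 teams, each pair meeting twice, 3 points for a win and 1 for a draw, and a draw probability of 0.25 with the remaining probability split equally between the two teams. The function names are also hypothetical.

```python
import random
from statistics import stdev


def simulate_points_sd(n_teams=20, p_draw=0.25, rng=random):
    """Standard deviation of points in one simulated league of equal teams.

    Assumed model (not from the original text): every ordered pair of teams
    plays once (so each pair meets twice), 3 points for a win, 1 for a draw.
    """
    points = [0] * n_teams
    p_home_win = (1 - p_draw) / 2  # equal teams: wins are equally likely
    for home in range(n_teams):
        for away in range(n_teams):
            if home == away:
                continue
            u = rng.random()
            if u < p_draw:
                points[home] += 1
                points[away] += 1
            elif u < p_draw + p_home_win:
                points[home] += 3
            else:
                points[away] += 3
    return stdev(points)


def estimate_p_value(observed_sd, n_sims=1000, seed=1):
    """Proportion of simulated leagues with an SD at least as large as observed."""
    rng = random.Random(seed)
    exceed = sum(simulate_points_sd(rng=rng) >= observed_sd for _ in range(n_sims))
    return exceed / n_sims


# 16.1 is the observed standard deviation of points quoted in the text
print(estimate_p_value(16.1))
```

Increasing n_sims makes the estimated p-value more accurate, but the quantity being estimated is still only a p-value about one observed league table.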