Simulation and randomisation

Simulation and randomisation are closely related techniques. Both are based on assumptions about the model underlying the data and involve randomly generated data sets.

Simulation
New data sets are generated directly from the model.
Randomisation
Modifications to the actual data are identified that would have the same probability of arising if the model held. New data sets are randomly picked from these.
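
The distinction can be sketched in a few lines of Python. This is only an illustration: the normal model, its parameters, the data values and the function names `simulate` and `randomise` are all assumptions made for the sketch and are not taken from the text.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical observed data (illustrative values only).
observed = [1.2, 0.8, 1.9, 1.4, 0.6, 2.3]

def simulate(n, mean, sd):
    """Simulation: generate a new data set directly from an assumed model
    (here, a normal distribution with the given mean and sd)."""
    return [rng.gauss(mean, sd) for _ in range(n)]

def randomise(values):
    """Randomisation: rearrange the actual values. Under the model, every
    rearrangement is assumed to be as likely as the observed one."""
    shuffled = list(values)  # copy, so the original data are untouched
    rng.shuffle(shuffled)
    return shuffled

simulated = simulate(len(observed), mean=1.4, sd=0.6)
randomised = randomise(observed)
```

Note that the simulated data set contains entirely new values, whereas the randomised data set contains exactly the observed values in a different arrangement.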

Randomisation is understood most easily through an example.

Comparing two groups

If random samples are taken from two populations, we are often interested in whether the populations have the same mean.

If the two populations were identical, any allocation of the sample values to the two groups would have been as likely as the observed sample data. By observing the distribution of the difference in means from such randomised allocations of values to groups, we can get an idea of whether the actual difference in sample means is unusually large.
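
The reallocation idea described above might be sketched as follows. The function name `randomised_differences`, the number of randomisations and the two small samples are hypothetical choices for illustration.

```python
import random
from statistics import mean

def randomised_differences(group_a, group_b, n_rand=1000, seed=0):
    """Return the differences in means (a minus b) obtained from random
    reallocations of the pooled values to groups of the original sizes."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(n_rand):
        rng.shuffle(pooled)  # one random allocation of values to groups
        diffs.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    return diffs

# Hypothetical samples (illustrative values only).
group_a = [4.1, 5.3, 3.8, 4.9, 5.0]
group_b = [5.9, 6.4, 5.1, 6.8]

observed_diff = mean(group_a) - mean(group_b)
diffs = randomised_differences(group_a, group_b)
```

Comparing `observed_diff` with the spread of values in `diffs` shows whether the actual difference is unusual among the randomised allocations.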

An example helps to explain this method.

Characteristics of failed companies

A study in Greece compared characteristics of 68 healthy companies with those of another 33 that had recently failed. The jittered dot plots on the left below show the ratio of current assets to current liabilities for each of the 101 companies.

The mean asset-to-liabilities ratio for the sample of failed companies is 0.902 lower than that for the healthy companies, but the distributions overlap. Might this difference be simply a result of randomness, or can we conclude that there is a difference in the underlying populations?

Click Randomise to randomly pick 33 of the 101 values for the failed group. If the underlying distribution of asset-to-liabilities ratios were the same for healthy and failed companies, each such randomised allocation would be as likely as the observed data.

Click Accumulate and repeat the randomisation several more times. Observe that the difference in means would rarely be as far from zero as -0.902 if both groups had the same distribution. This strongly suggests that the two distributions differ.
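
The repeated randomisation can be imitated in code, counting how often a randomised difference is at least as far from zero as the observed one. This is a rough sketch only. The study's 101 ratios are not reproduced here, so the values below are placeholders generated from an assumed normal model; only the group sizes (68 and 33) come from the study, and the observed difference in the sketch will not equal -0.902.

```python
import random
from statistics import mean

rng = random.Random(42)

# Placeholder data only (assumed normal model, not the study's values).
healthy = [rng.gauss(2.0, 1.0) for _ in range(68)]
failed = [rng.gauss(1.0, 1.0) for _ in range(33)]

observed_diff = mean(failed) - mean(healthy)

pooled = healthy + failed
n_failed = len(failed)
n_rand = 2000  # number of randomisations to accumulate (arbitrary choice)
count_extreme = 0

for _ in range(n_rand):
    # Randomly pick 33 of the 101 positions for the "failed" group.
    failed_idx = set(rng.sample(range(len(pooled)), n_failed))
    new_failed = [v for i, v in enumerate(pooled) if i in failed_idx]
    new_healthy = [v for i, v in enumerate(pooled) if i not in failed_idx]
    diff = mean(new_failed) - mean(new_healthy)
    # Count differences at least as far from zero as the observed one.
    if abs(diff) >= abs(observed_diff):
        count_extreme += 1

proportion = count_extreme / n_rand
```

A small value of `proportion` corresponds to what the accumulated display shows: randomised differences are rarely as extreme as the observed difference.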

Since the actual difference is so unusually large, ...

We can conclude that there is strong evidence that the mean asset-to-liabilities ratio is lower for failed companies than for healthy ones.