Does the model really fit the data?

Our model can explain the decrease in expected winning times for the 1500m race in the Olympics over the last century, but do the actual winning times really match what is predicted by the model?

Did you notice that the winning times that we simulated on the previous page seemed to fluctuate more than the real winning times?

To answer this question, we must pick a summary statistic that is likely to behave differently from the real data if the model really does imply too much variation.

Number of records

If there has been an improvement in abilities over the century, on top of the increasing pool of potential competitors, we would expect more Olympic records than the growing pool alone would produce.

In real life, a new record was set on 15 occasions. (Strictly, Olympic records can be set in heats as well as in the final, but we will only treat performances in the final as being eligible for records here.) What would be the chance of getting as many records if our model of a constant underlying population distribution were true?

The simulation below helps to answer this question.

The number of records is displayed on the right of the diagram. Click Faster Algorithm and Accumulate, then click Take sample several times to build up the sampling distribution of the number of records. Take about 100 samples.

You should observe that there would be virtually no chance of getting as many as 15 records if our model actually held.
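
The same experiment can be sketched in code. This is only an illustration, not the algorithm behind the diagram: the number of finals and the pool sizes in POOL_SIZES are hypothetical (the page does not state them), and the first final is assumed not to count as a record. Instead of generating every competitor, the sketch draws the fastest time directly, using the fact that the smallest of n independent uniform quantiles can be sampled as 1 - v^(1/n).

```python
import math
import random
from statistics import NormalDist

# Hypothetical settings: the number of finals and the pool sizes are not
# given in the text, so these values are for illustration only.
POOL_SIZES = [1000 * 2 ** (i // 3) for i in range(24)]  # growing pool
POPULATION = NormalDist(mu=458, sigma=50)  # population stated in the text


def winning_time(pool_size, rng):
    """Draw the fastest time among pool_size competitors.

    The minimum of n independent uniforms has CDF 1 - (1 - u)^n, so it can
    be sampled directly and mapped through the normal inverse CDF.
    """
    v = 1.0 - rng.random()  # uniform on (0, 1]
    u_min = -math.expm1(math.log(v) / pool_size)  # equals 1 - v**(1/n)
    u_min = max(u_min, 1e-300)  # guard against an exact zero
    return POPULATION.inv_cdf(u_min)


def count_records(pool_sizes, rng):
    """Count finals whose winning time beats every earlier final.

    Assumption: the first final sets the baseline and is not itself counted.
    """
    best = winning_time(pool_sizes[0], rng)
    records = 0
    for n in pool_sizes[1:]:
        t = winning_time(n, rng)
        if t < best:
            best = t
            records += 1
    return records


rng = random.Random(1)
counts = [count_records(POOL_SIZES, rng) for _ in range(100)]  # about 100 samples
share = sum(c >= 15 for c in counts) / len(counts)
print("largest count:", max(counts), " share with 15 or more:", share)
```

Under these assumed pool sizes, the share of samples with 15 or more records plays the role of the chance asked about above. Different pool sizes will give a different sampling distribution.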

Conclusion

We can therefore conclude that the assumption of an unchanging underlying population does not seem to be justified; there must also have been an improvement in training and methodology over the last century.

(This last conclusion is not affected by the distribution that we assumed for times in the underlying population: normal with mean 458 seconds and standard deviation 50 seconds. Whatever the shape of the population distribution, provided it is continuous, the distribution of the number of records is the same, because records depend only on the ordering of the winning times and not on their actual values.)
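
A small check of this point, as a sketch under assumed settings: the pool sizes below are hypothetical, and the exponential population is chosen only as a contrasting shape. The same random quantiles are converted to times under two different populations, and the record count is compared.

```python
import math
import random
from statistics import NormalDist

POOL_SIZES = [1000 * (i + 1) for i in range(24)]  # hypothetical growing pool


def count_records(times):
    """Count values that are smaller than every earlier value."""
    best, records = times[0], 0
    for t in times[1:]:
        if t < best:
            best, records = t, records + 1
    return records


rng = random.Random(7)
# Quantile of the fastest competitor in each final: 1 - v**(1/n), v uniform on (0, 1].
quantiles = [-math.expm1(math.log(1.0 - rng.random()) / n) for n in POOL_SIZES]
quantiles = [max(q, 1e-300) for q in quantiles]  # guard against an exact zero

# The same quantiles turned into times under two population shapes.
normal_times = [NormalDist(458, 50).inv_cdf(q) for q in quantiles]
skewed_times = [-458 * math.log1p(-q) for q in quantiles]  # exponential, mean 458

print(count_records(normal_times), count_records(skewed_times))
```

Both conversions are increasing functions of the quantile, so they preserve the ordering of the winning times and the two counts agree.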

Simulations can provide useful information about a process that would be very difficult to obtain by analytic means.