Randomness of data

The data that we have collected are often representative of something more general. We are not interested in the specific individuals from which we collected information but want to use the measurements from them to generalise.

However, if we collected similar information again (from a different selection of individuals), we would obtain different data values, so we must acknowledge this sample-to-sample variability when interpreting the data. In this sense, the data are random.

All graphical and numerical summaries would be different if we repeated data collection.

The randomness in the data must be taken into account in our interpretation of graphical and numerical summaries.

Yam growth

Ten yam plants were grown in a research station and the growth (cm) of the main stalks over 7 days was recorded from each plant.

The results depend on the specific yam plants that were grown. Click "Grow 10 different yam plants" to see how the results might change if the data were collected again.

The dot plot, mean and standard deviation all vary considerably.
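The variability described above can be imitated with a short simulation. This is only a sketch: the source gives no growth figures, so the normal population with mean 20 cm and standard deviation 5 cm is an assumption made purely for illustration, and the function names are hypothetical.

```python
import random
import statistics

# Assumed population values (not from the source): 7-day growth in cm.
ASSUMED_MEAN_CM = 20.0
ASSUMED_SD_CM = 5.0


def grow_yams(n, rng):
    """Simulate the 7-day growth (cm) of n yam plants from the assumed population."""
    return [rng.gauss(ASSUMED_MEAN_CM, ASSUMED_SD_CM) for _ in range(n)]


def summarise(sample):
    """Return the sample mean and sample standard deviation."""
    return statistics.mean(sample), statistics.stdev(sample)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Repeat the "experiment" five times, each with 10 different plants.
summaries = [summarise(grow_yams(10, rng)) for _ in range(5)]

for mean, sd in summaries:
    print(f"mean = {mean:5.2f} cm, sd = {sd:5.2f} cm")
```

Each printed line corresponds to one repetition of the experiment. The means and standard deviations differ from line to line even though every sample comes from the same assumed population.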

The results from a single experiment clearly tell us something about the growth of this type of yam, but how do we take the randomness into consideration?

Use the pop-up menu to increase the sample size and repeat.

With a bigger data set, the dot plot, mean and standard deviation vary less between the different data sets.
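The effect of sample size can also be checked numerically. The sketch below repeats the experiment many times for several sample sizes and measures how much the sample mean varies between repetitions. The population (normal, mean 20 cm, standard deviation 5 cm), the sample sizes and the function name are all assumptions chosen for illustration, not values from the source.

```python
import random
import statistics

# Assumed population values (not from the source): 7-day growth in cm.
ASSUMED_MEAN_CM = 20.0
ASSUMED_SD_CM = 5.0


def spread_of_sample_means(n, repeats, rng):
    """Standard deviation of the sample mean over many repeated samples of size n."""
    means = []
    for _ in range(repeats):
        sample = [rng.gauss(ASSUMED_MEAN_CM, ASSUMED_SD_CM) for _ in range(n)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Compare how much the mean varies between data sets of different sizes.
spread_by_size = {n: spread_of_sample_means(n, 1000, rng) for n in (10, 40, 160)}

for n, spread in spread_by_size.items():
    print(f"n = {n:3d}: sample means vary with sd of about {spread:.2f} cm")
```

Under these assumptions the spread of the sample means shrinks as the sample size grows, roughly halving each time the sample size is quadrupled.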