Randomness of data

We usually have little interest in the specific individuals from whom data were collected. We must also acknowledge that our data would have been different if, by chance, we had selected different individuals or made our measurements at a different time.

We must acknowledge this sample-to-sample variability when interpreting the data. The data are random.

All graphical and numerical summaries would be different if we repeated data collection.

This randomness must be taken into account when we interpret graphical and numerical summaries. Our conclusions should not depend on features that are specific to our particular data set and would probably be different if the data were collected again.

Hardness of brick pavers

In an experiment to assess the durability of one type of brick paver, a sharpened drill impacted the surface of each of 10 pavers for 1 minute. The volume of material eroded (mL) was recorded for each paver.

If the experiment were repeated with a different sample of brick pavers of the same type, different values would be obtained. Click Repeat experiment with 10 different pavers to see how the recorded data might change.

The dot plot, mean and standard deviation all vary considerably.
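The same behaviour can be imitated with a short simulation. The sketch below is only an illustration: the experiment's underlying distribution is not known, so the code assumes (hypothetically) that eroded volumes are normally distributed, and the values ASSUMED_MEAN_ML and ASSUMED_SD_ML are invented for the example rather than taken from the paver data. The function names are also illustrative.

```python
import random
import statistics

# Hypothetical population values, NOT taken from the paver experiment.
# They are assumed purely so that the simulation has something to sample from.
ASSUMED_MEAN_ML = 1.0
ASSUMED_SD_ML = 0.2


def run_experiment(n_pavers, rng):
    """Simulate the eroded volume (mL) for one sample of n_pavers pavers."""
    return [rng.gauss(ASSUMED_MEAN_ML, ASSUMED_SD_ML) for _ in range(n_pavers)]


def summarise(volumes):
    """Return the sample mean and sample standard deviation."""
    return statistics.mean(volumes), statistics.stdev(volumes)


rng = random.Random(1)  # fixed seed so the run is reproducible

# Repeat the 10-paver experiment five times and summarise each data set.
summaries = [summarise(run_experiment(10, rng)) for _ in range(5)]

for i, (mean_ml, sd_ml) in enumerate(summaries, start=1):
    print(f"experiment {i}: mean = {mean_ml:.3f} mL, sd = {sd_ml:.3f} mL")
```

Each simulated experiment gives a different mean and standard deviation even though every sample is drawn from the same assumed population.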

The results from a single experiment clearly tell us something about the hardness of this type of paver, but how do we take the randomness into account?

Use the pop-up menu to increase the sample size and repeat.

With a bigger sample, the dot plot, mean and standard deviation vary less from one data set to the next.
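The effect of sample size can also be checked by simulation. The sketch below repeats a simulated experiment many times at several sample sizes and measures how much the sample mean varies between repeats. As before, the normal distribution and the values ASSUMED_MEAN_ML and ASSUMED_SD_ML are assumptions made for illustration only, not properties of the real pavers, and the sample sizes 10, 40 and 160 are arbitrary choices.

```python
import random
import statistics

# Hypothetical population values, NOT taken from the paver experiment.
ASSUMED_MEAN_ML = 1.0
ASSUMED_SD_ML = 0.2


def spread_of_sample_means(n_pavers, n_repeats, rng):
    """Standard deviation of the sample mean over n_repeats simulated experiments."""
    means = []
    for _ in range(n_repeats):
        sample = [rng.gauss(ASSUMED_MEAN_ML, ASSUMED_SD_ML) for _ in range(n_pavers)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)


rng = random.Random(2)  # fixed seed so the run is reproducible

# Compare how much the mean varies between data sets for three sample sizes.
spreads = {n: spread_of_sample_means(n, 2000, rng) for n in (10, 40, 160)}

for n, spread in spreads.items():
    print(f"n = {n:3d}: sd of sample means = {spread:.4f} mL")
```

Under these assumptions the spread of the sample means shrinks as the sample grows. For independent observations the standard deviation of the mean is the population standard deviation divided by the square root of the sample size, so quadrupling the sample size roughly halves it.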