Sample of incomes from a town

Use the diagram to show that positive correlation between sample values leads to underestimation of the standard deviation of the mean.

If the term random sample is used, independence is implied unless dependence is specifically mentioned.

Initially the correlation between sample values is r = 0, so we are taking simple random samples. Click Take sample. The distribution of the sample mean, as estimated from this sample, is shown on the right. Take several more samples and observe that the spread of the estimated distributions roughly matches the actual variability of the sample means.

Now click Reset and increase the correlation between pairs of sample values to r = 0.9. This might correspond to selecting the whole sample from one particular part of the town, where incomes tend to be similar. Take several samples. Observe that:

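  • The actual variability of the sample means increases, because the values in a sample tend to be high or low together and so do not average out.
  • The estimated standard deviation of the mean based on the usual formula decreases, because the values within each sample are more alike and the sample standard deviation is therefore smaller.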
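
The two observations can be backed by a short calculation. The following is a sketch under assumptions that the page does not state explicitly: every sample value has the same variance \sigma^2, every pair of values in a sample has the same correlation r, n is the sample size, and s is the sample standard deviation (so the usual formula estimates the variance of the mean by s^2/n).

```latex
\operatorname{Var}(\bar{X})
  = \frac{\sigma^2}{n}\,\bigl[\,1 + (n-1)\,r\,\bigr]
\qquad\text{and}\qquad
\operatorname{E}\!\left[\frac{s^2}{n}\right]
  = \frac{\sigma^2}{n}\,(1 - r).
```

When r = 0 the two expressions agree. When r > 0 the first exceeds \sigma^2/n and the second falls below it, so the usual formula is too small on average.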

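Both effects work in the same direction: the sample means vary more than they would under independence, while the standard formula reports less variability, so the standard formula underestimates the standard deviation of the mean.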
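
The same comparison can be sketched outside the diagram with a small simulation. This is not the code behind the diagram. It assumes normally distributed incomes, and the sample size, number of samples, mean and standard deviation are all hypothetical values chosen for illustration. Correlation r between every pair of values in a sample is produced by giving each sample a shared random component.

```python
import numpy as np


def simulate(r, n=10, n_samples=20000, mu=40000.0, sigma=10000.0, seed=0):
    """Compare the actual SD of sample means with the usual estimate s/sqrt(n).

    All default values are hypothetical; they are not taken from the diagram.
    """
    rng = np.random.default_rng(seed)
    # One shared component per sample plus one independent component per value.
    # Each value has variance sigma**2 and each pair has correlation r.
    shared = rng.standard_normal((n_samples, 1))
    own = rng.standard_normal((n_samples, n))
    samples = mu + sigma * (np.sqrt(r) * shared + np.sqrt(1 - r) * own)

    means = samples.mean(axis=1)
    # Actual variability: SD of the sample means across all simulated samples.
    actual_sd = means.std(ddof=1)
    # Usual formula applied within each sample, then averaged over samples.
    estimated_sd = samples.std(axis=1, ddof=1) / np.sqrt(n)
    return actual_sd, estimated_sd.mean()


for r in (0.0, 0.9):
    actual_sd, mean_estimated_sd = simulate(r)
    print(f"r = {r}: actual SD of mean = {actual_sd:.0f}, "
          f"average estimated SD = {mean_estimated_sd:.0f}")
```

With r = 0 the two printed values should be close to each other. With r = 0.9 the actual standard deviation should be much larger than the average estimate, in line with what the diagram shows.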

The diagram is a simulation, but its context is sampling the incomes of people in a town with the aim of estimating the average income in the population.
