Sampling frame

Both simple random samples and stratified random samples require a complete list of all individuals in the target population. This list might be obtained from an electoral roll or some other publicly available list and is called a sampling frame.

In other situations, a complete list is unavailable, so a different sampling scheme is necessary. For example, a town council might be interested in collecting information from teenage children. Without a complete list of such children, how might you sample them?

Cluster sampling

One solution to this problem is to group the target individuals into reasonably small groups, called clusters, for which a complete list is available. Clusters are similar to the strata that are used for stratified sampling, but are usually much smaller. For example, to sample teenage children in a town, the clusters might be defined by the different streets. (Long streets might be split into shorter sections.) It is not necessary to know beforehand how many children live in each street.

For cluster sampling, a simple random sample of clusters is selected, and every individual in the selected clusters is included in the sample. For example, in each selected street, an interviewer might call at every household to identify those with teenage children and obtain information from them.

The mean of any variable, calculated from all individuals in a cluster sample, can be used to estimate the corresponding mean in the underlying population.
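The two steps above (take a simple random sample of clusters, keep everyone in them, then average over all the individuals collected) can be sketched in Python. This is a minimal illustration, not a prescribed method. The street names, the "hours of sport" values and the helper names `cluster_sample` and `cluster_sample_mean` are all hypothetical. Cluster sizes are deliberately unequal, and one street has no teenagers, because the numbers in each street need not be known in advance.

```python
import random

def cluster_sample(clusters, n_clusters, rng=random):
    """Select a simple random sample of cluster labels and keep every individual in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of clusters, without replacement
    return {label: clusters[label] for label in chosen}

def cluster_sample_mean(sampled):
    """Pooled mean over all individuals found in the sampled clusters."""
    values = [v for members in sampled.values() for v in members]
    return sum(values) / len(values)

# Hypothetical data: hours of sport per week for each teenager, grouped by street.
# A long street has been split into two sections; one section has no teenagers.
streets = {
    "Elm St": [3, 5, 4],
    "High St (north)": [6, 2],
    "High St (south)": [],
    "Mill Lane": [1, 4, 4, 7],
    "Park Rd": [5],
    "Quay St": [2, 3],
}

rng = random.Random(1)
sample = cluster_sample(streets, 3, rng)
print(sorted(sample))
print(cluster_sample_mean(sample))
```

Passing a seeded `random.Random` makes the selection reproducible. Leaving `rng` at its default uses the module-level generator.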

Cost advantages

Even when a complete sampling frame is available, cluster sampling might be used to reduce the cost of sampling (or to increase the sample size for the same cost) since it is often cheaper to record information from individuals in the same cluster than from different parts of the sampling frame.

For example, it is cheaper to interview people in every house in several streets than to interview the same number of individuals who are scattered randomly over the town.

The diagram below illustrates cluster sampling. The population of 324 individuals has been split into 36 clusters, each of which contains 9 individuals. (In many practical situations, the cluster sizes would be different.)

Click Take sample to take a cluster sample of 27 individuals, i.e. a simple random sample of 3 clusters.
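A non-interactive version of this setup is sketched below: 36 clusters of 9 individuals (324 in total), from which 3 clusters are chosen, giving 27 individuals. The population values are hypothetical normal draws, because the diagram's actual values are not given. Only the structure matches the text.

```python
import random
import statistics

N_CLUSTERS, CLUSTER_SIZE, N_SAMPLED = 36, 9, 3

rng = random.Random(2024)
# Hypothetical population values: 36 clusters, each a list of 9 numbers.
population = [[rng.gauss(50, 10) for _ in range(CLUSTER_SIZE)] for _ in range(N_CLUSTERS)]

chosen = rng.sample(range(N_CLUSTERS), N_SAMPLED)      # SRS of 3 cluster indices
sample = [x for c in chosen for x in population[c]]    # every individual in each chosen cluster

print(len(sample))  # 27
print(statistics.mean(sample))
```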

Accuracy of cluster sampling

When individuals in the same cluster tend to be more similar than individuals in different clusters, the estimates that are obtained from cluster sampling are more variable (and hence less accurate) than the corresponding estimates from a simple random sample of the same size.

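The ordinary formula for the standard deviation of a sample mean, σ/√n, assumes simple random sampling. If the mean is obtained by cluster sampling and individuals within clusters are similar, this formula understates the variability of the mean and therefore overestimates its accuracy.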
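A rough derivation shows why this happens. It assumes equal cluster sizes m and ignores the finite population correction. With k clusters sampled, n = km, and the cluster-sample mean is simply the average of the k sampled cluster means. The symbol σ_c is introduced here for the standard deviation of the cluster means across the population.

```latex
% Simple random sample of n individuals (population sd \sigma):
\operatorname{sd}(\bar{x}) \approx \frac{\sigma}{\sqrt{n}}

% Cluster sample of k clusters of size m (n = km); \bar{y}_j = mean of sampled cluster j:
\bar{x} = \frac{1}{k}\sum_{j=1}^{k} \bar{y}_j
\quad\Longrightarrow\quad
\operatorname{sd}(\bar{x}) \approx \frac{\sigma_c}{\sqrt{k}}

% If each cluster is like a random group of m individuals, \sigma_c \approx \sigma/\sqrt{m}:
\frac{\sigma_c}{\sqrt{k}} \approx \frac{\sigma}{\sqrt{m}\,\sqrt{k}} = \frac{\sigma}{\sqrt{n}}

% If individuals within a cluster are similar, \sigma_c > \sigma/\sqrt{m}, so
\operatorname{sd}(\bar{x}) > \frac{\sigma}{\sqrt{n}}
```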


The diagram below illustrates sampling from a population of 25 clusters, each of which contains 4 individuals. (The different vertical bands represent the different clusters in the population.)

Click Take sample several times to build up the distribution of the mean using cluster sampling of 5 clusters. Choose Simple random sample from the pop-up menu, then take several more samples. Since all clusters have similar spreads of values, simple random sampling and cluster sampling give sample means with similar distributions (i.e. similar accuracy).

Move the slider about half-way to accentuate the differences between the clusters (the four values within each cluster become more similar). This has little effect on the accuracy of the sample mean when simple random sampling is used. However, the sample mean becomes much more variable with cluster sampling.
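The slider experiment can be imitated with a short simulation. It assumes a hypothetical parameter `w` between 0 and 1 as a stand-in for the slider. Each value is `sqrt(w) * cluster_effect + sqrt(1 - w) * noise`, so the overall variance stays close to 1 while larger `w` makes the four values within a cluster more alike. This mapping is an assumption, not how the diagram is actually built. For each population, the sketch estimates the standard deviation of the sample mean from 2000 repeated samples of 20 individuals: 5 whole clusters versus a simple random sample of 20.

```python
import math
import random
import statistics

N_CLUSTERS, CLUSTER_SIZE, K = 25, 4, 5   # 25 clusters of 4; sample 5 clusters (20 individuals)
N_REPS = 2000

def make_population(w, rng):
    """Hypothetical population; w (0..1) stands in for the slider position."""
    pop = []
    for _ in range(N_CLUSTERS):
        effect = rng.gauss(0, 1)  # shared by every individual in this cluster
        pop.append([math.sqrt(w) * effect + math.sqrt(1 - w) * rng.gauss(0, 1)
                    for _ in range(CLUSTER_SIZE)])
    return pop

def sd_of_means(pop, method, rng):
    """Standard deviation of the sample mean over N_REPS repeated samples."""
    flat = [x for cluster in pop for x in cluster]
    means = []
    for _ in range(N_REPS):
        if method == "cluster":
            chosen = rng.sample(range(N_CLUSTERS), K)
            sample = [x for c in chosen for x in pop[c]]
        else:  # simple random sample of the same size
            sample = rng.sample(flat, K * CLUSTER_SIZE)
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

rng = random.Random(7)
results = {}
for w in (0.0, 0.5):
    pop = make_population(w, rng)
    results[w] = (sd_of_means(pop, "srs", rng), sd_of_means(pop, "cluster", rng))
    print(f"w={w}: SRS sd={results[w][0]:.3f}, cluster sd={results[w][1]:.3f}")
```

With `w = 0` the two standard deviations should be close. With `w = 0.5` the cluster-sampling value should be noticeably larger, while the simple-random-sampling value changes little.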

Cluster sampling is as accurate as simple random sampling of the same size if the clusters are similar to each other, but it is less accurate when the clusters differ.