Grouping of individuals
A simple random sample of individuals from some population is conceptually the easiest sampling scheme. However, more accurate estimates of population characteristics can often be obtained with other sampling schemes.
If the individuals in the population can be split into different groups (called strata in sampling terminology), it is often better to take a simple random sample within each separate group than to sample randomly from the whole population. This is called a stratified random sample.
For example, a simple random sample of 40 students from a class of 200 males and 200 females might (by chance) include 25 males and 15 females. A stratified random sample would randomly select 20 males and 20 females, ensuring that the sex-ratio in the sample matched that in the population.
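The two schemes in the class example can be sketched with Python's standard `random` module. This is a minimal illustration, assuming students are recorded only by sex; the seed and the helper `prob_at_least_as_unbalanced` are hypothetical names chosen for this sketch. The helper uses the hypergeometric distribution to compute how likely a simple random sample of 40 is to be split at least as unevenly as 25/15.

```python
import random
from math import comb

# Class from the example: 200 males and 200 females, recorded only by sex.
males = ["M"] * 200
females = ["F"] * 200
population = males + females

rng = random.Random(2024)  # hypothetical seed, for reproducibility only

# Simple random sample: 40 students drawn from the whole class.
srs = rng.sample(population, 40)

# Stratified random sample: a simple random sample of 20 within each sex.
strat = rng.sample(males, 20) + rng.sample(females, 20)

print("SRS:       ", srs.count("M"), "males,", srs.count("F"), "females")
print("Stratified:", strat.count("M"), "males,", strat.count("F"), "females")


def prob_at_least_as_unbalanced(n=40, n_males=200, n_females=200, extreme=25):
    """Hypergeometric probability that an SRS of n contains at least
    `extreme` males or at most n - extreme males."""
    low = n - extreme
    ways = sum(comb(n_males, k) * comb(n_females, n - k)
               for k in range(n + 1) if k >= extreme or k <= low)
    return ways / comb(n_males + n_females, n)


print("P(SRS split at least 25/15):", round(prob_at_least_as_unbalanced(), 3))
```

The stratified sample always contains exactly 20 of each sex, whereas the split in the simple random sample changes from run to run.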
The benefits from stratified random sampling are greatest when the measurement of interest differs between the strata. For example, we might want to estimate the mean summer income of the students. If male students tend to have higher incomes than female students, a stratified random sample based on sex will estimate the mean more accurately than a simple random sample.
Weights of animals
The diagram below shows the weights of 100 animals in a reserve. Of these, 50 are adults and the other 50 are smaller immature animals; the adults tend to be heavier. (These are not real data, since the difference between the two age groups is more extreme than would usually be observed, but they do illustrate the potential gains from stratified sampling.)
The left half of the diagram illustrates simple random sampling of 10 from the 100 animals, whereas stratified random sampling of 5 animals from each age group is illustrated on the right.
Click Take sample a few times to observe the variability of the mean weight under the two sampling schemes. (A jittered dot plot of the means is shown to the right of each sample, and a normal curve shows the distribution of the sample means.)
Observe that stratified random sampling gives sample means with less variability. The mean from a stratified random sample is therefore a more accurate estimate of the population mean.
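The same comparison can be simulated in Python. This is a sketch under stated assumptions: the weights below are hypothetical values drawn from normal distributions (adults around 60 kg, immature animals around 20 kg), since the diagram's data are not given. Because the two strata have equal sizes, the plain mean of a 5 + 5 stratified sample is already a fair estimate of the population mean.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical weights (kg); the diagram's actual data are not available.
adults = [rng.gauss(60, 5) for _ in range(50)]
immature = [rng.gauss(20, 5) for _ in range(50)]
population = adults + immature


def srs_mean(rng):
    """Mean of a simple random sample of 10 from all 100 animals."""
    return statistics.mean(rng.sample(population, 10))


def stratified_mean(rng):
    """Mean of 5 adults plus 5 immature animals (equal-sized strata)."""
    return statistics.mean(rng.sample(adults, 5) + rng.sample(immature, 5))


reps = 5000
srs_means = [srs_mean(rng) for _ in range(reps)]
strat_means = [stratified_mean(rng) for _ in range(reps)]

print("population mean:       ", round(statistics.mean(population), 2))
print("SD of SRS means:       ", round(statistics.stdev(srs_means), 2))
print("SD of stratified means:", round(statistics.stdev(strat_means), 2))
```

Both sets of means centre on the population mean, but the stratified means are much less spread out, because the large difference between adults and immature animals no longer contributes to their variability.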
Groups with different variability (advanced)
In stratified random sampling, the sample sizes are usually chosen in proportion to the number of population values in each stratum. For example, if a population of 1,000 values is split into three strata of $N_1 = 500$, $N_2 = 300$ and $N_3 = 200$ values and a sample of $n = 50$ is to be taken, then samples of $n_1 = 25$, $n_2 = 15$ and $n_3 = 10$ would be taken from the three strata, i.e. 1/20 of the population within each stratum.
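Proportional allocation is simple arithmetic, but in general $n N_h / N$ is not a whole number. The sketch below assumes one common rounding rule, the largest-remainder method (round every share down, then give the leftover units to the strata with the largest fractional parts), so that the stratum sample sizes always add up to $n$. The function name `proportional_allocation` is hypothetical.

```python
from fractions import Fraction


def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to stratum
    size, rounding with the largest-remainder method (an assumption)."""
    total = sum(stratum_sizes)
    exact = [Fraction(n * size, total) for size in stratum_sizes]
    alloc = [int(x) for x in exact]  # round each exact share down
    shortfall = n - sum(alloc)
    # Hand the remaining units to the strata with the largest remainders.
    order = sorted(range(len(exact)), key=lambda i: exact[i] - alloc[i],
                   reverse=True)
    for i in order[:shortfall]:
        alloc[i] += 1
    return alloc


print(proportional_allocation([500, 300, 200], 50))  # [25, 15, 10]
```

In the example above every share is exact, so no rounding is needed; with awkward sizes such as `[80, 20, 7]` and `n = 10`, the rounding rule decides which strata receive the extra units.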
However, this proportionality is not essential, and greater accuracy can be obtained by selecting larger samples from strata with greater variability. If the sample sizes are not proportional to the stratum sizes, though, the overall sample mean is no longer appropriate for estimating the overall population mean, since the more heavily sampled strata would have too much influence on it.
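If there are $k$ strata of sizes $N_1, N_2, \ldots, N_k$, and samples of sizes $n_1, n_2, \ldots, n_k$ are taken from them, giving sample means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$, then the population mean should be estimated by weighting each stratum's sample mean by the size of that stratum:

$$\bar{x}_{st} = \frac{N_1\bar{x}_1 + N_2\bar{x}_2 + \cdots + N_k\bar{x}_k}{N_1 + N_2 + \cdots + N_k}$$

When the samples are proportional to the stratum sizes, this reduces to the ordinary overall sample mean.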
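A minimal Python sketch of the size-weighted estimate, in which each stratum's sample mean is multiplied by its population size $N_h$ and the total is divided by $N_1 + \cdots + N_k$. The sample weights below are hypothetical numbers chosen only to show the effect of disproportionate sampling: 3 animals from a stratum of 80 immature animals and 7 from a stratum of 20 adults.

```python
import statistics


def stratified_mean(stratum_sizes, sample_means):
    """Estimate the population mean by weighting each stratum's sample
    mean by its population size N_h and dividing by N_1 + ... + N_k."""
    total = sum(stratum_sizes)
    return sum(N * m for N, m in zip(stratum_sizes, sample_means)) / total


# Hypothetical samples: 3 of N_1 = 80 immature animals, 7 of N_2 = 20 adults.
immature_sample = [18.0, 22.0, 20.0]
adult_sample = [55.0, 70.0, 62.0, 48.0, 66.0, 58.0, 61.0]

means = [statistics.mean(immature_sample), statistics.mean(adult_sample)]
print(round(stratified_mean([80, 20], means), 2))                 # 28.0
print(round(statistics.mean(immature_sample + adult_sample), 2))  # 48.0
```

The stratum means are 20 and 60, so the weighted estimate is (80 × 20 + 20 × 60) / 100 = 28. The unweighted mean of all 10 values, 48, is pulled far upward because adults make up 70% of the sample but only 20% of the population.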
Weights of animals
The following diagram is similar to the one above, but in this example there are 80 immature animals with relatively low weights and 20 adults whose weights are higher and also more variable.
The left half of the diagram does stratified random sampling with sample sizes proportional to the stratum sizes (8 immature animals and 2 adults). On the right, a disproportionately large sample is taken from the adults because of their higher variability: 3 immature animals and 7 adults.
Click Take sample a few times to verify that the estimated mean weight is more accurate when a larger sample is taken from the adults; the variability of the estimate is lower.
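The comparison in the diagram can be imitated by simulation. In this sketch the population is hypothetical: 80 immature weights drawn from a normal distribution with mean 20 and standard deviation 2, and 20 adult weights with mean 70 and standard deviation 20. Each sample is turned into an estimate by weighting the stratum means by the stratum sizes (80 and 20), as required when the allocation is not proportional.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: 80 light immature animals with little spread,
# and 20 heavier adults whose weights are far more variable.
immature = [rng.gauss(20, 2) for _ in range(80)]
adults = [rng.gauss(70, 20) for _ in range(20)]
sizes = [len(immature), len(adults)]
true_mean = statistics.mean(immature + adults)


def stratified_estimate(n_immature, n_adults, rng):
    """Sample within each stratum, then weight the stratum means by the
    stratum sizes to estimate the population mean."""
    means = [statistics.mean(rng.sample(immature, n_immature)),
             statistics.mean(rng.sample(adults, n_adults))]
    return sum(N * m for N, m in zip(sizes, means)) / sum(sizes)


reps = 5000
proportional = [stratified_estimate(8, 2, rng) for _ in range(reps)]
disproportionate = [stratified_estimate(3, 7, rng) for _ in range(reps)]

print("population mean:            ", round(true_mean, 2))
print("SD, proportional (8, 2):    ", round(statistics.stdev(proportional), 2))
print("SD, disproportionate (3, 7):", round(statistics.stdev(disproportionate), 2))
```

Both allocations give estimates centred on the population mean; the 3 + 7 allocation simply spends more of the sample where the weights are most variable, so its estimates are less spread out.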
If one stratum has an extremely large spread, it may be best to record information from every individual in that stratum and sample only a small fraction of the others.
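One standard way to turn "larger samples from more variable strata" into numbers is Neyman allocation, which takes $n_h$ proportional to $N_h S_h$, where $S_h$ is the standard deviation within stratum $h$. The text above does not name this rule; it is swapped in here as an illustration, assuming equal cost per sampled individual, known (or guessed) positive standard deviations, and $n$ no larger than the population. When a stratum's share would exceed its size, the sketch takes a census of that stratum and re-allocates the rest, which is exactly the situation described above. The stratum sizes and standard deviations in the example are hypothetical.

```python
def neyman_allocation(sizes, sds, n):
    """Allocate a total sample of n in proportion to N_h * S_h, taking a
    census of any stratum whose share would reach or exceed its size and
    re-allocating the remainder among the other strata."""
    alloc = [0.0] * len(sizes)
    remaining = set(range(len(sizes)))
    n_left = n
    while remaining:
        weight = sum(sizes[h] * sds[h] for h in remaining)
        shares = {h: n_left * sizes[h] * sds[h] / weight for h in remaining}
        over = [h for h in remaining if shares[h] >= sizes[h]]
        if not over:
            for h in remaining:
                alloc[h] = shares[h]
            break
        for h in over:
            alloc[h] = sizes[h]  # record every individual in this stratum
            n_left -= sizes[h]
            remaining.discard(h)
    return alloc


# Hypothetical strata: 80 and 20 animals with modest spread, plus a small
# stratum of 5 with an extremely large standard deviation.
alloc = neyman_allocation([80, 20, 5], [2, 20, 400], 15)
print([round(a, 2) for a in alloc])  # [2.86, 7.14, 5]
```

The third stratum is recorded in full and the remaining 10 units are shared between the other two. In practice the fractional sizes would still need rounding to whole numbers.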