Bootstrap for bivariate data

Bootstrap sampling can be used to obtain an approximate error distribution in any situation where individuals are randomly sampled from a population. Again we start with a sample of n individuals — our actual data. Each bootstrap sample is a random sample with replacement of n values from these individuals.

Bivariate data fit into this framework: two numerical measurements, X and Y, are obtained from each individual. A bootstrap sample draws whole individuals, so the X and Y values of a sampled individual always stay together as a pair. Using bootstrap sampling, we can obtain an approximation to the error distribution of summary statistics describing the relationship between X and Y, such as their correlation coefficient.
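As a minimal sketch of what sampling individuals means for bivariate data, the Python below resamples (x, y) pairs with replacement. The function name bootstrap_pairs and the five toy measurements are made up for illustration; they are not taken from the data set discussed on this page.

```python
import random

def bootstrap_pairs(pairs, rng):
    """Draw len(pairs) individuals with replacement, keeping each (x, y) together."""
    n = len(pairs)
    return [pairs[rng.randrange(n)] for _ in range(n)]

# Hypothetical toy data: (abdomen, hip) circumferences in cm for five individuals.
toy = [(85.2, 94.5), (83.0, 98.7), (87.9, 99.2), (86.4, 101.2), (100.0, 101.9)]

rng = random.Random(0)           # arbitrary fixed seed so the run is repeatable
sample = bootstrap_pairs(toy, rng)
print(sample)                    # same size as toy; some pairs may repeat
```

Resampling the X values and Y values separately would break the pairing and destroy the relationship that the correlation coefficient is meant to describe.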

Abdomen and hip circumference

Various measurements of body shape were recorded from a group of 252 men. The correlation coefficient between their abdomen circumference and hip circumference was 0.874, indicating a fairly strong relationship.

How accurately does this value (0.874) estimate the correlation coefficient for all men?

The diagram below performs a bootstrap simulation from the 252 pairs of values.

The blue crosses (and digits) show a bootstrap sample from the original 252 crosses (grey). The digits describe original crosses that have been sampled more than once.
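The repeat counts displayed as digits can be mimicked by tallying how often each original point is drawn. This is only a sketch: it works with index numbers 0 to 251 rather than the actual measurements, and the seed is arbitrary.

```python
import random
from collections import Counter

n = 252                      # number of original pairs, as in the example
rng = random.Random(1)       # arbitrary fixed seed so the run is repeatable

# Draw n indices with replacement; each index identifies one original pair.
drawn = [rng.randrange(n) for _ in range(n)]
counts = Counter(drawn)

repeated = {i: c for i, c in counts.items() if c > 1}   # points drawn 2+ times
never_drawn = n - len(counts)                            # points left out entirely

print(f"{len(repeated)} points drawn more than once, {never_drawn} not drawn at all")
```

Because sampling is with replacement, a typical bootstrap sample omits roughly a third of the original points (the chance that a given point is missed is close to 1/e) and repeats others.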

The correlation coefficient from the bootstrap sample is shown, together with its error: the difference between it and the correlation coefficient of the 252 original crosses from which we sampled (i.e. 0.874).
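The error of a single bootstrap sample can be sketched as follows. The original 252 measurements are not reproduced on this page, so make_synthetic_pairs generates stand-in data with invented means and spreads; its correlation will be in the same region as 0.874 but not equal to it. All function names here are illustrative.

```python
import math
import random

def pearson_r(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def make_synthetic_pairs(n, rng):
    """Stand-in data only: the means, slope and spreads below are invented."""
    pairs = []
    for _ in range(n):
        abdomen = rng.gauss(92.0, 10.0)
        hip = 100.0 + 0.6 * (abdomen - 92.0) + rng.gauss(0.0, 3.5)
        pairs.append((abdomen, hip))
    return pairs

rng = random.Random(2)                     # arbitrary fixed seed
data = make_synthetic_pairs(252, rng)      # plays the role of the grey crosses
r_sample = pearson_r(data)                 # plays the role of 0.874

# One bootstrap sample: 252 pairs drawn with replacement from the data.
boot = [data[rng.randrange(len(data))] for _ in range(len(data))]
r_boot = pearson_r(boot)
error = r_boot - r_sample                  # bootstrap error, as defined in the text

print(f"sample r = {r_sample:.3f}, bootstrap r = {r_boot:.3f}, error = {error:+.3f}")
```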

Click Accumulate and take about 100 bootstrap samples. (Hold down the Take sample button.) From the error distribution, it can be seen that...

The estimation error could be as much as 0.05 but is unlikely to be greater.

Since the bootstrap errors approximate the errors of the real estimate, our sample correlation coefficient of 0.874 could therefore differ by as much as 0.05 from the correlation coefficient for all men.

Finally, click Estimate s.e. and bias. The mean and standard deviation of the error distribution are estimates of the bias and standard error, respectively, of the sample correlation coefficient.
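The whole simulation can be sketched in a few lines of Python: accumulate about 100 bootstrap errors, then take their mean and standard deviation. As the actual measurements are not given here, the data are synthetic stand-ins with invented parameters, so the printed values will not match the diagram. The function names and the seed are assumptions made for illustration.

```python
import math
import random
import statistics

def pearson_r(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_errors(pairs, n_boot, rng):
    """Errors (bootstrap r minus sample r) from n_boot bootstrap samples."""
    n = len(pairs)
    r_sample = pearson_r(pairs)
    errors = []
    for _ in range(n_boot):
        boot = [pairs[rng.randrange(n)] for _ in range(n)]  # resample whole pairs
        errors.append(pearson_r(boot) - r_sample)
    return errors

rng = random.Random(3)                     # arbitrary fixed seed

# Synthetic stand-in for the 252 (abdomen, hip) pairs; all numbers are invented.
data = []
for _ in range(252):
    abdomen = rng.gauss(92.0, 10.0)
    hip = 100.0 + 0.6 * (abdomen - 92.0) + rng.gauss(0.0, 3.5)
    data.append((abdomen, hip))

errors = bootstrap_errors(data, 100, rng)  # about 100 samples, as in the text

bias_est = statistics.mean(errors)         # mean of errors: estimated bias
se_est = statistics.stdev(errors)          # s.d. of errors: estimated standard error
largest = max(abs(e) for e in errors)      # rough guide to the largest likely error

print(f"bias = {bias_est:+.4f}, s.e. = {se_est:.4f}, largest |error| = {largest:.4f}")
```

With only 100 bootstrap samples the estimates themselves vary noticeably from run to run; taking more samples gives a smoother error distribution.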