Bootstrap for bivariate data

Bootstrap sampling can be used to obtain an approximate error distribution in any situation where individuals are randomly sampled from a population. Again we start with a sample of n individuals — our actual data. Each bootstrap sample is a random sample with replacement of n values from these individuals.

Bivariate data fit into this framework: two numerical measurements, X and Y, are obtained from each individual. Because bootstrap sampling selects individuals, the X and Y values for an individual are always kept together as a pair. Using bootstrap sampling, we can obtain an approximation to the error distribution of summary statistics describing the relationship between X and Y, such as their correlation coefficient.
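The resampling of pairs described above can be sketched in Python as follows. This is only a minimal illustration using the standard library: the function names pearson_r and resample_pairs are hypothetical, and the six data pairs are made up for the example (they are not real measurements).

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient; NaN if either variable has no spread."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def resample_pairs(pairs, rng):
    """Draw len(pairs) individuals with replacement, keeping each (x, y) together."""
    n = len(pairs)
    return [pairs[rng.randrange(n)] for _ in range(n)]

# Made-up illustrative (x, y) pairs, one per individual.
pairs = [(1.0, 2.1), (2.0, 2.9), (3.0, 4.2), (4.0, 4.8), (5.0, 6.5), (6.0, 6.1)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
boot = resample_pairs(pairs, rng)
r_boot = pearson_r([x for x, _ in boot], [y for _, y in boot])
print("bootstrap sample:", boot)
print("bootstrap correlation:", r_boot)
```

Resampling X and Y separately would break the link between the two measurements, so the sketch draws indices of individuals rather than separate X and Y values.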

Capital value and annual rental of domestic properties

Data were collected from a sample of 96 Auckland domestic properties in 1991. The capital value and the annual rental of each property were recorded. The correlation coefficient between these two variables is 0.787, indicating a reasonably strong relationship.

How accurately does this value (0.787) estimate the correlation coefficient for all domestic properties in Auckland?

The diagram below performs a bootstrap simulation from the 96 pairs of values.

The blue crosses (and digits) show a bootstrap sample from the original 96 crosses (grey). The digits describe original crosses that have been sampled more than once.

The correlation coefficient from the bootstrap sample is shown, together with its error: the difference between it and the correlation coefficient of the 96 original crosses from which we have sampled (0.787).

Click Accumulate and take about 100 bootstrap samples. (Hold down the Take sample button.) From the error distribution, it can be seen that...

The estimation error could be as much as 0.1 but is unlikely to be greater.

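The bootstrap errors approximate the errors that arise when a sample of 96 properties is taken from the population. Our sample correlation coefficient of 0.787 could therefore be as much as 0.1 different from the correlation coefficient for all Auckland properties.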

Finally, click Estimate s.e. and bias. The mean and standard deviation of the bootstrap error distribution are estimates of the bias and standard error of the sample correlation coefficient.