
Chapter 9   Estimating Parameters

9.1   Introduction to estimation

9.1.1   Interest in populations

When data are collected, we are usually less interested in the data themselves than in the unknown population distribution from which we assume they were generated.

9.1.2   Interest in parameters

A few numerical summaries of a population distribution (parameters) often capture its most interesting characteristics. The corresponding sample statistics provide estimates.

9.1.3   Applications of estimation

This page gives a few examples where sample statistics are used to estimate important population parameters.

9.1.4   Estimation error

The difference between the population parameter and a sample estimate is called the error in the estimate.

9.1.5   Distribution of errors

Estimation errors vary from sample to sample and have distributions.

9.1.6   Standard error and bias

A good estimator has errors that are close to zero. Ideally the error distribution is centred on zero (the estimator is unbiased) and has a low standard deviation (standard error). Since the error is the estimator minus a fixed parameter, the standard error is the standard deviation of both the estimator and the estimation error.
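
The bias and standard error can be seen directly by simulating many samples and recording each estimation error. This is a minimal sketch assuming a normal population with hypothetical values μ = 50 and σ = 10, samples of size n = 25, and errors defined as estimate minus parameter; the helper name simulate_errors is illustrative only.

```python
import random
import statistics

def simulate_errors(estimator, mu=50.0, sigma=10.0, n=25, reps=10_000, seed=1):
    """Draw repeated samples from an (assumed) normal population and
    return the estimation errors, defined here as estimate minus parameter."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        errors.append(estimator(sample) - mu)
    return errors

errors = simulate_errors(statistics.mean)
bias = statistics.mean(errors)        # centre of the error distribution; near 0 if unbiased
std_error = statistics.stdev(errors)  # spread of the error distribution, i.e. the standard error
print(f"bias ≈ {bias:.3f}, standard error ≈ {std_error:.3f}")
```

For these settings the simulated standard error should be close to σ/√n = 10/5 = 2.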

9.1.7   Interval estimates

An interval estimate is a range of values within which we are 'confident' that the unknown population parameter lies.

9.2   Standard error of mean

9.2.1   Error distribution for mean

When using a sample mean to estimate a population mean, the errors have a distribution with mean zero. The standard deviation of the errors (standard error) is the standard deviation of the sample mean.

9.2.2   Standard error when σ is known

If the population standard deviation, σ, is known, the standard error of the sample mean can be evaluated exactly as σ/√n, where n is the sample size.
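
The formula is a one-line calculation. A sketch in Python, with a hypothetical σ and sample size chosen only for illustration:

```python
import math

def standard_error_known_sigma(sigma, n):
    """Standard error of the sample mean when the population SD sigma is known."""
    return sigma / math.sqrt(n)

# Hypothetical population SD of 12 and sample size of 36
print(standard_error_known_sigma(12, 36))  # 2.0
```

Because n appears under a square root, quadrupling the sample size only halves the standard error.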

9.2.3   Interpreting the standard error

If the error distribution is approximately normal, the estimation error has about 95% probability of being within 2 standard errors of zero and is almost certain to be within 3 standard errors.
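
These probabilities come from the normal distribution. A small sketch, assuming a normal error distribution, computes the probability of an error within k standard errors using the error function:

```python
import math

def prob_within(k):
    """P(|error| < k standard errors), assuming a normal error distribution."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

The results are about 0.683, 0.954 and 0.997 for 1, 2 and 3 standard errors.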

9.2.4   Standard error when σ is unknown

In most practical applications, the population standard deviation is unknown. The standard error of the sample mean can be approximated by replacing σ in its formula with the sample standard deviation, s, giving s/√n.
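
A sketch of the approximate standard error computed from a single sample; the data values are hypothetical:

```python
import math
import statistics

def estimated_standard_error(sample):
    """Approximate SE of the mean, using the sample SD s in place of sigma."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

data = [12.1, 9.8, 11.4, 10.6, 13.0, 8.9, 10.2, 11.7]  # hypothetical data
print(round(estimated_standard_error(data), 3))
```

statistics.stdev uses the n − 1 divisor, which is the usual sample standard deviation.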

9.2.5   Standard error vs standard deviation

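It is important to distinguish between the interpretation (and value) of the standard error (SE) and the standard deviation (SD). The SD describes the spread of individual values; the SE describes how much the sample mean varies from sample to sample, and it decreases as the sample size increases.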

9.2.6   Using SEs to compare estimators (advanced)

If there are two alternative estimators of a parameter, the estimator with lower standard error is usually better. The sample mean is shown to be a better estimator of the centre of a normal population than the sample median.
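
The comparison can be reproduced by simulation. This sketch assumes a standard normal population and samples of size 25 (hypothetical settings); using the same seed for both estimators means they are applied to identical samples.

```python
import random
import statistics

def simulated_se(estimator, n=25, reps=5_000, seed=2):
    """SD of an estimator over repeated samples from a standard normal population."""
    rng = random.Random(seed)
    estimates = [estimator([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]
    return statistics.stdev(estimates)

se_mean = simulated_se(statistics.mean)
se_median = simulated_se(statistics.median)
print(f"SE(mean) ≈ {se_mean:.3f}, SE(median) ≈ {se_median:.3f}")
```

The mean's standard error should be near 1/√25 = 0.2, with the median's noticeably larger.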

9.2.7   More about bias (advanced)

Unbiased estimators are usually preferred. The sample median is shown to be a biased estimator of the mean of a skew distribution.
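
A sketch of the bias, assuming an exponential population with rate 1 as the skew distribution (mean 1, median ln 2 ≈ 0.693) and samples of size 25; these choices are illustrative only.

```python
import random
import statistics

def mean_of_sample_medians(n=25, reps=5_000, seed=3):
    """Average sample median over repeated samples from an exponential
    population with rate 1 (population mean 1, population median ln 2)."""
    rng = random.Random(seed)
    medians = [statistics.median([rng.expovariate(1.0) for _ in range(n)])
               for _ in range(reps)]
    return statistics.fmean(medians)

population_mean = 1.0
bias = mean_of_sample_medians() - population_mean
print(f"bias of the sample median as an estimator of the mean ≈ {bias:.3f}")
```

The bias is negative because the sample median centres near the population median, which lies below the mean of this right-skewed distribution.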

9.3   Confidence interval for mean

9.3.1   Confidence interval from standard error

For any unbiased estimator whose error distribution is approximately normal, the estimation error has approximately 0.95 probability of being between -2SE and +2SE. An approximate 95% confidence interval for the parameter is therefore the estimate ± 2SE.

9.3.2   Confidence interval for mean, known σ

If the population standard deviation is known, the standard error can be found exactly. A 95% confidence interval is the sample mean ± twice this standard error (or, more exactly, ± 1.96 standard errors).
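
A minimal sketch of the interval x̄ ± 1.96σ/√n; the summary values are hypothetical:

```python
import math

def ci_mean_known_sigma(xbar, sigma, n, z=1.96):
    """95% CI for a population mean when sigma is known: xbar ± z·sigma/√n."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical summary: mean 102.5 from n = 16 values, known sigma = 8
low, high = ci_mean_known_sigma(102.5, 8, 16)
print(f"({low:.2f}, {high:.2f})")  # (98.58, 106.42)
```

Passing z=2 gives the slightly wider "± twice the standard error" version.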

9.3.3   Confidence level

A simulation shows that 95% confidence intervals are random: they vary from sample to sample. About 95% of samples give confidence intervals that include the true parameter.
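
A sketch of such a simulation, assuming a normal population with hypothetical μ = 10, σ = 2 and samples of size 20. It counts how often x̄ ± 1.96σ/√n contains μ.

```python
import math
import random
import statistics

def coverage_known_sigma(mu=10.0, sigma=2.0, n=20, reps=10_000, seed=4):
    """Fraction of intervals xbar ± 1.96·sigma/√n that contain mu."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)  # same width every time; only the centre varies
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

coverage = coverage_known_sigma()
print(coverage)
```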

9.3.4   Confidence level if σ is replaced by s

In practice, the population standard deviation is usually unknown. If the population SD is simply replaced by its sample equivalent, the interval estimate has a lower confidence level than 95%.
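
The shortfall is easiest to see with small samples. This sketch assumes a normal population (hypothetical μ = 10, σ = 2) and n = 5, and simply substitutes s for σ while keeping the constant 1.96.

```python
import math
import random
import statistics

def coverage_s_for_sigma(mu=10.0, sigma=2.0, n=5, reps=10_000, seed=5):
    """Fraction of intervals xbar ± 1.96·s/√n that contain mu (normal population assumed)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        half_width = 1.96 * statistics.stdev(sample) / math.sqrt(n)
        if abs(statistics.fmean(sample) - mu) <= half_width:
            hits += 1
    return hits / reps

coverage = coverage_s_for_sigma()
print(coverage)
```

With n = 5 the coverage falls well below 0.95; the gap shrinks as n grows.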

9.3.5   Confidence interval for mean, unknown σ

To get a 95% confidence level, the constant 1.96 must be replaced by a value from t tables with n − 1 degrees of freedom.
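
A sketch of the resulting interval, x̄ ± t·s/√n. To stay within the standard library it copies a few two-sided 95% t-values from tables (degrees of freedom 1 to 10) instead of computing them; the data are hypothetical.

```python
import math
import statistics

# Two-sided 95% t-values (upper 2.5% points) for small degrees of freedom, from tables
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def ci_mean_unknown_sigma(sample):
    """95% CI for a mean with sigma unknown: xbar ± t·s/√n, df = n - 1."""
    n = len(sample)
    t = T_975[n - 1]  # raises KeyError outside the small table above
    half_width = t * statistics.stdev(sample) / math.sqrt(n)
    xbar = statistics.fmean(sample)
    return xbar - half_width, xbar + half_width

data = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4]  # hypothetical data, n = 6 (df = 5)
print(ci_mean_unknown_sigma(data))
```

For larger samples more table values would be needed; they approach 1.96 as the degrees of freedom increase.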

9.3.6   Properties of 95% confidence interval

A simulation demonstrates that the resulting 95% confidence intervals have probability 0.95 of including the population mean.
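
A sketch of this simulation, under the same assumptions as a normal population with hypothetical μ = 10, σ = 2 and n = 5, using the tabled t-value 2.776 for 4 degrees of freedom:

```python
import math
import random
import statistics

def coverage_t_interval(mu=10.0, sigma=2.0, n=5, t_value=2.776, reps=10_000, seed=6):
    """Fraction of intervals xbar ± t·s/√n that contain mu.
    t_value = 2.776 is the tabled 95% value for n - 1 = 4 degrees of freedom."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        half_width = t_value * statistics.stdev(sample) / math.sqrt(n)
        if abs(statistics.fmean(sample) - mu) <= half_width:
            hits += 1
    return hits / reps

coverage = coverage_t_interval()
print(coverage)
```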

9.3.7   Examples

Some examples of 95% confidence intervals for population means are given and interpreted.

9.4   Estimating proportions

9.4.1   General framework for estimation

The methodology for describing the accuracy of a sample mean using standard errors and confidence intervals can also be used for other parameter estimates.

9.4.2   Estimating a proportion

A sample proportion estimates the corresponding population proportion, π. There is likely to be an error in this estimate and these errors have a distribution.

9.4.3   Error distribution

The estimation errors have a binomial distribution that is shifted and scaled to have mean zero: if X is the number of successes in a sample of size n, the error is X/n − π. Its standard deviation, √(π(1 − π)/n), is the standard error of the proportion.
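
The exact error distribution can be listed from binomial probabilities. A sketch with hypothetical n = 20 and π = 0.3 confirms that its mean is zero and its standard deviation matches √(π(1 − π)/n):

```python
import math

def proportion_error_distribution(n, pi):
    """Exact (error, probability) pairs for p - pi, where p = X/n and X ~ Binomial(n, pi)."""
    return [(x / n - pi, math.comb(n, x) * pi**x * (1 - pi)**(n - x))
            for x in range(n + 1)]

n, pi = 20, 0.3
dist = proportion_error_distribution(n, pi)
mean_error = sum(e * prob for e, prob in dist)
sd_error = math.sqrt(sum(e**2 * prob for e, prob in dist) - mean_error**2)
print(round(mean_error, 10), round(sd_error, 4), round(math.sqrt(pi * (1 - pi) / n), 4))
```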

9.4.4   Normal approximation to error distribution

If the sample size is large enough, the error distribution is approximately normal. This page gives a few examples for which the error distribution is found.

9.4.5   Confidence interval for proportion

A 95% confidence interval for a population proportion is the sample proportion ± twice its standard error, estimated by √(p(1 − p)/n). Its confidence level is only approximately 95%, and guidelines are given for the minimum sample size.
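
A sketch of this interval, p ± z·√(p(1 − p)/n), using 1.96 by default (z=2 gives the "twice" version); the counts are hypothetical:

```python
import math

def ci_proportion(successes, n, z=1.96):
    """Approximate 95% CI for a proportion: p ± z·sqrt(p(1-p)/n)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # estimated standard error of p
    return p - z * se, p + z * se

low, high = ci_proportion(45, 100)  # hypothetical: 45 successes out of 100
print(f"({low:.3f}, {high:.3f})")  # (0.352, 0.548)
```

When p is 0 or 1 the estimated standard error is zero and the interval collapses to a single point, one reason the approximation needs an adequate sample size.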

9.4.6   Properties of 95% CI for proportion

If samples are repeatedly taken, about 95% of them result in 95% confidence intervals that include the population proportion. Guidelines are given for the minimum sample size to make the confidence level close to 95%.
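
Because the number of successes is binomial, the confidence level can be computed exactly instead of simulated, by adding the probabilities of all outcomes whose interval contains π. A sketch, using a hypothetical π = 0.1:

```python
import math

def exact_coverage(n, pi, z=1.96):
    """Probability that p ± z·sqrt(p(1-p)/n) contains pi, summed over Binomial(n, pi) outcomes."""
    total = 0.0
    for x in range(n + 1):
        p = x / n
        half_width = z * math.sqrt(p * (1 - p) / n)
        if p - half_width <= pi <= p + half_width:
            total += math.comb(n, x) * pi**x * (1 - pi)**(n - x)
    return total

for n in (10, 50, 500):
    print(n, round(exact_coverage(n, 0.1), 3))
```

With n = 10 the confidence level is far below 95%, while n = 500 brings it close to 95%.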

9.4.7   Confidence interval examples

95% confidence intervals for proportions are found and interpreted for several data sets.

9.5   More about estimation

9.5.1   Margin of error

The margin of error quoted for a survey is related to a confidence interval. It is close to the half-width of a 95% CI when p is near 0.5, but overstates the error (understating the accuracy) of small or large sample proportions.
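
A sketch comparing the two, assuming the margin of error is taken as the largest possible 95% half-width, 1.96·√(0.25/n) (often rounded to 1/√n); the survey size is hypothetical.

```python
import math

def margin_of_error(n, z=1.96):
    """Assumed definition: the largest 95% CI half-width, reached at p = 0.5."""
    return z * math.sqrt(0.25 / n)

def ci_half_width(p, n, z=1.96):
    """Half-width of the 95% CI for a proportion at sample proportion p."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1000  # hypothetical survey size
for p in (0.5, 0.2, 0.05):
    print(p, round(margin_of_error(n), 4), round(ci_half_width(p, n), 4))
```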

9.5.2   Sample size for estimating mean

Given a target width for a 95% confidence interval, it is possible to determine the necessary sample size to achieve this accuracy.
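
Setting the half-width 1.96σ/√n equal to half the target width and solving gives n = (1.96σ / (w/2))², rounded up. A sketch, assuming σ is known or guessed in advance (for example from a pilot study); the values are hypothetical.

```python
import math

def sample_size_for_mean(sigma, width, z=1.96):
    """Smallest n whose 95% CI, xbar ± z·sigma/√n, has total width at most `width`."""
    half_width = width / 2
    return math.ceil((z * sigma / half_width) ** 2)

print(sample_size_for_mean(sigma=15, width=6))  # 97
```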

9.5.3   Sample size for estimating proportion

In a similar way, given a target width for a 95% confidence interval for a proportion, it is possible to determine the necessary sample size.
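
The same algebra with √(π(1 − π)/n) gives n = 1.96²·π(1 − π)/(w/2)². A sketch, assuming a guess for π is supplied; using 0.5 gives the largest (safest) sample size.

```python
import math

def sample_size_for_proportion(width, guess=0.5, z=1.96):
    """Smallest n whose 95% CI for a proportion has total width at most `width`."""
    half_width = width / 2
    return math.ceil(z**2 * guess * (1 - guess) / half_width**2)

print(sample_size_for_proportion(0.06))  # 1068, i.e. about ±3 percentage points
```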

9.5.4   Other confidence levels

All earlier confidence intervals had a 95% confidence level. Replacing 1.96 (or 2) with other constants gives interval estimates with other confidence levels.
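
For intervals based on the normal distribution, the constant for confidence level C is the normal quantile at (1 + C)/2. A sketch using statistics.NormalDist:

```python
from statistics import NormalDist

def z_value(confidence):
    """Two-sided normal multiplier for a given confidence level (0.95 gives about 1.96)."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.80, 0.90, 0.95, 0.99):
    print(level, round(z_value(level), 3))
```

This gives about 1.645 for 90% and 2.576 for 99%; when σ is replaced by s, t-values would be used instead.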

9.6   Simulation & bootstrap (advanced)

9.6.1   Need for simulation

Formulae exist for the standard errors of many common estimators. If such a formula is not available, a different approach is needed.

9.6.2   Error distribution by simulation

If a formula for the standard error cannot be found, a simulation can often be used to find the error distribution.

9.6.3   Simulations with normal distributions

An example is shown where a simulation provides the error distribution and standard error for an upper quartile.
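
A sketch of this kind of simulation, assuming a normal population with hypothetical μ = 0, σ = 1 and samples of size 50. In practice μ and σ would be replaced by estimates from the data.

```python
import random
import statistics

def upper_quartile(sample):
    """Upper quartile: the third of the three cut points from statistics.quantiles."""
    return statistics.quantiles(sample, n=4)[2]

def simulate_uq_errors(mu=0.0, sigma=1.0, n=50, reps=5_000, seed=7):
    """Errors (estimate minus parameter) of the sample upper quartile."""
    rng = random.Random(seed)
    true_uq = statistics.NormalDist(mu, sigma).inv_cdf(0.75)  # population upper quartile
    return [upper_quartile([rng.gauss(mu, sigma) for _ in range(n)]) - true_uq
            for _ in range(reps)]

errors = simulate_uq_errors()
bias = statistics.fmean(errors)
std_error = statistics.stdev(errors)
print(f"bias ≈ {bias:.3f}, standard error ≈ {std_error:.3f}")
```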

9.6.4   Bootstrap error distribution

If the population distribution does not seem to be normal, simulations can be based on samples with replacement from the actual data.
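
A sketch of a bootstrap standard error: resample the observed data with replacement, recompute the estimator each time, and take the standard deviation of the results. The data values are hypothetical and the median is used only as an example estimator.

```python
import random
import statistics

def bootstrap_se(data, estimator, reps=2_000, seed=8):
    """Bootstrap SE: SD of the estimator over resamples drawn with
    replacement from the observed data (no normality assumption)."""
    rng = random.Random(seed)
    n = len(data)
    estimates = [estimator(rng.choices(data, k=n)) for _ in range(reps)]
    return statistics.stdev(estimates)

data = [3.1, 4.7, 2.2, 8.9, 5.0, 3.6, 12.4, 4.1, 2.8, 6.3, 3.9, 7.5]  # hypothetical skewed data
print(round(bootstrap_se(data, statistics.median), 3))
```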

9.6.5   Standard error of correlation

Bootstrap samples can also be used to generate an approximate error distribution (and standard error) for many types of estimator. Their use to find the standard error of a correlation coefficient is described.
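
For a correlation, whole (x, y) pairs are resampled together so that the relationship within each pair is kept. A sketch with hypothetical paired data; resamples in which x or y has no variation are skipped, since the correlation is undefined there.

```python
import math
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_se_correlation(xs, ys, reps=2_000, seed=9):
    """Bootstrap SE of the correlation, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    rs = []
    while len(rs) < reps:
        resample = rng.choices(pairs, k=len(pairs))
        bx = [x for x, _ in resample]
        by = [y for _, y in resample]
        if len(set(bx)) > 1 and len(set(by)) > 1:  # skip zero-variance resamples
            rs.append(pearson(bx, by))
    return statistics.stdev(rs)

x = [2.1, 3.4, 1.8, 4.6, 5.2, 3.9, 6.1, 2.7, 4.8, 5.5]  # hypothetical data
y = [3.0, 4.1, 2.5, 5.9, 5.0, 4.4, 7.2, 3.8, 4.9, 6.6]
print(round(pearson(x, y), 3), round(bootstrap_se_correlation(x, y), 3))
```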