Distribution of \(Y = g(X)\)

We showed earlier that if a random variable \(X\) has pdf \(f_X(x)\) and \(g(\cdot)\) is a monotonic function, then the transformed variable \(Y = g(X)\) has pdf

\[ f_Y(y) \;\;=\;\; f_X(h(y)) \times \left| h'(y) \right| \]

where \(x = h(y)\) is the inverse function to \(g(x)\).
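As a sketch of how this formula can be checked numerically (the distribution and transformation below are our own illustrative choices, not from the text): take \(X \sim \text{Exponential}(1)\) and the monotonic transformation \(Y = g(X) = \sqrt{X}\), so that \(h(y) = y^2\), \(h'(y) = 2y\), and \(f_Y(y) = e^{-y^2} \cdot 2y\).

```python
import math
import random

random.seed(1)

# X ~ Exponential(1); Y = g(X) = sqrt(X) is monotonic on (0, inf).
# The change-of-variable formula gives f_Y(y) = f_X(h(y)) * |h'(y)|
# with h(y) = y^2 and h'(y) = 2y, so f_Y(y) = exp(-y^2) * 2y.
n = 200_000
ys = [math.sqrt(random.expovariate(1.0)) for _ in range(n)]

# Compare the empirical probability that Y falls in (a, b) with the
# integral of f_Y over that interval:
#   integral of 2y exp(-y^2) dy from a to b  =  exp(-a^2) - exp(-b^2)
a, b = 0.5, 1.5
empirical = sum(a < y < b for y in ys) / n
theoretical = math.exp(-a * a) - math.exp(-b * b)
print(round(empirical, 3), round(theoretical, 3))
```

The two probabilities agree to within simulation error, which is consistent with the formula for \(f_Y(y)\).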

Unfortunately, it is often difficult to find the mean and variance of \(Y\) from the pdf that is obtained in this way. We now present approximate formulae for these quantities.

Delta method

We will only present the method informally. (A formal statement of the result and its proof are too complex for an introductory course.)

If \(X\) is a random variable with mean \(\mu\) and variance \(\sigma^2\), and its variance is small enough that the continuous function \(g(x)\) is fairly linear within the range of 'likely' values of \(X\), we can use a first-order Taylor series approximation of \(g(X)\) around \(\mu\),

\[ Y \;\;=\;\; g(X) \;\;\approx\;\; g(\mu) + (X - \mu) g'(\mu) \]

The approximation on the right is a linear function of \(X\), so we can use the results from the previous section to find its mean and variance,

\[ E[Y] \;\approx\; g(\mu) \spaced{and} \Var(Y) \;\approx\; \left(g'(\mu)\right)^2 \sigma^2 \]

This is called the delta method.
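A quick numerical check of these formulas (a sketch; the choices \(g(x) = \ln x\), \(\mu = 5\) and \(\sigma = 0.1\) are our own illustration):

```python
import math
import random

random.seed(2)

# X ~ Normal(mu, sigma^2) with a small sigma, and g(x) = ln(x).
mu, sigma = 5.0, 0.1
g = math.log
g_prime = lambda x: 1.0 / x

# Delta-method approximations: E[Y] ~ g(mu), Var(Y) ~ g'(mu)^2 sigma^2
approx_mean = g(mu)                          # ln(5)
approx_var = g_prime(mu) ** 2 * sigma ** 2   # (1/5)^2 * 0.01 = 0.0004

# Simulate Y = g(X) directly to compare.
n = 200_000
ys = [g(random.gauss(mu, sigma)) for _ in range(n)]
sim_mean = sum(ys) / n
sim_var = sum((y - sim_mean) ** 2 for y in ys) / (n - 1)
print(approx_mean, sim_mean, approx_var, sim_var)
```

Because \(\sigma\) is small relative to \(\mu\), \(\ln x\) is close to linear over the likely values of \(X\), and the simulated mean and variance of \(Y\) match the delta-method approximations closely.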

Note: The approximation only works if the transformation is close to linear for the values of \(X\) that are likely to be observed.

This means that the variance of \(X\) should be small enough to make the transformation \(g(X)\) approximately linear.

Application to estimators of parameters

If \(\hat{\theta}\) is a consistent estimator of a parameter \(\theta\), then \(\Var(\hat{\theta}) \to 0\) as the sample size, \(n\), increases. The delta method can therefore be used to find the approximate mean and variance of any continuous function of the estimator, \(g(\hat{\theta})\), in large samples.

It shows that \(g(\hat{\theta})\) is approximately unbiased for \(g(\theta)\) in large samples.

Quadratic transformation

Consider a random variable \(X\) whose mean, \(\mu\), will be estimated using the mean of a random sample of \(n\) values, \(\overline{x}\). We have set \(\mu = 2\) for the purpose of this illustration, but it would be an unknown parameter in practice.

We now consider the use of \(\overline{X}^2\) as an estimator of \(\mu^2\), i.e. the transformation \(g(x) = x^2\). If the distribution of \(X\) has variance \(\sigma^2\), then \(\overline{X}\) has variance \(\sigma^2 / n\), and the delta method states that in large samples,

\[ E\Big[\overline{X}^2\Big] \approx \mu^2 \spaced{and} \Var\Big(\overline{X}^2\Big) \approx \left(g'(\mu)\right)^2 \frac{\sigma^2}{n} = (2\mu)^2 \frac{\sigma^2}{n} \]
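These approximations can be checked by simulation (a sketch; we take \(X \sim \text{Normal}(2, 1)\) and \(n = 100\) purely for illustration):

```python
import random

random.seed(3)

mu, sigma, n = 2.0, 1.0, 100   # illustrative choices
reps = 20_000

# Delta-method approximations for the estimator xbar^2 of mu^2:
approx_mean = mu ** 2                          # 4.0
approx_var = (2 * mu) ** 2 * sigma ** 2 / n    # 16 * 1 / 100 = 0.16

# Simulate many samples of size n, recording xbar^2 for each.
ests = []
for _ in range(reps):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    ests.append(xbar ** 2)

sim_mean = sum(ests) / reps
sim_var = sum((e - sim_mean) ** 2 for e in ests) / (reps - 1)
print(sim_mean, sim_var)
```

With \(n = 100\), the likely values of \(\overline{x}\) lie in a narrow range around \(\mu\), so the simulated mean and variance of \(\overline{X}^2\) are close to the delta-method values \(\mu^2 = 4\) and \((2\mu)^2 \sigma^2 / n = 0.16\).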

This approximation relies on the transformation being close to linear around the values of \(\overline{X}\) that are most likely to be observed. The quadratic transformation is \(y = x^2\), and its tangent at \(x = \mu\) is a straight line with gradient \(g'(\mu) = 2\mu\).

n = 1
With a sample of size \(n = 1\), the likely values of \(\overline{x}\) span a wide range over which the quadratic curve is far from its tangent line, so the delta method gives a very inaccurate approximation to the mean and variance of the estimator.
n = 100
As the sample size increases to 10, then 100, the sample mean will be fairly close to \(\mu\) and, within this narrower range, the quadratic curve is closer to linear. The delta method therefore gives a much better approximation to the mean and variance of the estimator.
n = 1,000
With a sample of size \(n = 1{,}000\), the quadratic transformation is very close to a linear transformation with slope \(g'(\mu) = 2\mu\), at least within the range of values for \(\overline{x}\) that are likely, so the delta method provides quite accurate values for the mean and variance of the estimator.
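When the \(X_i\) are normal, \(\overline{X} \sim \text{Normal}(\mu, \sigma^2/n)\) exactly, and a standard moment calculation gives \(\Var(\overline{X}^2) = 4\mu^2\sigma^2/n + 2\sigma^4/n^2\). The quality of the delta approximation, which keeps only the first term, can then be tabulated directly (a sketch using the illustrative values \(\mu = 2\), \(\sigma = 1\)):

```python
# Exact vs delta-method variance of xbar^2 when the X_i are
# Normal(mu, sigma^2), so xbar ~ Normal(mu, sigma^2/n) exactly:
#   exact:  Var(xbar^2) = 4 mu^2 sigma^2 / n + 2 sigma^4 / n^2
#   delta:  Var(xbar^2) ~ 4 mu^2 sigma^2 / n
mu, sigma = 2.0, 1.0   # illustrative values

for n in (1, 10, 100, 1000):
    tau2 = sigma ** 2 / n              # variance of xbar
    exact = 4 * mu ** 2 * tau2 + 2 * tau2 ** 2
    delta = (2 * mu) ** 2 * tau2
    print(n, exact, delta, round(delta / exact, 4))
```

The ratio of the delta-method variance to the exact variance is about 0.89 at \(n = 1\) but is within 0.2% of 1 by \(n = 100\), matching the descriptions above.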