Distribution of \(Y = g(X)\)
We showed earlier that if a random variable \(X\) has pdf \(f_X(x)\) and \(g(\cdot)\) is a monotonic function, then the transformed variable \(Y = g(X)\) has pdf
\[ f_Y(y) \;\;=\;\; f_X(h(y)) \times \left| h'(y) \right| \]where \(x = h(y)\) is the inverse function to \(g(x)\).
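As an illustrative check of this formula (the distribution and transformation here are our own choices, not from the text), the sketch below takes \(X\) to be standard normal and \(g(x) = e^x\), so \(h(y) = \log y\) and \(h'(y) = 1/y\), and compares a probability computed from the resulting pdf with a simulated frequency.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_X(x):
    # standard normal pdf (our illustrative choice of distribution)
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def f_Y(y):
    # change-of-variable formula: f_Y(y) = f_X(h(y)) * |h'(y)|,
    # with h(y) = log(y) and h'(y) = 1/y (y > 0 here)
    return f_X(np.log(y)) / y

# P(0.5 < Y < 2) from the pdf (simple midpoint rule) ...
grid = np.linspace(0.5, 2.0, 2001)
mid = (grid[:-1] + grid[1:]) / 2
prob_formula = np.sum(f_Y(mid)) * (grid[1] - grid[0])

# ... versus the simulated frequency of Y = exp(X)
y_sim = np.exp(rng.normal(size=1_000_000))
prob_sim = np.mean((y_sim > 0.5) & (y_sim < 2.0))
```

The two probabilities agree closely, as the change-of-variable formula guarantees.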
Unfortunately, it is often difficult to find the mean and variance of \(Y\) from the pdf that is obtained in this way. We now present approximate formulae for these quantities.
Delta method
We will only present the method informally. (A formal statement of the result and its proof are too complex for an introductory course.)
If \(X\) is a random variable with mean \(\mu\) and variance \(\sigma^2\), and its variance is small enough that the continuous function \(g(x)\) is fairly linear within the range of 'likely' values of \(X\), we can write a first-order Taylor series approximation of \(g(X)\) around \(\mu\),
\[ Y \;\;=\;\; g(X) \;\;\approx\;\; g(\mu) + (X - \mu) g'(\mu) \]The approximation on the right is a linear function of \(X\), so we can use the results from the previous section to find its mean and variance,
\[ E[Y] \;\approx\; g(\mu) \spaced{and} \Var(Y) \;\approx\; \left(g'(\mu)\right)^2 \sigma^2 \]This is called the delta method.
Note: The approximation only works if the transformation is close to linear for the values of \(X\) that are likely to be observed.
This means that the variance of \(X\) should be small enough to make the transformation \(g(X)\) approximately linear.
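The accuracy of the delta method can be checked by simulation. The sketch below (with an arbitrary choice of \(g(x) = \sqrt{x}\), \(\mu = 4\) and a small \(\sigma\), none of which come from the text) compares the delta-method mean and variance with their simulated values.

```python
import numpy as np

rng = np.random.default_rng(1)

mu, sigma = 4.0, 0.2     # small sigma, so g is near-linear over likely X values
x = rng.normal(mu, sigma, size=1_000_000)
y = np.sqrt(x)           # g(x) = sqrt(x), so g'(x) = 1 / (2 sqrt(x))

mean_delta = np.sqrt(mu)                        # E[Y] ~ g(mu)
var_delta = (0.5 / np.sqrt(mu))**2 * sigma**2   # Var(Y) ~ g'(mu)^2 sigma^2
```

With this small \(\sigma\), `y.mean()` and `y.var()` land very close to `mean_delta` and `var_delta`; increasing \(\sigma\) makes \(g\) noticeably curved over the likely values of \(X\) and the approximation deteriorates.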
Application to estimators of parameters
The delta method applies here with \(X = \hat{\theta}\). If \(\hat{\theta}\) is a consistent estimator of a parameter, \(\theta\), then \(\Var(\hat{\theta}) \to 0\) as the sample size, \(n\), increases. The delta method can therefore be used to find the approximate variance of any continuous function of the estimator, \(g(\hat{\theta})\), in large samples.
The delta method also shows that \(g(\hat{\theta})\) is approximately unbiased for \(g(\theta)\) in large samples.
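A small simulation illustrates this for a concrete case of our own choosing (not from the text): \(\hat{\theta} = \overline{X}\) from an exponential distribution with mean \(\mu = 2\) (so \(\sigma^2 = \mu^2\)), transformed by \(g(\theta) = \log\theta\). The delta method gives \(\Var(\log\overline{X}) \approx (1/\mu)^2 \mu^2/n = 1/n\).

```python
import numpy as np

rng = np.random.default_rng(2)

mu, n, reps = 2.0, 400, 20_000   # Exponential with mean mu, so sigma^2 = mu^2

# theta-hat is the sample mean; we transform it with g(theta) = log(theta)
theta_hat = rng.exponential(mu, size=(reps, n)).mean(axis=1)
g_hat = np.log(theta_hat)

# Delta method: Var(g(theta-hat)) ~ g'(mu)^2 * (mu^2 / n) = 1/n
var_delta = 1.0 / n
```

Across the repetitions, `g_hat` has mean close to \(\log\mu\) (approximate unbiasedness) and variance close to `var_delta`, as the method predicts for large \(n\).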
Quadratic transformation
The diagram below initially shows the probability density function of a random variable \(X\) whose mean, \(\mu\), will be estimated using the mean of a random sample of \(n\) values, \(\overline{x}\). We have set \(\mu = 2\) for the purpose of this illustration, but it would be an unknown parameter in practice.
We now consider the use of \(\overline{X}^2\) as an estimator of \(\mu^2\) — that is, the transformation \(g(x) = x^2\) applied to \(\overline{X}\), which has mean \(\mu\) and variance \(\sigma^2/n\). If the distribution of \(X\) has variance \(\sigma^2\), then the delta method states that in large samples,
\[ E\Big[\overline{X}^2\Big] \approx \mu^2 \spaced{and} \Var\Big(\overline{X}^2\Big) \approx \left(g'(\mu)\right)^2 \frac{\sigma^2}{n} = (2\mu)^2 \frac{\sigma^2}{n} \]This approximation relies on the transformation being close to linear around the values of \(\overline{X}\) that are most likely to be observed. In the diagram above, the blue curve shows the quadratic transformation \(y = x^2\) and the straight red line is its tangent, with gradient \(g'(x) = 2x\).