
Chapter 8   Transformed Variables

8.1   General methods

8.1.1   Differentiating the CDF

Relationship between f(x) and F(x)

A continuous random variable's probability density function, \(f(x)\), and its cumulative distribution function, \(F(x)\) are related by:

\[ F(x) = \int_0^x f(u) \;du \spaced{and} f(x)= F'(x) \]

This can sometimes be used to find the distribution of a random variable that is defined as a function of one or more others.

  1. Find the cumulative distribution function of \(X\), \(F(x) = P(X \le x)\).
  2. Differentiate it to get the probability density function, \(f(x)= F'(x)\).

Question: Square root of an exponential variable

Consider a random variable, \(X\), with an exponential distribution

\[ X \;\;\sim\;\; \ExponDistn(\lambda) \]

What is the distribution of \(Y = \sqrt{X}\)?

(Solved in full version)
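
Although the worked solution is not shown here, the two steps above can be checked numerically. The sketch below is only an illustration in Python: the rate \(\lambda = 2\), the sample size and the code itself are assumptions, not part of the example. It compares the empirical CDF of \(Y = \sqrt{X}\) with the CDF given by step 1, \(F_Y(y) = P(X \le y^2) = F_X(y^2)\).

import numpy as np

# Numerical check only, not the formal solution; lambda = 2 is an assumed value.
rng = np.random.default_rng(1)
lam = 2.0
x = rng.exponential(scale=1/lam, size=100_000)   # X ~ ExponDistn(lambda)
y = np.sqrt(x)                                   # Y = sqrt(X)

def F_X(x):
    return 1 - np.exp(-lam * x)                  # exponential CDF

# Step 1 gives F_Y(y) = P(X <= y^2) = F_X(y^2); compare with the empirical CDF of Y
for q in (0.25, 0.5, 1.0):
    print(q, np.mean(y <= q), F_X(q**2))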

8.1.2   Maximum of a rectangular distribution

From a random sample of \(n\) values from a rectangular distribution,

\[ X \;\;\sim\;\; \RectDistn(0, \;\beta) \]

the maximum likelihood estimate of \(\beta\) is the maximum of the values in the sample,

\[ \hat{\beta} \;\;=\;\; \max(x_1, x_2, \dots, x_n) \]

Distribution of estimator

Writing \(Y = \max(X_1, X_2, \dots, X_n)\) and using the independence of the \(X_i\), its CDF is

\[ \begin{align} F_Y(y) \;\;&=\;\; P(Y \le y) \\[0.4em] &=\;\; P(X_1 \le y \textbf{ and } X_2 \le y \textbf{ and } \dots \textbf{ and } X_n \le y) \\[0.4em] &=\;\; P(X_1 \le y) \times P(X_2 \le y) \times \cdots \times P(X_n \le y) \\[0.2em] &=\;\; \left(\frac y{\beta} \right)^n \qquad\text{for } 0 \le y \le \beta \end{align} \]

Its pdf is therefore

\[ f(y) \;=\; F'(y) \;=\; \frac {n\;y^{n-1}}{\beta^n} \qquad\text{for } 0 \le y \le \beta \]

The pdf of this estimator is shown below, together with the pdf for the method of moments estimator — twice the sample mean.

Note that the method of moments estimator is unbiased, whereas the maximum likelihood estimator is biased — it is always less than \(\beta\). However, the method of moments estimator is far more variable — its standard error is much higher.
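
A short simulation makes the same comparison numerically. The values \(\beta = 10\) and \(n = 20\), and the Python code, are assumptions chosen only for this sketch.

import numpy as np

# Compare the maximum likelihood and method of moments estimators by simulation.
rng = np.random.default_rng(2)
beta, n, reps = 10.0, 20, 50_000

samples = rng.uniform(0, beta, size=(reps, n))   # reps samples of size n
mle = samples.max(axis=1)                        # maximum of each sample
mom = 2 * samples.mean(axis=1)                   # twice each sample mean

print("MLE : mean %.3f, sd %.3f" % (mle.mean(), mle.std()))
print("MoM : mean %.3f, sd %.3f" % (mom.mean(), mom.std()))
# The MLE's mean is a little below beta (bias) but its sd is much smaller.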

Mean, variance, bias and standard error

It can be shown that the mean of \(Y\) is

\[ E[Y] \;=\; \frac {n\;\beta}{n+1} \]

The maximum likelihood estimator is therefore biased,

\[ \Bias(\hat{\beta}) \;=\; E[\hat{\beta}] - \beta \;=\; -\frac {\beta}{n+1} \]

though the bias decreases as \(n \to \infty\).

Since

\[ E[Y^2] \;=\; \frac {n\;\beta^2}{n+2} \]

its variance is

\[ \Var(Y) \;=\; E[Y^2] - \left(E[Y]\right)^2 \;=\; \frac{n\;\beta^2}{(n+1)^2(n+2)} \]

The standard error of the maximum likelihood estimator is therefore

\[ \se(\hat{\beta}) \;=\; \sqrt {\Var(\hat{\beta})} \;=\; \beta \sqrt{\frac{n}{(n+1)^2(n+2)}} \]
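
These formulas can be checked directly from the pdf of \(Y\). The sketch below (with assumed values \(\beta = 10\) and \(n = 20\), in Python) approximates \(E[Y]\) and \(\se(\hat{\beta})\) with a simple Riemann sum and compares them with the expressions above.

import numpy as np

# Riemann-sum check of E[Y] and se(beta-hat); beta and n are assumed values.
beta, n = 10.0, 20
y = np.linspace(0.0, beta, 200_001)
dy = y[1] - y[0]
pdf = n * y**(n - 1) / beta**n                   # f(y) = n y^(n-1) / beta^n

mean = np.sum(y * pdf) * dy
var = np.sum(y**2 * pdf) * dy - mean**2

print(mean, n * beta / (n + 1))                                   # E[Y]
print(np.sqrt(var), beta * np.sqrt(n / ((n + 1)**2 * (n + 2))))   # se(beta-hat)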

8.1.3   Monotonic transformations

If one random variable is a monotonic function of another — steadily increasing or decreasing — the pdfs of the variables are closely related. (The proof of this result is based on finding the cumulative distribution function of the transformed variable.)

Monotonic function of X

If a continuous random variable \(X\) has probability density function \(f_X(x)\) and another variable is defined as \(Y = g(X)\) where the function \(g(\cdot)\) is a monotonic function with inverse \(X = h(Y)\), then the pdf of \(Y\) is

\[ f_Y(y) \;\;=\;\; f_X(h(y)) \times \left| h'(y) \right| \]

(Proved in full version)

The method will be clearer in an example.

Question: Log-normal distribution

If \(X \sim \NormalDistn(\mu,\; \sigma^2)\), show that the probability density function of \(Y = \exp(X)\) is

\[ f_Y(y) \;\;=\;\; \begin{cases} \displaystyle \frac 1{y\sqrt{2\pi}\;\sigma} e^{- \frac{\large (\log(y)-\mu)^2}{\large 2 \sigma^2}} & \text{if } y \ge 0 \\[0.4em] 0 & \text{otherwise} \end{cases} \]

(Solved in full version)

\(Y\) is said to have a log-normal distribution. If a random variable \(Y\) has a log-normal distribution, we can show in a similar way that \(X = \log(Y)\) will have a normal distribution.
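
A quick simulation can be used to check this density. The parameter values \(\mu = 0.5\) and \(\sigma = 0.4\) below are arbitrary assumptions, and the Python comparison is only a sketch.

import numpy as np

# Compare a histogram of simulated Y = exp(X) with the stated log-normal pdf.
rng = np.random.default_rng(3)
mu, sigma = 0.5, 0.4
y = np.exp(rng.normal(mu, sigma, size=200_000))

def f_Y(y):
    return np.exp(-(np.log(y) - mu)**2 / (2 * sigma**2)) / (y * np.sqrt(2 * np.pi) * sigma)

hist, edges = np.histogram(y, bins=200, density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
for i in (20, 100, 180):
    print(mids[i], hist[i], f_Y(mids[i]))        # histogram height vs formula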

The next example shows the relationship between the exponential and Weibull distributions.

Question

If \(X \sim \ExponDistn(\lambda)\), show that \(Y = X^a\) has a \(\WeibullDistn(\diagfrac 1 a, \lambda^a)\) distribution.

(Solved in full version)

8.1.4   Relationship with rectangular distribution

Cumulative distribution function and quantiles

The cumulative distribution function of a continuous random variable \(X\) has been defined as

\[ F(x) \;\;=\;\; P(X \le x) \]

and is a monotonic increasing function rising from 0 to 1. Its inverse is

\[ F^{-1}(y) \;\;=\;\; q_y \qquad \text{where }P(X \le q_y) = y \]

Therefore \(F^{-1}(y)\) returns the \(y\)'th quantile of the distribution. Note also that

\[ F\left(F^{-1}(y)\right) \;\;=\;\; y \]

Applying the CDF as a transformation

Transforming a variable into a rectangular distribution

If a continuous random variable \(X\) has cumulative distribution function \(F(x)\), then the random variable \(Y = F(X)\) has a \(\RectDistn(0, 1)\) distribution.

(Proved in full version)

The converse of this theorem is also useful.

Transforming a rectangular variable into an arbitrary distribution

If \(F(x)\) is a monotonic continuous function of \(x\) rising from 0 to 1, with inverse function \(F^{-1}(\cdot)\), and \(Y \sim \RectDistn(0, 1)\), then the random variable \(X = F^{-1}(Y)\) has a distribution with cumulative distribution function \(F(x)\).

(Proved in full version)

We now illustrate these results with an example.

Example: Exponential distribution

If \(X \sim \ExponDistn(\lambda)\), then it has cumulative distribution function

\[ F(x) \;\;=\;\; 1 - e^{\large -\lambda x}\]

The first result above means that

\[ Y \;\;=\;\; 1 - e^{\large -\lambda X} \;\;\sim\;\; \RectDistn(0,\;1) \]

The inverse function to \(F(x)\) is

\[ F^{-1}(y) \;\;=\;\; -\frac {\log(1 - y)}{\lambda}\]

Therefore if \(Y \sim \RectDistn(0,\; 1)\), then \(X = -\dfrac {\log(1 - Y)}{\lambda}\) has cumulative distribution function \(F(x)\), so it has an \(\ExponDistn(\lambda)\) distribution.
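
As a sketch of this in practice, transforming uniform random values through \(F^{-1}\) produces values that behave like an exponential sample. The rate \(\lambda = 0.5\) and the Python code are assumptions made only for illustration.

import numpy as np

# Inverse-CDF transformation of Rect(0, 1) values; lambda = 0.5 is assumed.
rng = np.random.default_rng(4)
lam = 0.5
u = rng.uniform(0.0, 1.0, size=100_000)           # Y ~ RectDistn(0, 1)
x = -np.log(1 - u) / lam                          # X = F^{-1}(Y)

print(x.mean(), 1 / lam)                          # exponential mean is 1/lambda
print(np.mean(x <= 2.0), 1 - np.exp(-lam * 2.0))  # empirical vs exact CDF at x = 2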

8.1.5   Generating random numbers

Computer simulations consist of realisations of models for real-life scenarios. These models involve distributions, so simulations require values that are randomly generated from the distributions. These random values are usually generated on a computer with some algorithm. Strictly speaking, these should be called pseudo-random values but will simply be called random values here.

The basis of generating random values from a distribution is usually an algorithm that generates a random value from a \(\RectDistn(0, 1)\) distribution. For example, Excel has a function to generate one such value:

=RAND()

The relationship between a \(\RectDistn(0, 1)\) distribution and one with cumulative distribution function \(F(x)\) can be used to generate a random value from an arbitrary distribution.

Random values from an arbitrary distribution

If \(y\) is a random value from a \(\RectDistn(0, 1)\) distribution, then \(F^{-1}(y)\) is a random value from the distribution with cumulative distribution function \(F(x)\).

Excel has built-in functions to evaluate \(F^{-1}(y)\) for several common distributions, including the following ones.

\(\NormalDistn(0, 1)\): \(F(x)\) is =NORM.S.DIST(\(x\), true) and \(F^{-1}(y)\) is =NORM.S.INV(\(y\))
\(\NormalDistn(\mu, \sigma^2)\): \(F(x)\) is =NORM.DIST(\(x\), \(\mu\), \(\sigma\), true) and \(F^{-1}(y)\) is =NORM.INV(\(y\), \(\mu\), \(\sigma\))
\(\GammaDistn(\alpha, \lambda)\): \(F(x)\) is =GAMMA.DIST(\(x\), \(\alpha\), \(\frac 1{\lambda}\), true) and \(F^{-1}(y)\) is =GAMMA.INV(\(y\), \(\alpha\), \(\frac 1{\lambda}\))

For example, a random value from a \(\NormalDistn(\mu = 10, \sigma^2 = 4)\) distribution can be generated in Excel by typing the following into a spreadsheet cell:

=NORM.INV(RAND(), 10, 2)

Generating values from a discrete distribution

Although the methodology above is easiest to explain for continuous random variables, it can also be used to generate random numbers from discrete distributions.

Among discrete distributions, Excel only has a function for the inverse of the binomial distribution's CDF. Typing the following into a spreadsheet cell generates a random value from a binomial distribution:

=BINOM.INV(\(n\), \(\pi\), RAND())

The method can however be applied to other discrete distributions too.

Example: Generating values from a Poisson distribution

The diagram below shows the cumulative distribution function for a \(\PoissonDistn(\lambda = 3)\) distribution — a step function. From a randomly generated \(Y\) with a \(\RectDistn(0, 1)\) distribution, you would read across and down to find a random value from the discrete \(\PoissonDistn(\lambda = 3)\) distribution.
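
The same "read across and down" step can be written as a short algorithm: accumulate the discrete probabilities until they first reach the uniform random value. The sketch below does this for a \(\PoissonDistn(\lambda = 3)\) distribution; the Python implementation is an illustration rather than part of the text.

import numpy as np

# Inverse-CDF generation for a discrete distribution (Poisson, lambda = 3):
# return the smallest k whose cumulative probability reaches u.
rng = np.random.default_rng(5)
lam = 3.0

def poisson_inverse_cdf(u, lam):
    k = 0
    p = np.exp(-lam)          # p(0)
    cum = p
    while cum < u:
        k += 1
        p *= lam / k          # p(k) = p(k-1) * lambda / k
        cum += p
    return k

values = [poisson_inverse_cdf(rng.uniform(), lam) for _ in range(100_000)]
print(np.mean(values), np.var(values))   # both should be close to lambda = 3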

8.2   Linear transformations

8.2.1   Mean and variance

Linear transformations

In this section, we concentrate on a random variable that is defined as a linear transformation of another.

Mean and variance

If \(X\) is a random variable and \(a\) and \(b\) are constants, then the random variable \(Y = a + bX\) has the following mean and variance.

\[ \begin{align} E[Y] &= a + b \times E[X] \\[0.4em] \Var(Y) &= b^2 \times \Var(X) \end{align} \]

(Proved in full version)
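
These results hold whatever the distribution of \(X\). As a quick numerical check (the exponential distribution and the constants \(a = 2\), \(b = -3\) below are arbitrary assumptions, and the code is a Python sketch):

import numpy as np

# Check E[a + bX] = a + b E[X] and Var(a + bX) = b^2 Var(X) by simulation.
rng = np.random.default_rng(6)
x = rng.exponential(scale=1/1.5, size=500_000)   # any distribution will do
a, b = 2.0, -3.0
y = a + b * x

print(y.mean(), a + b * x.mean())
print(y.var(), b**2 * x.var())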

8.2.2   Distribution of transformed variable

Probability density function

Since \(Y = a + bX\) is a monotonic transformation, we can apply the earlier general results to find the pdf of \(Y\). Writing

\[ y = g(x) = a + bx \qquad x = h(y) = \frac{y-a}{b} \spaced{and} h'(y) = \frac 1 b \]

the random variable \(Y\) has pdf

\[ f_Y(y) \;\;=\;\; f_X\left(h(y)\right) \times \left| h'(y) \right| \;\;=\;\; \frac 1 {\left|b\right|} f_X\left(\frac{y-a}{b}\right) \]

We now apply this to linear transformations of a normal random variable.

Linear transformation of a normal variable

If \(a\) and \(b\) are constants and \(X \sim \NormalDistn(\mu, \sigma^2)\), the random variable \(Y = (a + bX)\) also has a normal distribution

\[ Y \;\;\sim\;\; \NormalDistn(a + b\mu,\; b^2 \sigma^2) \]

(Proved in full version)

In particular, this result provides the distribution of a z-score.

Distribution of z-scores

If \(X \sim \NormalDistn(\mu, \sigma^2)\), the random variable \(Z = \dfrac {X-\mu} {\sigma} \) has a normal distribution with zero mean and standard deviation one,

\[ Z \sim \NormalDistn(0, 1) \]

(Proved in full version)

Probabilities about z-scores can be found using computer software or tables.
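
For example, the standard normal CDF can be evaluated from the error function, so no tables are needed. The short sketch below shows this in Python; any statistical software provides an equivalent function.

from math import erf, sqrt

# Standard normal cumulative probabilities via the error function.
def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

# e.g. if X ~ NormalDistn(10, 4), P(X <= 13) = P(Z <= 1.5)
print(phi(1.5))                  # about 0.933
print(phi(1.96) - phi(-1.96))    # about 0.950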

8.2.3   Scale and location parameters

In some families of distributions, a linear transformation results in another distribution within the same family.

Definition

In a family of distributions, \(X \sim \mathcal{Distn}(\theta)\), the parameter \(\theta\) is called a location parameter if \(Y = (X + a) \sim \mathcal{Distn}(\theta + a)\).

If the family of distributions has additional parameters, they should remain unchanged after the transformation.

A location parameter is affected by adding a constant to \(X\); a scale parameter is affected by multiplying \(X\) by a constant.

Definition

In a family of distributions, \(X \sim \mathcal{Distn}(\phi)\), the parameter \(\phi\) is called a scale parameter if \(Y = bX \sim \mathcal{Distn}(b\phi)\).

In families of distributions with a location parameter \(\theta\), \(X \sim \mathcal{Distn}(\theta, \phi)\), the parameter \(\phi\) is also called a scale parameter if \(Y = bX \sim \mathcal{Distn}(b\theta, b\phi)\).

If the family of distributions has additional parameters, they should again remain unchanged after the transformation.

We now apply this to normal distributions.

Normal distribution

If \(X \sim \NormalDistn(\mu, \sigma^2)\), we showed that \(Y = a + bX \sim \NormalDistn\left(a + b\mu, (b\sigma)^2\right)\).

Since \(Y = X + a \sim \NormalDistn\left(a + \mu, {\sigma}^2\right)\), the distribution's first parameter, \(\mu\), is a location parameter.

\(\sigma\) satisfies the second definition for scale parameters since if \(X\) has a normal distribution with parameters \(\mu\) and \(\sigma\), \(bX\) has one with the corresponding parameters \(b\mu\) and \(b\sigma\).

8.2.4   Scale and location examples

Reparameterisation of a family of distributions may be necessary before we can identify location and scale parameters.

Rectangular distribution

Rectangular distributions are usually defined to have

\[ f(x) = \begin{cases} \frac {\large 1} {\large \beta-\alpha} & \quad\text{for } \alpha \lt x \lt \beta \\[0.2em] 0 & \quad\text{otherwise} \end{cases} \]

Neither \(\alpha\) nor \(\beta\) is a location or scale parameter, since

\[ Y = a + bX \sim \RectDistn(a + b\alpha, a + b\beta) \]

However if the distribution is reparameterised with

\[ f(x) = \begin{cases} \frac {\large 1} {\large \phi} & \quad\text{for } \alpha \lt x \lt \alpha + \phi\\[0.2em] 0 & \quad\text{otherwise} \end{cases} \]

then \(Y = a + bX\) (taking \(b \gt 0\)) has a rectangular distribution with pdf

\[ f_Y(y) = \begin{cases} \frac {\large 1} {\large b\phi} & \quad\text{for } (a + b\alpha) \lt y \lt (a + b\alpha) + b\phi\\[0.2em] 0 & \quad\text{otherwise} \end{cases} \]

This is a rectangular distribution with parameters \(\alpha^* = a + b\alpha\) and \(\phi^* = b\phi\) so \(\alpha\) and \(\phi\) are location and scale parameters.

In the following example, reparameterisation is again necessary before a scale parameter can be found.

Question

In a \(\GammaDistn(\alpha, \beta)\) distribution, show that \(\phi = \frac {\large 1}{\large \beta}\) is a scale parameter.

(Solved in full version)

8.3   Delta method

8.3.1   Variance of transformed variable

Distribution of \(Y = g(X)\)

If \(X\) has pdf \(f_X(x)\) and \(g(\cdot)\) is a monotonic function, then the transformed variable \(Y = g(X)\) has pdf

\[ f_Y(y) \;\;=\;\; f_X(h(y)) \times \left| h'(y) \right| \]

where \(x = h(y)\) is the inverse function to \(g(x)\). Its mean and variance are however often difficult to find from this pdf.

Delta method

We now informally present a way to get approximate values for \(E[Y]\) and \(\Var(Y)\).

If \(X\) has mean \(\mu\) and variance \(\sigma^2\), a Taylor series approximation of \(g(X)\) around \(\mu\) is

\[ Y \;\;=\;\; g(X) \;\;\approx\;\; g(\mu) + (X - \mu) g'(\mu) \]

We can find the mean and variance of the linear function of \(X\) on the right, giving the approximation,

\[ E[Y] \;\approx\; g(\mu) \spaced{and} \Var(Y) \;\approx\; \left(g'(\mu)\right)^2 \sigma^2 \]

This is called the delta method.

The approximation relies on the transformation being nearly linear for the values of \(X\) that are likely to be observed. \(\Var(X)\) must therefore be small.
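
As a sketch of how the approximation behaves, consider an assumed case with \(X \sim \NormalDistn(5,\; 0.1^2)\) and \(g(x) = \log(x)\), so \(g'(\mu) = 1/\mu\); these choices and the Python code are illustrations only.

import numpy as np

# Delta method check for Y = log(X) with X ~ Normal(mu = 5, sd = 0.1).
rng = np.random.default_rng(7)
mu, sd = 5.0, 0.1
x = rng.normal(mu, sd, size=500_000)
y = np.log(x)

print(y.mean(), np.log(mu))              # E[Y]   is approximately g(mu)
print(y.var(), (1 / mu)**2 * sd**2)      # Var(Y) is approximately g'(mu)^2 sigma^2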

Application to estimators of parameters

If \(X = \hat{\theta}\) is a consistent estimator of a parameter \(\theta\), then \(\Var(\hat{\theta}) \to 0\) as the sample size, \(n\), increases. The delta method therefore gives an approximate mean and variance for any continuous function of it, \(g(\hat{\theta})\), in large samples.

Quadratic transformation

We now consider a random variable, \(X\), that is assumed to arise from a family of distributions with mean \(\mu\) and variance \(\sigma^2\). For example, its distribution might be as shown below.

The mean of a random sample, \(\overline X\), might be used to estimate \(\mu\), so we now consider \(\overline X ^2 = g(\overline X)\), where \(g(x) = x^2\), as an estimator of \(\mu^2\). The delta method states that in large samples,

\[ E\Big[\overline{X}^2\Big] \approx \mu^2 \spaced{and} \Var\Big(\overline{X}^2\Big) \approx \left(g'(\mu)\right)^2 \frac{\sigma^2}{n} = (2\mu)^2 \frac{\sigma^2}{n} \]

This approximation relies on the transformation being close to linear around the values of \(\overline{X}\) that are most likely to be observed.

When \(n = 10\), the sample mean is likely to be between 1.0 and 3.5, but over this range the quadratic transformation (blue) is far from its linear approximation (red), so the delta method will not give accurate values for \(E\Big[\overline{X}^2\Big]\) or \(\Var\Big(\overline{X}^2\Big)\).

However when \(n = 1,000\), the quadratic curve is almost linear within the range of likely x-values (1.9 to 2.1) so the delta method will work well.
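
A small simulation shows this effect of sample size. The population values \(\mu = 2\) and \(\sigma = 1.3\), the use of a normal approximation for the distribution of \(\overline{X}\), and the Python code are all assumptions made only for this sketch.

import numpy as np

# Delta-method approximations for X-bar squared, checked by simulation.
rng = np.random.default_rng(8)
mu, sigma, reps = 2.0, 1.3, 200_000

for n in (10, 1000):
    xbar = rng.normal(mu, sigma / np.sqrt(n), size=reps)  # approximate distn of X-bar
    est = xbar**2
    print(n, est.mean(), mu**2)                            # mean vs delta approximation
    print(n, est.var(), (2 * mu)**2 * sigma**2 / n)        # variance vs delta approximation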

8.3.2   Examples

We now give two applications of the delta method.

Question: Estimator of a geometric distribution's parameter, π

If \(X \sim \GeomDistn(\pi)\), with probability function

\[ p(x) = \pi (1-\pi)^{x-1} \quad \quad \text{for } x = 1, 2, \dots \]

the method of moments estimator of \(\pi\) and its maximum likelihood estimator are both the inverse of the sample mean,

\[ \hat{\pi} \;\;=\;\; \dfrac 1{\overline{X}} \]

Use the delta method to find the approximate mean and variance of this estimator.

(Solved in full version)

In this example, the delta method gives the same approximate standard error as would be found using the second derivative of the log-likelihood, but approximations from the two methods are not always equal.

Odds

Uncertainty is often described by probability, but the chance of an event happening can alternatively be described by its odds.

Definition

The odds for an event are the ratio of the probability of the event happening to the probability of it not happening,

\[ \operatorname{odds}(E) \;\;=\;\; \frac{P(E)}{1 - P(E)} \]

Note that whereas probabilities must be between 0 and 1, the odds of an event can be greater than 1.

Question: Odds of success

In a series of \(n\) independent success/failure trials that each have odds \(\theta\) of success, \(x\) successes are observed. What is the maximum likelihood estimator of \(\theta\)? If \(n\) is large, what is its approximate standard error?

(Solved in full version)