Chapter 5   Continuous Distributions

5.1   Finding probabilities

5.1.1   Probabilities by integration

Probabilities as areas

The distribution of a continuous random variable is defined by a type of histogram called a probability density function (pdf), whose properties are described below.

Based on the properties of histograms, the probability of a value between any two constants is the area under the pdf above this range of values.

Probabilities by integration

Since probability density functions can usually be expressed as simple mathematical functions, these areas can be found as integrals,

\[ P(a \lt X \lt b) \;\; = \; \; \int_a^b {f(x)}\; dx \]

Properties of a probability density function

A function \(f(x)\) can be the probability density function of a continuous random variable if and only if

\[ f(x) \;\; \ge \; \; 0 \quad\quad \text{for all } x \text{, and} \] \[ \int_{-\infty}^{\infty} {f(x)}\; dx \;\; = \; \; 1 \]

(Proved in full version)
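
As a quick check of these two properties, consider the density \(f(x) = 4\,e^{-4x}\) for \(x \ge 0\) (and zero otherwise), which appears later in this section. It is clearly non-negative, and

\[ \int_{-\infty}^{\infty} {f(x)}\; dx \;\;=\;\; \int_0^{\infty} {4\;e^{-4x}}\; dx \;\;=\;\; \Big[-e^{-4x}\Big]_0^{\infty} \;\;=\;\; 1 \]

so it is a valid probability density function.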

5.1.2   Rectangular distribution

The simplest kind of continuous distribution is a rectangular distribution (also called a continuous uniform distribution).

Definition

A random variable, \(X\), is said to have a rectangular distribution with parameters \(a\) and \(b\)

\[ X \;\; \sim \; \; \RectDistn(a, b) \]

if its probability density function is

\[ f(x) = \begin{cases} \frac {\large 1} {\large b-a} & \text{for } a \lt x \lt b \\[0.2em] 0 & \text{otherwise} \end{cases} \]

Probabilities for rectangular random variables can be easily found using geometry.

Equivalently, using integration,

\[ \begin{align} P(c \lt X \lt d) \;\; &= \; \; \int_c^d {f(x)}\; dx \\ &= \; \; \int_c^d {\frac 1 {b-a}}\; dx \\ &=\;\; \frac {d-c} {b-a} \end{align} \]

Example

If \(X \;\; \sim \; \; \RectDistn(0, 10)\),

\[ P(4 \lt X \lt 7) \;\;=\;\; \frac {7-4} {10-0} \;\;=\;\; 0.3 \]
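
The same probability can be checked numerically. A minimal sketch, assuming Python with scipy is available (tools that are not part of this text):

```python
from scipy import stats
from scipy.integrate import quad

# X ~ Rectangular(0, 10); scipy's "uniform" takes loc = a and scale = b - a
X = stats.uniform(loc=0, scale=10)

# P(4 < X < 7) from the cumulative distribution function ...
p_cdf = X.cdf(7) - X.cdf(4)

# ... and by integrating the density directly
p_int, _ = quad(X.pdf, 4, 7)

print(p_cdf, p_int)   # both are 0.3
```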

5.1.3   Other examples

In the next two examples, integration is used to find probabilities.

Question

If a continuous random variable, \(X\), has probability density function

\[ f(x) = \begin{cases} 1 - \dfrac x 2 & \quad \text{for } 0 \lt x \lt 2 \\[0.2em] 0 & \quad \text{otherwise} \end{cases} \]

what is the probability of getting a value less than 1?

(Solved in full version)

The next example involves a distribution called an exponential distribution; practical applications of this distribution will be described in the next chapter.

Question

If a continuous random variable, \(X\), has probability density function

\[ f(x) = \begin{cases} 4\;e^{-4x} & \quad \text{for } x \ge 0\\[0.2em] 0 & \quad \text{otherwise} \end{cases} \]

what is the probability of getting a value less than 1?

(Solved in full version)

5.1.4   Cumulative distribution function

The cumulative distribution function has the same definition for a continuous random variable as for a discrete one.

Definition

The cumulative distribution function (CDF) for a continuous random variable \(X\) is the function

\[F(x) \;=\; P(X \le x)\]

This probability can be expressed as an integral,

\[F(x) \;\; = \; \; \int_{-\infty}^x f(t)\;dt\]

Note that this also implies that

\[f(x) \;\; = \; \; \frac {d}{dx} F(x)\]

Every cumulative distribution function rises monotonically from zero to one. However, whereas a discrete distribution's CDF is a step function, that of a continuous distribution is a smooth function.
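
For example, for the distribution with probability density function \(f(x) = 1 - x/2\) on \(0 \lt x \lt 2\) (used in Section 5.1.3), the cumulative distribution function is

\[ F(x) \;\;=\;\; \int_0^x {\left(1 - \frac t 2\right)}\; dt \;\;=\;\; x - \frac {x^2} 4 \quad\quad \text{for } 0 \lt x \lt 2 \]

with \(F(x) = 0\) for \(x \le 0\) and \(F(x) = 1\) for \(x \ge 2\); differentiating \(F(x)\) recovers the pdf, as expected.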

Question: Rectangular distribution

Sketch the cumulative distribution function of a random variable with a rectangular distribution, \(X \sim \RectDistn(1, 5)\).

Question: Exponential distribution

If \(X\) has probability density function

\[ f(x) = \begin{cases} 4\;e^{-4x} & \quad \text{for } x \ge 0\\[0.2em] 0 & \quad \text{otherwise} \end{cases} \]

what is its cumulative distribution function?

(Both solved in full version)

5.1.5   Quantiles

A cumulative probability, \(P(X \le x)\), can be found by integration. It is sometimes useful to work in the opposite direction — given a cumulative probability, what is the corresponding value of \(x\)?

Definition

The \(p\)'th quantile of a continuous distribution is the value, \(x\), such that

\[ P(X \le x) \;\; = \; \; p \]

When \(p\) is expressed as a percentage, the value is called the \(100p\)'th percentile.
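
In practice, quantiles of standard distributions are usually found with software rather than by hand. A brief sketch using scipy's percent point function, ppf (an assumed tool, not part of this text):

```python
from scipy import stats

# 0.25 quantile (25th percentile) of the Rectangular(0, 10) distribution
print(stats.uniform(loc=0, scale=10).ppf(0.25))   # 2.5

# 0.975 quantile of the standard normal distribution
print(stats.norm.ppf(0.975))                      # about 1.96
```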

Definition

The median of a continuous distribution is its 0.5 quantile, and its lower and upper quartiles are its 0.25 and 0.75 quantiles.

These three values split the probability density function into four equal areas.

Question

What are the median and quartiles of the \(\RectDistn(1, 5)\) distribution?

(Solved in full version)

The next example is a little harder.

Question

Find a formula for the \(p\)'th quantile of the exponential distribution with probability density function

\[ f(x) = \begin{cases} 4\;e^{-4x} & \text{for } x \ge 0\\[0.2em] 0 & \text{otherwise} \end{cases} \]

(Solved in full version)

5.2   Mean and variance

5.2.1   Expected values

For an infinitesimally small interval of width \(\delta x\),

\[ P(x \lt X \lt x+\delta x) \;\approx\; f(x) \times \delta x\]

If the whole range of possible x-values is split into such slices, the definition of the expected value of a discrete random variable would give

\[ E[X] \;\approx\; \sum {x \times f(x) \; \delta x}\]

In the limit, this summation becomes an integral, giving us the following definition.

Definition

The expected value of a continuous random variable with probability density function \(f(x)\) is

\[ E[X] \;=\; \int_{-\infty}^{\infty} {x \times f(x) \; d x}\]
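
For example, for the distribution with density \(f(x) = 1 - x/2\) on \(0 \lt x \lt 2\) (from Section 5.1.3),

\[ E[X] \;\;=\;\; \int_0^2 {x \left(1 - \frac x 2\right)}\; dx \;\;=\;\; \left[\frac {x^2} 2 - \frac {x^3} 6\right]_0^2 \;\;=\;\; \frac 2 3 \]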

This can be generalised:

Definition

If \(X\) is a continuous random variable with probability density function \(f(x)\), the expected value of any function \(g(X)\) is

\[ E\big[g(X)\big] \;=\; \int_{-\infty}^{\infty} {g(x) \times f(x) \; d x}\]

5.2.2   Mean and variance

We define the mean and variance of a continuous distribution in a similar way to those of a discrete distribution.

Definition

The mean of a continuous random variable is

\[ E[X] \;=\; \mu \]

and its variance is

\[ \Var(X) \;=\; \sigma^2 \;=\; E \left[(X - \mu)^2 \right] \]

Their interpretations are also similar.

The following result is often useful for evaluating a continuous distribution's variance.

Alternative formula for the variance

A continuous random variable's variance can be written as

\[ \Var (X) \;=\; E \left[(X - \mu)^2 \right] \;=\; E[X^2] - \left( E[X] \right)^2 \]
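
Continuing the example with density \(f(x) = 1 - x/2\) on \(0 \lt x \lt 2\), for which \(E[X] = \frac 2 3\),

\[ E[X^2] \;\;=\;\; \int_0^2 {x^2 \left(1 - \frac x 2\right)}\; dx \;\;=\;\; \left[\frac {x^3} 3 - \frac {x^4} 8\right]_0^2 \;\;=\;\; \frac 2 3 \]

\[ \Var(X) \;\;=\;\; E[X^2] - \left(E[X]\right)^2 \;\;=\;\; \frac 2 3 - \frac 4 9 \;\;=\;\; \frac 2 9 \]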

5.2.3   Example

In the next example, you should find the mean and variance of the distribution by integration.

Question

What are the mean and variance of the \(\RectDistn(a, b)\) distribution?

(Solved in full version)

5.3   Random samples

5.3.1   Independence and random samples

The same definition of independence holds for both discrete and continuous random variables.

Definition

Two random variables, \(X\) and \(Y\), are independent if all events about the value of \(X\) are independent of all events about the value of \(Y\).

Independence of continuous random variables is usually deduced from the way that the variables are measured rather than from mathematical calculations. For example, measurements made on different individuals who are selected at random from a population can usually be assumed to be independent.

Characterisation of independence

For independent continuous random variables, \(X\) and \(Y\),

\[ \begin{align} P(x \lt X \lt x+\delta x &\textbf{ and } y \lt Y \lt y+\delta y) \\ &=\;\; P(x \lt X \lt x+\delta x) \times P(y \lt Y \lt y+\delta y) \\ &\approx\;\; f_X(x)\;f_Y(y) \times \delta x \; \delta y \end{align} \]

so

\[ P(X \approx x \textbf{ and } Y \approx y) \;\; \propto \;\; f_X(x)\;f_Y(y) \]

This is closely related to the corresponding result for two independent discrete random variables,

\[ P(X=x \textbf{ and } Y=y) \;\;=\;\; p_X(x) \times p_Y(y) \]

Random samples

A collection of \(n\) independent random variables, each with the same distribution, is called a random sample from that distribution.

Extending our earlier characterisation of independence of two continuous random variables,

\[ P(X_1 \approx x_1, X_2 \approx x_2, ..., X_n \approx x_n) \;\; \propto \;\; \prod_{i=1}^n f(x_i) \]

This is again closely related to the corresponding formula for a random sample from a discrete distribution

\[ P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n) \;\; = \;\; \prod_{i=1}^n p(x_i) \]

5.3.2   Distribution of sample sum and mean

The results that we showed earlier about sums and means of discrete random variables also hold for variables with continuous distributions. We simply repeat them here.

Linear combination of independent variables

If the means of two independent random variables, \(X\) and \(Y\), are \(\mu_X\) and \(\mu_Y\) and their variances are \(\sigma_X^2\) and \(\sigma_Y^2\), then the linear combination \((aX + bY)\) has mean and variance

\[ \begin {align} E[aX + bY] & = a\mu_X + b\mu_Y \\[0.4em] \Var(aX + bY) & = a^2\sigma_X^2 + b^2\sigma_Y^2 \end {align} \]

Sum of a random sample

If \(\{X_1, X_2, ..., X_n\}\) is a random sample of n values from any distribution with mean \(\mu\) and variance \(\sigma^2\), then the sum of the values has mean and variance

\[\begin{aligned} E\left[\sum_{i=1}^n {X_i}\right] & \;=\; n\mu \\ \Var\left(\sum_{i=1}^n {X_i}\right) & \;=\; n\sigma^2 \end{aligned} \]

Sample mean

If \(\{X_1, X_2, ..., X_n\}\) is a random sample of n values from any distribution with mean \(\mu\) and variance \(\sigma^2\), then the sample mean has a distribution with mean and variance

\[\begin{aligned} E\big[\overline{X}\big] & \;=\; \mu \\ \Var\big(\overline{X}\big) & \;=\; \frac {\sigma^2} n \end{aligned} \]

Central Limit Theorem (informal)

If \(\{X_1, X_2, ..., X_n\}\) is a random sample of n values from any distribution with mean \(\mu\) and variance \(\sigma^2\),

\[\begin{aligned} \sum_{i=1}^n {X_i} & \;\; \xrightarrow[n \rightarrow \infty]{} \;\; \NormalDistn(n\mu, \;\;\sigma_{\Sigma X}^2=n\sigma^2) \\ \overline{X} & \;\; \xrightarrow[n \rightarrow \infty]{} \; \; \NormalDistn(\mu, \;\;\sigma_{\overline X}^2 = \frac {\sigma^2} n) \end{aligned} \]
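
These results can be checked by simulation. A minimal sketch, assuming numpy is available (not part of this text), drawing many samples of n = 30 values from the \(\RectDistn(0, 10)\) distribution, whose mean is 5 and whose variance is \(100/12\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 30, 100_000

# sample means of `reps` random samples of size n from Rect(0, 10)
xbar = rng.uniform(0, 10, size=(reps, n)).mean(axis=1)

print(xbar.mean())   # close to mu = 5
print(xbar.var())    # close to sigma^2 / n = (100/12) / 30
# a histogram of xbar would also look roughly normal (Central Limit Theorem)
```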

5.4   Estimating parameters

5.4.1   Bias and standard error

Many continuous distributions have one or more parameters whose values are unknown. An unknown parameter, \(\theta\), is often estimated from a random sample of \(n\) values from the distribution,

\[ \hat{\theta} \;\; =\;\; \hat{\theta}(X_1, X_2, \dots, X_n) \]

As when estimating parameters of discrete distributions, the concepts of bias and standard error are important ways to differentiate a good estimator from a bad one. The definitions of these quantities are the same for both discrete and continuous distributions; we repeat them here.

Bias

The bias of an estimator \(\hat{\theta}\) of a parameter \(\theta\) is

\[ \Bias(\hat{\theta}) \;=\; E\big[\hat{\theta}\big] - \theta \]

If its bias is zero, \(\hat{\theta}\) is called an unbiased estimator of \(\theta\).

Standard error

The standard error of an estimator \(\hat{\theta}\) is its standard deviation.

Bias and standard error can again be combined into a single value.

Mean squared error

The mean squared error of an estimator \(\hat{\theta}\) of a parameter \(\theta\) is

\[ \MSE(\hat{\theta})\; =\; E\left[ (\hat{\theta} - \theta)^2 \right] \;=\; \Var(\hat{\theta}) + \Bias(\hat{\theta})^2 \]
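
This identity follows by writing \(\hat{\theta} - \theta = (\hat{\theta} - E[\hat{\theta}]) + (E[\hat{\theta}] - \theta)\) and expanding the square; the cross-product term has zero expectation, so

\[ E\left[ (\hat{\theta} - \theta)^2 \right] \;\;=\;\; E\left[ \left(\hat{\theta} - E[\hat{\theta}]\right)^2 \right] + \left( E[\hat{\theta}] - \theta \right)^2 \;\;=\;\; \Var(\hat{\theta}) + \Bias(\hat{\theta})^2 \]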

A further characteristic of estimators also applies to continuous distributions.

Consistency

An estimator \(\hat{\theta}(X_1, X_2, \dots, X_n)\) is a consistent estimator of \(\theta\) if

\[ \begin{align} \Var(\hat{\theta}) \;\; &\xrightarrow[n \rightarrow \infty]{} \;\; 0 \\[0.5em] \Bias(\hat{\theta}) \;\; &\xrightarrow[n \rightarrow \infty]{} \;\; 0 \end{align} \]

5.4.2   Method of moments

A simple way to obtain an estimate of a single unknown parameter from a random sample is the method of moments. For both discrete and continuous distributions, it is the parameter value that makes the distribution's mean equal to that of the random sample and is therefore the solution to the equation

\[ E[X] \;\; = \; \; \overline{X} \]

German tank problem

Consider a rectangular distribution,

\[ X \;\; \sim \; \; \RectDistn(0, \beta) \]

where the upper limit, \(\beta\), is an unknown parameter. The distribution's mean and variance are

\[ E[X] \;\; = \; \; \frac {\beta} 2 \spaced{and} \Var(X) = \frac {\beta^2} {12} \]

so the method of moments estimator is

\[ \hat{\beta} \;\;=\;\; 2\overline{X}\]

It is unbiased and has standard error

\[ \se(\hat{\beta}) \;\;=\;\; \sqrt{\Var(2\overline{X})} \;\;=\;\; \sqrt{ \frac {4\Var(X)} n } \;\;=\;\; \frac {\beta} {\sqrt{3n}} \]

Despite being unbiased, this estimator has one major problem. From the random sample {12, 17, 42, 97}, the resulting estimate of \(\beta\) would be

\[ \hat{\beta} \;\;=\;\; 2\overline{X} \;\;=\;\; 84\]

yet the maximum of the distribution cannot be 84 since we have already observed one value greater than this.

The method of moments usually gives reasonable parameter estimates, but can sometimes result in estimates that are not feasible.

5.4.3   Maximum likelihood

We defined the likelihood function of a discrete data set to be the probability of obtaining these data values, treated as a function of the unknown parameter, \(\theta\).

\[ L(\theta) \;=\; P(data \;| \; \theta) \]

If \(\{x_1, x_2, \dots, x_n\}\) is a random sample from a discrete distribution with probability function \(p(x \mid \theta)\), this is

\[ L(\theta) \;=\; P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n \;| \; \theta) \;\;=\;\; \prod_{i=1}^n {p(x_i \;| \; \theta)} \]

For a random sample from a continuous distribution with probability density function \(f(x\;|\; \theta)\),

\[ P(X_1 \approx x_1, X_2 \approx x_2, ..., X_n \approx x_n) \;\; \propto \;\; \prod_{i=1}^n f(x_i \;|\; \theta) \]

so the product of the probability density functions plays the same role for continuous random variables as the product of probability functions for discrete ones.

Definition

If random variables \(\{X_1, X_2, \dots, X_n\}\) are a random sample from a continuous distribution with probability density function \(f(x \;|\; \theta)\), then the function

\[ L(\theta) = \prod_{i=1}^n {f(x_i \;| \; \theta)} \]

is called the likelihood function of \(\theta\).

Maximum likelihood estimate

The maximum likelihood estimate of \(\theta\) is again the value for which the observed data are most likely — the value that maximises \(L(\theta)\).

This is usually (but not always) a turning point of the likelihood function and can be found as the solution of the equation

\[ L'(\theta) \;\; =\;\; 0 \]

As with discrete distributions, it is usually easier to solve the equivalent equation involving the logarithm of the likelihood function

\[ \ell'(\theta) \;\; =\;\; \frac d {d \theta} \log\big(L(\theta)\big) \;\; =\;\; 0 \]

5.4.4   Properties of maximum likelihood estimators

Maximum likelihood estimators have the same properties when used with continuous and discrete distributions. We repeat these properties, again in a slightly abbreviated form that is not mathematically rigorous.

Bias

The maximum likelihood estimator, \(\hat {\theta} \), of a parameter, \(\theta\), that is based on a random sample of size \(n\) is asymptotically unbiased,

\[ E[\hat {\theta}] \;\; \xrightarrow[n \rightarrow \infty]{} \;\; \theta \]

Asymptotic normality

The maximum likelihood estimator, \(\hat {\theta} \), of a parameter, \(\theta\), that is based on a random sample of size \(n\) asymptotically has a normal distribution,

\[ \hat {\theta} \;\; \xrightarrow[n \rightarrow \infty]{} \;\; \text{a normal distribution} \]

Approximate standard error

If \(\hat {\theta} \) is the maximum likelihood estimator of a parameter \(\theta\) based on a large random sample, its standard error can be approximated by:

\[ \se(\hat {\theta}) \;\;\approx\;\; \sqrt {- \frac 1 {\ell''(\hat {\theta})}} \]

From these, we can find the approximate bias (zero) and standard error of most maximum likelihood estimators based on large random samples.

5.4.5   Confidence intervals

If the estimator of any parameter, \(\theta\), is approximately unbiased and normally distributed, and if we can evaluate an approximate standard error, then the interval

\[ \hat{\theta}-1.96 \times \se(\hat {\theta}) \quad \text{ to } \quad \hat{\theta}+1.96 \times \se(\hat {\theta}) \]

has probability approximately 0.95 of including the true value of \(\theta\). The resulting interval is a 95% confidence interval for \(\theta\), and we have 95% confidence that it will include the actual value of \(\theta\).

This holds for random samples from both discrete and continuous distributions. In particular, it can be used for maximum likelihood estimators, due to their asymptotic properties.

Other confidence levels

Intervals with different confidence levels can be obtained by replacing "1.96" by other quantiles of the standard normal distribution. For example, a 90% confidence interval for \(\theta\) is

\[ \hat{\theta}-1.645 \times \se(\hat {\theta}) \quad \text{ to } \quad \hat{\theta}+1.645 \times \se(\hat {\theta}) \]
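
The multiplier for any confidence level is a quantile of the standard normal distribution. A short sketch using scipy (an assumed tool, not part of this text):

```python
from scipy.stats import norm

def z_multiplier(level):
    """Standard normal quantile used for a two-sided confidence interval."""
    return norm.ppf(0.5 + level / 2)

print(z_multiplier(0.95))   # about 1.96
print(z_multiplier(0.90))   # about 1.645
print(z_multiplier(0.99))   # about 2.576
```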

5.4.6   Example: normal distribution mean

This section applies maximum likelihood estimation to the mean of a normal distribution.

Normal distribution with known σ

Consider a random sample,

4.2   5.2   5.6   6.1   7.3   8.5

from a normal distribution with known \(\sigma\),

\[ X \;\; \sim \; \; \NormalDistn(\mu, \;\sigma = 1.3) \]

Its log-likelihood is

\[ \ell(\mu) \;\;=\;\; \sum_{i=1}^n {\log(f(x_i \;|\; \mu))} \;\;=\;\; -\frac 1 {2 \times 1.3^2} \times \sum_{i=1}^n {(x_i-\mu)^2} + K \]

where \(K\) is a constant that does not depend on \(\mu\). To find the maximum likelihood estimate of \(\mu\), we solve

\[ \ell'(\mu) \;\;=\;\; \frac 1 {1.3^2} \times \sum_{i=1}^n {(x_i-\mu)} \;\;=\;\; 0 \]

giving \(\displaystyle \hat{\mu} \;\;=\;\; \frac {\sum {x_i}} n \;\;=\;\; \overline{x}\), the sample mean.
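
The same estimate can be found numerically by minimising the negative log-likelihood. A sketch, assuming Python with numpy and scipy (tools that are not part of this text):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

data = np.array([4.2, 5.2, 5.6, 6.1, 7.3, 8.5])
sigma = 1.3

def neg_log_lik(mu):
    # negative log-likelihood of mu for the normal model with known sigma
    return -norm.logpdf(data, loc=mu, scale=sigma).sum()

result = minimize_scalar(neg_log_lik)
print(result.x, data.mean())   # both are about 6.15
```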


We now illustrate the method graphically. The likelihood function, \(L(\mu)\), is the product of the values of the normal distribution's pdf at the data points — the product of the bar heights at the bottom of the next diagram.

The likelihood for the normal distribution with \(\mu = 4\) is low because the pdf is very small at the highest data values, \(f(7.3)\) and \(f(8.5)\). On the other hand, when \(\mu = \overline{x} = 6.15\), none of the pdf values is particularly small and the likelihood function is maximised.

Standard error

We can directly find the standard error of the MLE using the properties of sample means,

\[ \se(\hat{\mu}) \;\;=\;\; \sqrt{\Var(\overline{X})} \;\;=\;\; \frac {\sigma} {\sqrt{n}} \;\;=\;\; \frac {1.3} {\sqrt n }\]

Finding the standard error from the second derivative of \(\ell(\mu)\) gives

\[ \se(\hat {\mu}) \;\;\approx\;\; \sqrt {- \frac 1 {\ell''(\hat {\mu})}} \;\;=\;\; \sqrt {\frac {1.3^2} n}\]

For this example, the asymptotic formula gives the exact standard error of the maximum likelihood estimator.

Confidence interval

For this example, the maximum likelihood estimator is the sample mean. Since sample means from normal distributions have exactly normal distributions,

\[ \overline{X} \;\; \sim \; \; \NormalDistn(\mu,\;\; \sigma_{\overline{X}} = \frac {1.3} {\sqrt n}) \]

the interval estimate

\[ \overline{x} \;\; \pm \; \; 1.96 \times \frac {1.3} {\sqrt n} \]

has exactly 95% confidence level. (The confidence level is only approximate for MLEs based on other distributions.)

5.4.7   Example: Rectangular maximum

Maximum likelihood estimates can usually be found as turning points of the likelihood function (or equivalently the log-likelihood function) — i.e. by solving \(\ell'(\theta) = 0\). However this method does not work in a few examples.

Rectangular distribution

The following six values,

0.12   0.32   0.36   0.51   0.63   0.69

are a random sample from a rectangular distribution,

\[ X \;\; \sim \; \; \RectDistn(0, \;\beta) \]

The likelihood function for this distribution is

\[ L(\beta) \;\;=\;\; \prod_{i=1}^6 {f(x_i \;|\; \beta)} \;\;=\;\; \begin{cases} \left(\dfrac 1 {\beta}\right)^6 &\text{for } \beta \ge \max(x_1, \dots, x_6) \\[0.4em] 0 &\text{otherwise} \end{cases} \]

This is illustrated below for a few values of \(\beta\). The red lines give the values of \(f(x \;|\; \beta)\) at the data points; their product gives the likelihood.

When \(\beta\) is less than the maximum data value, 0.690, the pdf at this value is zero, so the likelihood is zero. As \(\beta\) increases above 0.690, the pdfs for all data values decrease, and so does the likelihood. The likelihood function is shown below.

The maximum likelihood estimate is at a discontinuity in the likelihood function, not at a turning point, so the MLE cannot be found by solving \(\ell'(\beta) = 0\).
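
This behaviour can also be seen by evaluating the likelihood on a grid of \(\beta\) values. A small sketch, assuming numpy is available (not part of this text):

```python
import numpy as np

data = np.array([0.12, 0.32, 0.36, 0.51, 0.63, 0.69])

def likelihood(beta):
    # L(beta) = (1/beta)^n when beta >= max(data), and 0 otherwise
    return np.where(beta >= data.max(), beta ** -len(data), 0.0)

betas = np.linspace(0.5, 1.5, 1001)
L = likelihood(betas)
print(betas[np.argmax(L)])   # about 0.69, the maximum data value
```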

Bias and standard error

The second derivative of the log-likelihood function is undefined at the MLE, so it cannot be used to obtain an approximate standard error. However, formulae for the estimator's mean and standard deviation can be found from first principles — we will derive them later.

\[ E\left[\hat{\beta}\right] \;=\; \frac n {n+1} \beta \spaced{and} \se\left(\hat{\beta}\right) \;=\; \sqrt {\frac n {(n+1)^2(n+2)}}\times \beta \]

The estimator is therefore biased but is consistent since its bias and standard error both tend to zero as \(n \to \infty\).
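
These formulae can be checked by simulation. A brief sketch with numpy (an assumed tool, not part of this text), using \(\beta = 1\) and \(n = 6\):

```python
import numpy as np

rng = np.random.default_rng(2)
beta, n, reps = 1.0, 6, 100_000

# the MLE is the sample maximum of each simulated sample
beta_hat = rng.uniform(0, beta, size=(reps, n)).max(axis=1)

print(beta_hat.mean())   # close to n * beta / (n + 1) = 6/7
print(beta_hat.std())    # close to beta * sqrt(n / ((n + 1)**2 * (n + 2)))
```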