We explained earlier how to generate categorical pseudo-random values, based on the probabilities for the different categories. In this page, we show how to generate pseudo-random numerical values from a distribution such as a normal distribution.

Discrete introduction

Consider a discrete distribution with the following probabilities.

Value, x Probability, px
0.6
0.7
0.8
0.9
1.0
1.1
1.2
1.3
1.4
1.5
0.01
0.06
0.10
0.12
0.12
0.11
0.10
0.08
0.06
0.05
 
Value, x Probability, px
1.6
1.7
1.8
1.9
2.0
2.1
2.2
2.3
2.4
2.5
0.04
0.03
0.03
0.02
0.02
0.01
0.01
0.01
0.01
0.01

The diagram below shows a histogram for the distribution, with each square representing a probability of 0.01. The squares are numbered (left to right) from 1 to 100; click on any box to display its index and the corresponding value.

A random value from the distribution is found by randomly selecting one of the squares in the histogram (i.e. by generating an index between 1 and 100 with equal probability for each index). The generated value is the one associated with the square in the histogram. Click Next Value a few times to randomly generate indices and hence values.

Hold down the button Next Value until 100 or more values have been generated. Observe from the stacked dot plots that the distribution of the index is uniform between 0 and 100, but the distribution of the simulated values conforms to the underlying model.

Cumulative distribution function

Pseudo-random numbers can be generated from a random variable's cumulative distribution function

The next diagram simulates values from the distribution in the same way, but displays the simulation mechanism differently. The vertical axis shows the indices for the histogram cells (between 1 and 100) and the stepped black line shows how the indices correspond to values. Note that the height of each step equals the number of histogram cells at that value of x.

Click Next Value a few times and observe how the indices (which are uniformly distributed on the vertical axis) are mapped by the stepped line to simulated values from the distribution.


The vertical axis has been labelled 'Index', but the height of the stepped line at any value x is also equal to the number of histogram cells corresponding to values less than or equal to x. For example, the height at x = 1.04 is 41 which is the number of histogram cells corresponding to values 0.6, 0.7, ..., 1.0.

If the labels on the vertical axis are divided by the total number of histogram cells (100 here), the stepped line gives the probability of a value of x or less from the distribution. This is called the cumulative distribution function of the distribution and is often written as F(x).

Generating values from a continuous distribution

This mechanism can be extended to generate random values from a continuous numerical distribution. It is also based on the cumulative distribution function for the distribution, F(x).

  1. Generate a value, p, that is rectangularly distributed between 0 and 1. (A pseudo-random number generated by a congruential random number generator will do.)
  2. Find the value x for which F(x) = p. Mathematically, this is expressed as x  = F-1(p). The value x is a pseudo-random number from the target distribution.

The diagram below illustrates this method for generating values from a normal distribution with mean 2.0 and standard deviation 0.5.

Click Next Value a few times to see how a simulated value from a rectangular distribution leads to a simulated value from the target normal distribution.

(If you are uncertain about this diagram, look at the previous diagram again — the concepts are the same in both.)

Click Accumulate then generate a few more values. The jittered dot plot on the vertical axis shows the rectangularly distributed probabilities, whereas the jittered dot plot on the horizontal axis shows that the distribution of the generated values is indeed the target normal distribution.

Generating random numbers in Excel

Excel contains a built-in function RAND() that generates a pseudo-random value that is rectangularly distributed between 0 and 1. It also contains built-in functions to evaluate the cumulative distribution functions and their inverses for several standard distributions. For example, the function NORMINV() finds the inverse of a normal distribution's cumulative distribution function.

To obtain a pseudo-random value from a normal distribution with mean 2.0 and standard deviation 0.5, you can therefore type into a spreadsheet cell the formula

=NORMINV(RAND(), 2.0, 0.5)

Copying the formula down a spreadsheet column allows you to generate a random sample of values from this normal distribution.