Finding percentiles

For any value x, it is fairly easy to calculate the proportion of values in a data set that are x or lower. This proportion is the cumulative distribution function evaluated at x.
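As a minimal sketch of this calculation, the proportion can be found by counting. The function name `ecdf` and the list `rainfall` are assumptions made for illustration; the values are invented and are not the Dodoma data.

```python
def ecdf(data, x):
    """Return the proportion of values in data that are x or lower."""
    return sum(1 for value in data if value <= x) / len(data)


# Hypothetical values for illustration only (not the Dodoma data).
rainfall = [410, 455, 480, 521, 560, 598, 640, 702]

# 4 of the 8 values are 521 or lower, so the proportion is 0.5.
print(ecdf(rainfall, 521))
```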

It is also possible to do the inverse operation. Given any proportion p between 0 and 1, we can find a value x such that approximately this proportion of the values in our data set are x or lower. This value is called the p'th quantile of the data set. When p is given as a percentage, the same value is called the p'th percentile.

The p'th percentile is the value x such that p percent of the values in the data set are x or lower.
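One simple way to carry out this inverse operation is sketched below: sort the data and return the smallest data value for which at least a proportion p of the values are that value or lower. This is only one of several possible conventions and is not necessarily the one used for the diagrams on this page. The function name `quantile` and the list `rainfall` are assumptions, and the values are invented.

```python
def quantile(data, p):
    """Smallest data value x such that at least a proportion p of the
    values in data are x or lower (one convention among several)."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    ordered = sorted(data)
    n = len(ordered)
    # After the i'th sorted value, at least i of the n values are at or below it.
    for i, x in enumerate(ordered, start=1):
        if i / n >= p:
            return x


# Hypothetical values for illustration only (not the Dodoma data).
rainfall = [410, 455, 480, 521, 560, 598, 640, 702]

print(quantile(rainfall, 0.25))  # 25th percentile under this convention
print(quantile(rainfall, 0.43))  # 43rd percentile under this convention
```

With a percentage rather than a proportion, divide by 100 first, for example `quantile(rainfall, 43 / 100)`.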

Percentiles can be read from a graph of the cumulative distribution function — they are the x-values for which the height is p percent.

Annual rainfall in Dodoma

The diagram below again shows the cumulative distribution function for the annual rainfall in Dodoma, Tanzania.

Drag the horizontal red line up or down to read off different percentiles from the cumulative distribution function.


Details (optional)

The following two points are mentioned for completeness but are not needed to understand the concept of percentiles.

Exact percentage
It may not be possible to find a value x such that exactly p percent of the data are x or lower, especially if the sample size is not a multiple of 100. In the Dodoma rainfall data above, there are n = 78 values, so the cumulative distribution function is a step function that rises by 1/78 at each data value. It is therefore impossible to find an x-value for which exactly, say, 43% of values are x or lower.
Precise definition
There is no universally accepted general definition of percentiles, and different statistical software packages give slightly different values. For example, in the Dodoma rainfall data, a proportion 0.423 of years have rainfall below 520.9 mm and a proportion 0.436 have rainfall below 521.0 mm. We have used 521 mm as the 43rd percentile, but other software may report different values close to 521 mm. The differences are minor and should not affect your interpretation of the data.
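Both points can be checked with a short sketch. The first part uses only the sample size n = 78 quoted above to show which step heights lie on either side of 43%. The second part applies two interpolation conventions from Python's standard `statistics` module to the same sample. The list `sample` is invented for illustration and is not the Dodoma data, and with only eight values the two conventions disagree by more than they would for a larger data set.

```python
import statistics

# Part 1: with n = 78 values the step function can only take heights k/78.
n = 78                      # sample size quoted in the text
k_below = (43 * n) // 100   # largest count k with k/n no greater than 43%
print(k_below, round(k_below / n, 3))            # 33 0.423
print(k_below + 1, round((k_below + 1) / n, 3))  # 34 0.436
print((43 * n) % 100 == 0)  # False: no whole count gives exactly 43%

# Part 2: two interpolation conventions applied to the same sample.
# Hypothetical values for illustration only (not the Dodoma data).
sample = [410, 455, 480, 521, 560, 598, 640, 702]

# quantiles(..., n=100) returns 99 cut points; index 42 is the 43rd percentile.
p43_exclusive = statistics.quantiles(sample, n=100, method="exclusive")[42]
p43_inclusive = statistics.quantiles(sample, n=100, method="inclusive")[42]
print(p43_exclusive, p43_inclusive)  # two different values between 480 and 560
```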