Finding percentiles

From any value, x, it is fairly easy to calculate the proportion of values in a data set that are x or lower — that is, the cumulative distribution function at x.
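As a minimal sketch, assuming the data are held in a plain Python list, this proportion can be found by counting the values that are x or lower and dividing by the number of values. The function name `ecdf` and the five rainfall values are hypothetical and are used only for illustration. They are not the Samaru data.

```python
def ecdf(data, x):
    """Return the proportion of values in data that are x or lower."""
    # Count the values at or below x, then divide by the sample size.
    return sum(1 for v in data if v <= x) / len(data)


# Hypothetical annual rainfall values in mm (not the Samaru data).
rainfall = [1019, 870, 1203, 955, 1110]

print(ecdf(rainfall, 1019))  # 3 of the 5 values are 1019 or lower: 0.6
print(ecdf(rainfall, 800))   # no values are 800 or lower: 0.0
```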

It is also possible to do the inverse operation. Given any proportion, p, between 0 and 1, we can find a value x such that approximately a proportion p of the values in the data set are x or lower. This value is called the p'th quantile of the data set. When p is given as a percentage, the same value is called the p'th percentile.

The p'th percentile is the value x such that p percent of the values in the data set are x or lower.
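One simple way to turn this definition into a calculation is to sort the data and pick the smallest data value that has at least p percent of the values at or below it. This is only one of several conventions (sometimes called the "inverted CDF" rule), so the sketch below should not be read as the method used for the Samaru diagram; the function name `percentile` and the rainfall values are hypothetical.

```python
import math


def percentile(data, p):
    """Return the smallest data value x such that at least p percent of
    the values are x or lower (the 'inverted CDF' convention)."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be greater than 0 and at most 100")
    xs = sorted(data)
    n = len(xs)
    # The k'th smallest value has at least k/n of the data at or below it,
    # so we need the smallest whole number k with k/n >= p/100.
    k = math.ceil(p * n / 100)
    return xs[k - 1]


# Hypothetical annual rainfall values in mm (not the Samaru data).
rainfall = [1019, 870, 1203, 955, 1110]

print(percentile(rainfall, 50))   # 1019
print(percentile(rainfall, 20))   # 870
print(percentile(rainfall, 100))  # 1203
```

Because this rule always returns one of the data values, other conventions can give a slightly different answer for the same p.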

Percentiles can be read from a graph of the cumulative distribution function: the p'th percentile is the x-value at which the height of the graph is p percent.

Annual rainfall in Samaru

The diagram below again shows the cumulative distribution function for the annual rainfall in Samaru, Nigeria.

Drag the horizontal red line up or down to read off different percentiles from the cumulative distribution function.


Details (optional)

The following two points are mentioned for completeness but are not needed to understand the concept of percentiles.

Exact percentage
It may not be possible to find a value, x, such that exactly p percent of the data are x or lower, especially if the sample size is not a multiple of 100. In the Samaru rainfall data above, there are n = 56 values, so the cumulative distribution function is a step function that rises by 1/56 at each data value and can only take heights that are multiples of 1/56. It is therefore impossible to find an x-value for which exactly, say, 43% of values are x or lower: 43% of 56 is 24.08, which is not a whole number of years.
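The arithmetic behind this can be checked directly. Exactly p percent of n values can be x or lower only when p × n / 100 is a whole number. The sketch below assumes p is a whole-number percentage, and the helper name `attainable` is hypothetical. It also finds the two heights of the Samaru step function on either side of 0.43, which match the proportions 0.429 and 0.446 quoted for 1019 mm and 1020 mm in the Samaru example.

```python
N = 56  # number of years in the Samaru rainfall data


def attainable(p, n):
    """Return True if exactly p percent of n values can be x or lower.

    This needs p * n / 100 to be a whole number of values
    (p is assumed to be a whole-number percentage).
    """
    return (p * n) % 100 == 0


print(attainable(43, N))  # False: 43% of 56 is 24.08 values
print(attainable(25, N))  # True: 25% of 56 is exactly 14 values

# With a sample size that is a multiple of 100, every whole percentage works.
print(all(attainable(p, 200) for p in range(101)))  # True

# The step function only takes heights k/N for k = 0, 1, ..., N.
# Find the largest height not above 0.43 and the next one up.
lower = max(k for k in range(N + 1) if k / N <= 0.43)
print(lower, round(lower / N, 3), round((lower + 1) / N, 3))  # 24 0.429 0.446
```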
Precise definition
There is no universally accepted definition of percentiles, and different statistical programs give slightly different values. For example, in the Samaru rainfall data, a proportion 0.429 of years have rainfall at or below 1019 mm and a proportion 0.446 have rainfall at or below 1020 mm. We have used 1019 mm as the 43rd percentile, but other software may report different values close to 1019 mm. The differences are minor and should not affect your interpretation of the data.
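To see how such differences arise, the sketch below computes a 43rd percentile of a small hypothetical data set in three ways: the "smallest data value with at least p percent at or below it" rule, and the two interpolation methods (`"inclusive"` and `"exclusive"`) offered by Python's standard `statistics.quantiles` function. The rainfall values are invented for illustration. With only five values the three answers differ by about 27 mm, which exaggerates the effect; with a larger sample such as the 56 Samaru years, neighbouring data values are usually much closer together, so the answers typically agree more closely.

```python
import math
import statistics

# Hypothetical annual rainfall values in mm (not the Samaru data).
rainfall = [1019, 870, 1203, 955, 1110]
p = 43

# Rule 1: the smallest data value with at least p percent of values at or below it.
xs = sorted(rainfall)
inverted_cdf = xs[math.ceil(p * len(xs) / 100) - 1]

# Rules 2 and 3: the two interpolation methods of statistics.quantiles.
# With n=100 it returns the 99 cut points for percentiles 1..99, so use index p - 1.
inclusive = statistics.quantiles(rainfall, n=100, method="inclusive")[p - 1]
exclusive = statistics.quantiles(rainfall, n=100, method="exclusive")[p - 1]

print(inverted_cdf)         # 1019
print(round(inclusive, 2))  # 1001.08
print(round(exclusive, 2))  # 992.12
```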