Smooth curve to approximate a histogram
We have suggested that smoothness is a goal when drawing histograms, and especially those of large data sets. In this section, we explicitly try to obtain a smooth curve that approximates the shape of a histogram. Such a curve is called a probability density function.

Drawing a smooth curve by hand can be criticised for its subjectivity: two people might draw quite different curves. We therefore prefer a more objective curve-fitting method based on a mathematical function. The challenge is to find a smooth curve that closely matches the data's histogram.
Why bother?
Sometimes we collect marks in order to gain a better understanding of that particular class of students. However, we may also want to use the distribution of marks from one year to predict the likely marks from the same assessment activity for a different group of similar students, for example the following year's class.
For a small data set, such as a single class set of 30 or fewer marks, there is a considerable degree of 'randomness' in the data and therefore in the shape of the resulting histogram. As a result, using one year's histogram directly to predict the distribution of marks in the following year may work poorly, because the 'random' bumps in its shape are unlikely to be repeated in the same way.
A curve that smooths out irregularities in the histogram is likely to give a better guide to the expected distribution the next year.
The histogram below describes the distribution of marks in a test that is sat by a class of 30 students.
Click Sample a few times to see different histograms that might be observed from other classes of 30 similar students. In a data set of only 30 values, there is considerable 'randomness' in the shape of the histogram. From a single sample, we therefore have very little idea even of whether a symmetric distribution should be used to predict the following year's mark distribution.
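The sampling variability described above can be imitated with a short simulation. This is only a sketch under stated assumptions: the population of marks (roughly normal with mean 60 and standard deviation 12, clipped to the range 0 to 100), the class width of 10 and the helper names `sample_marks` and `bin_counts` are all hypothetical choices made for illustration, not values taken from the test described here.

```python
import random

def sample_marks(n, rng, mean=60.0, sd=12.0):
    """Draw n hypothetical marks, rounded to integers and clipped to 0..100."""
    return [min(100, max(0, round(rng.gauss(mean, sd)))) for _ in range(n)]

def bin_counts(marks, width=10):
    """Count marks in classes 0-9, 10-19, ..., 90-100 (the last class also holds 100)."""
    counts = [0] * (100 // width)
    for m in marks:
        counts[min(m // width, len(counts) - 1)] += 1
    return counts

rng = random.Random(1)  # fixed seed so that a run can be repeated

# Four classes of 30 students drawn from the same assumed population.
histograms = [bin_counts(sample_marks(30, rng)) for _ in range(4)]

# Print each histogram as rows of '#' characters, one row per class of marks.
for counts in histograms:
    for i, c in enumerate(counts):
        print(f"{10 * i:3d}-{10 * i + 9:<3d} {'#' * c}")
    print()
```

Running the sketch prints four text histograms. Comparing them gives a feel for how much the shape can change from one class of 30 to the next, even though every class comes from the same assumed population.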
With so much 'randomness' in a sample of 30 values, a very simple smooth curve will predict the next year's distribution as well as more complicated types of curve.
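As one possible illustration of a very simple smooth curve, the sketch below fits a normal curve to a class of 30 marks and compares the observed count in each class of marks with the count the curve would predict. The normal curve is an assumption made here for illustration, since the text has not yet said which curve to use, and the marks themselves are invented.

```python
from statistics import NormalDist

# Hypothetical marks for one class of 30 students (made up for illustration).
marks = [38, 45, 47, 50, 52, 53, 55, 56, 57, 58,
         59, 60, 60, 61, 62, 63, 64, 65, 66, 67,
         68, 70, 71, 72, 74, 75, 78, 81, 84, 90]

# Assumed smooth curve: a normal density with the sample mean and
# sample standard deviation of the marks.
curve = NormalDist.from_samples(marks)

width = 10
observed = []
expected = []
for lo in range(30, 100, width):
    hi = lo + width
    # Number of marks actually falling in the class lo <= mark < hi.
    observed.append(sum(lo <= m < hi for m in marks))
    # Area under the fitted curve over the class, scaled to 30 students.
    expected.append(len(marks) * (curve.cdf(hi) - curve.cdf(lo)))

for lo, obs, exp in zip(range(30, 100, width), observed, expected):
    print(f"{lo}-{lo + width - 1}: observed {obs:2d}, fitted curve {exp:5.1f}")
```

The fitted counts change gradually from one class to the next, whereas the observed counts include the irregular bumps of this particular sample. Only two numbers, the mean and the standard deviation, are carried forward from the data to the prediction.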