Histograms of data in each group

When data are collected from two groups, a separate histogram for each group can be used to display the distribution of values in that group.

Calcium content of salt

In the manufacture of chlorine, large quantities of raw salt containing impurities and various trace elements such as calcium and magnesium are dissolved in water to create a brine solution. The diagram below shows calcium content (ppm) in salt samples from two different supply sources. The crosses have been jittered a little (randomly moved) to separate them in the scatterplot.

This diagram is 3-dimensional. Position the mouse in the middle of the diagram and drag towards the top left of the screen to rotate the plot (or click the 3D rotation button). The histogram within each group describes the distribution of calcium contents of salt from the two sources.
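The two ideas used in this display, jittering and counting the values in each histogram class, can be sketched in a few lines of Python. This is only an illustration: the calcium readings below are invented (they are not the salt data), and the function names, the class width of 2 ppm and the jitter width of 0.2 are all assumptions chosen for the example.

```python
import random
from collections import Counter

# Hypothetical calcium readings (ppm), invented for illustration only.
calcium = {
    "Source A": [41.2, 43.5, 44.1, 45.0, 46.3, 47.8, 44.9, 45.6],
    "Source B": [48.1, 50.4, 51.2, 52.0, 49.7, 53.3, 51.8, 50.9],
}

def jittered_positions(n, centre, width, rng):
    """Return n positions spread randomly within +/- width of the group's centre."""
    return [centre + rng.uniform(-width, width) for _ in range(n)]

def histogram_counts(values, start, bin_width):
    """Count the values in each class; class k covers
    [start + k*bin_width, start + (k+1)*bin_width)."""
    counts = Counter(int((v - start) // bin_width) for v in values)
    return dict(sorted(counts.items()))

rng = random.Random(1)  # fixed seed so the jitter is reproducible
for index, (source, values) in enumerate(calcium.items()):
    # (jittered group position, calcium) pairs, as would be drawn in a scatterplot
    xs = jittered_positions(len(values), centre=index, width=0.2, rng=rng)
    points = list(zip(xs, values))
    # class counts for this group's histogram, shown as a row of stars per class
    counts = histogram_counts(values, start=40.0, bin_width=2.0)
    print(source)
    for k, count in counts.items():
        lower = 40.0 + 2.0 * k
        print(f"  {lower:.0f}-{lower + 2.0:.0f} ppm: {'*' * count}")
```

The jitter changes only the position of each cross across the group axis; the calcium values themselves, and therefore the histograms, are unaffected.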

Model for each group

A single batch of numerical values is usually modelled as a random sample from some population — often a normal distribution. In a similar way, data sets that consist of measurements from two groups are often modelled as two independent random samples from two underlying hypothetical infinite populations. Normal distributions are again commonly used as models.

(The assumption of normality should be checked from graphical displays of the sample data. If the data are noticeably skewed, a transformation such as taking logarithms may provide values that can be adequately modelled by normal distributions.)
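As a rough numerical companion to a graphical check, the sketch below computes a moment-based skewness coefficient before and after a log transformation. It is an illustration under stated assumptions: the data are simulated from a lognormal distribution (not real measurements), the helper name sample_skewness is hypothetical, and a log transformation is only one of several possible choices.

```python
import math
import random
import statistics

def sample_skewness(values):
    """Moment-based skewness: about 0 for symmetric data,
    positive when the distribution has a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

rng = random.Random(2)
# Simulated right-skewed measurements (lognormal), not real data.
raw = [rng.lognormvariate(3.0, 0.8) for _ in range(500)]
logged = [math.log(v) for v in raw]

print(f"skewness before transformation: {sample_skewness(raw):.2f}")
print(f"skewness after log transformation: {sample_skewness(logged):.2f}")
```

A coefficient near zero after transformation suggests that a normal model is more plausible for the transformed values, although a histogram of the transformed data should still be examined.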

Calcium content of salt

The histograms of calcium content for salt from the two sources appear fairly symmetrical, so normal distributions are reasonable models for the two groups. The diagram below shows a possible model for the salt data.

Click Take sample to select a random sample from each of the two normal distributions. Under the model, the real data set is treated as a pair of random samples from distributions like these.
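The same sampling can be imitated in code. The sketch below draws independent random samples from two normal distributions; the means and standard deviations are hypothetical values chosen for illustration, not estimates from the salt data, and the function name take_sample simply mirrors the button label.

```python
import random
import statistics

# Hypothetical model parameters (ppm); not estimates from the salt data.
MODEL = {
    "Source A": {"mean": 45.0, "sd": 2.0},
    "Source B": {"mean": 51.0, "sd": 2.0},
}

def take_sample(model, n, rng):
    """Draw an independent random sample of size n from each group's normal distribution."""
    return {
        group: [rng.gauss(p["mean"], p["sd"]) for _ in range(n)]
        for group, p in model.items()
    }

rng = random.Random(3)  # fixed seed so the run is reproducible
for _ in range(3):  # three repeated samples from the same model
    sample = take_sample(MODEL, n=10, rng=rng)
    summary = ", ".join(
        f"{group}: mean {statistics.fmean(v):.1f}, sd {statistics.stdev(v):.1f}"
        for group, v in sample.items()
    )
    print(summary)
```

Each repetition gives different sample means and standard deviations even though the underlying distributions are unchanged, which is the sample-to-sample variability that the model is meant to describe.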