Ordering categories of ordinal and nominal variables
Some categorical variables have a natural ordering of their categories. These are called ordinal categorical variables. For example, many questionnaires request responses to statements on a five-point scale between 'strongly agree' and 'strongly disagree'. For such variables, the categories on a bar chart should be shown in this natural order.
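As a rough illustration of keeping an ordinal variable in its natural order, the sketch below tabulates questionnaire responses against an explicit list of scale levels. The three middle labels and the responses themselves are assumptions made for this example; the text above names only the two endpoints of the scale.

```python
from collections import Counter

# Natural order of the five-point scale. Only the two endpoints come from
# the text; the three middle labels are assumed for this illustration.
LIKERT_ORDER = [
    "strongly agree",
    "agree",
    "neutral",
    "disagree",
    "strongly disagree",
]

# Hypothetical responses, for illustration only.
responses = ["agree", "neutral", "strongly agree", "agree", "disagree", "agree"]

counts = Counter(responses)

# List the categories in their natural order rather than alphabetically or
# by frequency. A level that nobody chose still appears, with a count of zero.
ordered_counts = [(level, counts[level]) for level in LIKERT_ORDER]

for level, n in ordered_counts:
    print(f"{level:<18} {n}")
```

Driving the table from the list of levels, not from the data, keeps empty categories visible and stops the bars from being reordered by accident.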
When there is no natural ordering of the categories (a nominal categorical variable), the order of the categories in a frequency table or bar chart is arbitrary. For example, if school children are asked to pick their favourite subject, there is no natural way to order the subjects English, Mathematics and Music, so these categories can be placed in any order on a bar chart.
Alphabetical ordering of the categories is rarely best.
Detecting 'important' categories
For nominal categorical variables, it is often useful to arrange the categories in decreasing order of their frequencies. When the bars of a bar chart are organised in this way, the diagram is called a Pareto diagram. The initial bars in the diagram have the highest frequencies and are often the most 'important' ones.
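The reordering described above can be sketched in a few lines. The counts below are hypothetical values for the favourite-subject example, and the alphabetical tie-break is a choice made here only so that the result is reproducible.

```python
# Hypothetical frequencies for a nominal variable (favourite subject).
counts = {"English": 12, "Mathematics": 19, "Music": 7}

# Sort by decreasing frequency. Ties are broken alphabetically, which is an
# arbitrary choice made here only to give a reproducible order.
pareto_order = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Print a simple text bar chart in Pareto order.
for category, n in pareto_order:
    print(f"{category:<12} {'#' * n} {n}")
```

With these made-up counts the categories come out as Mathematics, English, Music, so the most frequent category is listed first.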
Pareto diagrams are particularly useful in industrial quality control and quality improvement where information is collected about the causes of problems in manufacturing processes. These causes are usually categorical and a Pareto diagram highlights the most important ones.
The Pareto diagram is named after Vilfredo Pareto, an Italian economist who found in the late 1800s that about 80 percent of the wealth of a region was concentrated in less than 20 percent of the population. This rule of thumb has been adapted to quality improvement, giving the Pareto principle:
A large percentage of instances of any problem result from a small percentage of the possible causes.
A line is usually added to a Pareto diagram showing the cumulative proportions for the different causes. For the i-th most common cause, the height of the line gives the proportion of problems that result from any of the i most common causes.
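One way to compute the heights of the cumulative line is sketched below. The cause names and counts are hypothetical and are not taken from any data set in this section; the running totals are simply divided by the overall total after the causes have been sorted into decreasing order.

```python
from itertools import accumulate

# Hypothetical counts of problems by cause, for illustration only.
counts = {"Cause A": 5, "Cause B": 40, "Cause C": 20, "Cause D": 10, "Cause E": 25}

# Pareto order: decreasing frequency (ties broken alphabetically).
ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
total = sum(n for _, n in ordered)

# Running total of the counts, in Pareto order.
running_totals = accumulate(n for _, n in ordered)

# Height of the cumulative line above each cause: the proportion of all
# problems that come from that cause or any more common one.
cumulative = [
    (cause, running / total)
    for (cause, _), running in zip(ordered, running_totals)
]

for cause, proportion in cumulative:
    print(f"{cause}: {proportion:.2f}")
```

The last cumulative proportion is always 1, since by then every cause has been counted.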
Defective cereal boxes
A manufacturer of breakfast cereals has received complaints about defective boxes of corn flakes being shipped to supermarkets. The output from one week was checked for defects and the following table shows the main reasons for boxes being rejected as defective.
Reason for defective box | Number of boxes
---|---
Total | 74
The bar chart below shows the data graphically.
There is no natural ordering of the defects, so we can reorder them in any way. Select Decreasing frequencies from the pop-up menu. After reordering, the most important reasons for the defective boxes are on the left and the least important are on the right.
Cumulative proportions
The diagram below completes the Pareto diagram with the cumulative proportions.
Click on the bar for Dirty to stack the bars for the three most common causes. The cumulative proportion line goes through the top of this stack, so it shows the proportion of boxes that were rejected for these three causes. Click on other bars to read off other cumulative proportions.
Finally, click the checkbox Separate scale for cumulative propns to expand the scaling of the individual bars of the bar chart and therefore make comparisons easier. Note that a different scale is used for the cumulative proportions (on the right) and the individual proportions (on the left).