Proportions within groups

Although a contingency table fully describes categorical data from two or more groups, it is a poor way to compare the distributions if the groups contain different total numbers of individuals.

Rather than tabulating the frequencies for each group, it is more informative to tabulate the proportions within the groups. Each frequency in the table is therefore divided by the total for that group.

For example, in the drug-screening example on the previous page, 94 smokers tested positive for marijuana but 103 non-smokers tested positive. However, since there were many more non-smokers than smokers, it is more meaningful to report that a proportion 94/825 = 0.114 of the smokers tested positive, whereas only 103/1708 = 0.060 of the non-smokers were positive.
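The arithmetic above is simply each group's frequency divided by that group's total. A minimal Python sketch using only the counts quoted in the example (the variable names are illustrative, not part of the original):

```python
# Counts from the drug-screening example: positives and group totals.
smokers_positive, smokers_total = 94, 825
nonsmokers_positive, nonsmokers_total = 103, 1708

# Proportion within each group = group frequency / group total.
p_smokers = smokers_positive / smokers_total
p_nonsmokers = nonsmokers_positive / nonsmokers_total

print(f"Smokers:     {p_smokers:.3f}")
print(f"Non-smokers: {p_nonsmokers:.3f}")
```

Rounded to three decimals, these reproduce the 0.114 and 0.060 quoted in the text.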

Heating fuel in buildings

The Cincinnati Gas and Electric Company conducted a survey of commercial buildings in 1992. The contingency table below describes the main heating fuel used in buildings of different ages.

Differences between buildings of different ages are clearer if the proportions using each fuel are displayed within each age group. These proportions are found by dividing each row of the table by its row total.

Select the option Propn within Year of construction from the pop-up menu to display the resulting proportions. This scales each row so that every row total is 1.0.
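Dividing each row by its row total can be written as a small function over any contingency table. The fuel counts themselves are not reproduced here, so this sketch reuses the two-group drug-screening table, with the negative counts derived as group total minus positives (825 − 94 = 731 and 1708 − 103 = 1605). The function name `row_proportions` is hypothetical, chosen for illustration:

```python
def row_proportions(table):
    """Divide every frequency in each row (group) by that row's total."""
    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {cat: n / total for cat, n in counts.items()}
    return result

# Rows are groups, columns are categories of the response.
counts = {
    "Smoker":     {"Positive": 94,  "Negative": 731},
    "Non-smoker": {"Positive": 103, "Negative": 1605},
}

props = row_proportions(counts)
for group, row in props.items():
    cells = "  ".join(f"{cat}={p:.3f}" for cat, p in row.items())
    # Every row of proportions sums to 1.0, whatever the group size.
    print(f"{group:<11} {cells}  total={sum(row.values()):.1f}")
```

The same function would apply unchanged to the fuel table, with years of construction as the rows and fuels as the columns.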

Scan down the columns of this table to compare buildings of different ages.

Multiplying the proportions by 100 rewrites them as percentages. Select Percent within Year of construction to display these percentages. Although percentages and proportions contain the same information, the leading zeros and decimal points are absent in the percentages and this 'cleaner' display makes it easier to compare the years.
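Converting proportions to percentages is a single multiplication by 100, so no information is gained or lost. A short sketch, again using the drug-screening proportions because the fuel counts are not given here:

```python
# Proportions within each group (drug-screening example).
proportions = {
    "Smoker":     {"Positive": 94 / 825,   "Negative": 731 / 825},
    "Non-smoker": {"Positive": 103 / 1708, "Negative": 1605 / 1708},
}

# Percentages carry the same information: proportion x 100.
percentages = {
    group: {cat: 100 * p for cat, p in row.items()}
    for group, row in proportions.items()
}

for group in proportions:
    props = "  ".join(f"{p:.3f}" for p in proportions[group].values())
    pcts = "  ".join(f"{p:.1f}" for p in percentages[group].values())
    print(f"{group:<11} proportions: {props}   percentages: {pcts}")
```

Each row of percentages sums to 100 rather than 1.0, and the displayed values (11.4, 88.6 and 6.0, 94.0) drop the leading "0." that the proportions carry.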

Bar charts of proportions

Bar charts provide a graphical way to compare groups. Although the bar chart of each group has the same shape whether it is based on frequencies or proportions, comparisons are made more easily if proportions are used, especially when the groups are of different sizes.

The diagram below shows the fuel use data.

Bar charts of the counts make evident the large number of buildings constructed in 1973 or earlier that use natural gas for heating. But how much of that is due to the larger number of old buildings in the survey?

Select Propn within Year of construction or Percent within Year of construction from the pop-up menu. The effect is to scale each bar chart to have the same total (1.0 or 100). Changes in the proportion using natural gas are then relatively small, while the increase in the proportion using electricity now stands out.

Clustering the bars

If the groups correspond to different rows of a table that shows proportions within groups (so the row totals are 1.0), the most important comparisons are down columns. For example, we would scan down the 'Natural gas' column of the fuel table above to compare the proportions of buildings heated by natural gas in the different age groups.

When separate bar charts are drawn for the different groups, the corresponding bars are widely separated in the diagram, making comparisons harder. An alternative display uses the same bars, but clusters them by the values of the categorical variable, rather than by groups. This type of clustered bar chart makes it easier to spot subtle differences between the groups.
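Clustering does not change any bar heights; it only reorders them so that bars for the same category sit side by side. The sketch below regroups a group-by-group table into category-by-category clusters and draws them as text bars. The data are the drug-screening proportions, and the function name `cluster_by_category` and the 40-character bar scale are assumptions made for illustration:

```python
# Proportions within groups (drug-screening example), stored group by group.
by_group = {
    "Smoker":     {"Positive": 94 / 825,   "Negative": 731 / 825},
    "Non-smoker": {"Positive": 103 / 1708, "Negative": 1605 / 1708},
}

def cluster_by_category(by_group):
    """Regroup the same bar heights so the bars for each category sit together."""
    clusters = {}
    for group, row in by_group.items():
        for cat, p in row.items():
            clusters.setdefault(cat, {})[group] = p
    return clusters

WIDTH = 40  # characters representing a proportion of 1.0 (arbitrary choice)

clusters = cluster_by_category(by_group)
for cat, bars in clusters.items():
    print(cat)
    for group, p in bars.items():
        # Adjacent bars for the same category make group differences easy to see.
        print(f"  {group:<11}|{'#' * round(p * WIDTH)} {p:.3f}")
```

In effect this is a transpose of the table of proportions: the outer grouping switches from groups to categories while every value stays the same.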

Where do nurses work?

Colleges that train nurses need to know the types of work that the nurses will eventually perform, in order to give them appropriate training. One aspect of this is the mix of work settings that will eventually employ these nurses.

The diagram below shows the work settings of all enrolled nurses in Australia in 1993, 1996 and 1999.

Although the distribution of workplaces within each year is clearly shown in this diagram, it is harder to assess any trends over the six-year period since all bar charts have a similar shape.

Select the option Workplace from the pop-up menu to cluster the bars by workplace. From this diagram it is easier to see the more subtle changes in distribution over the period.