A frequency table holds all information about the distribution of a categorical variable.
The proportions or percentages of values are usually easier to interpret than the raw frequencies.
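As a minimal sketch of turning raw values into a frequency table with proportions, the Python below counts each category and divides by the total. The function name `frequency_table` and the eye-colour data are hypothetical, made up purely for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (count, proportion)} for a list of categorical values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, n / total) for cat, n in counts.items()}

# Hypothetical data: eye colour recorded for eight individuals.
eye_colour = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "blue"]
table = frequency_table(eye_colour)

# Print counts alongside percentages, which are usually easier to read.
for cat, (n, p) in table.items():
    print(f"{cat:<6} {n:>3} {p:6.1%}")
```

Each individual adds 1 to exactly one count, so the proportions sum to 1.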
In a frequency table, each 'individual' contributes a count (frequency) of 1 to exactly one table entry. Many published tables of counts and percentages are not proper frequency tables.
The information in a frequency table may be clearer if the categories are re-ordered, associated categories are combined, or if the table only describes a subset of the categories.
A bar chart displays the frequencies in a frequency table graphically.
For bar charts of categorical data whose categories have no natural ordering, it is often helpful to sort the categories by decreasing frequency.
The temptation to embellish simple bar charts should be resisted. Some such 'artistic' embellishments are misleading.
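One way to sort unordered categories by decreasing frequency is sketched below, using `Counter.most_common()` from the Python standard library and a rough text bar chart. The transport data are hypothetical.

```python
from collections import Counter

# Hypothetical data: mode of transport for ten individuals.
transport = ["car", "bus", "car", "walk", "car", "bus", "cycle", "car", "walk", "bus"]

counts = Counter(transport)
ordered = counts.most_common()  # (category, count) pairs, largest count first

# Draw one text bar per category, longest bar at the top.
for cat, n in ordered:
    print(f"{cat:<6} {'#' * n} {n}")
```

This ordering only makes sense for nominal categories; ordinal categories would normally keep their natural order.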
Pie charts are an alternative to a bar chart for categorical data.
Bar charts make it easier to distinguish the proportions in single categories. Pie charts are better for showing the proportion in a group of categories.
Three-dimensional pie charts are often drawn. If a pie chart holds little information, it is better to draw it small than to enhance it in this way.
Bar and pie charts can be used to display any data where a total quantity is split into parts.
Categorical data collected from different groups can be shown in a contingency table. It contains a simple frequency table for each group.
Contingency tables sometimes arise from experiments. Data collected from surveys are often presented in contingency tables.
It is easier to compare groups if bar charts are drawn from the proportions within each group rather than the frequencies.
Stacking the bars in each bar chart makes it easier to compare groups. Stacked bar charts are particularly effective for ordinal categorical data.
When the variable has only two categories, stacked bar charts can be simplified. An example is also given where the grouping forms a time series.
If two categorical variables can be treated as a response and explanatory variable, the explanatory variable can be used to split the individuals into groups. If neither variable can be treated as explanatory, we need other ways to describe their association.
Bivariate distributions can be displayed in 3-dimensional bar charts but there are better ways to show the data.
A 2-dimensional bar chart can be used to display the joint distribution of two categorical variables, with the bars clustered by one or other of the variables.
The two marginal distributions are found by adding the joint frequencies (or proportions) across rows or down columns of a contingency table.
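The row and column sums can be sketched in a few lines of Python. The contingency table, its labels and the variable names below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical contingency table: rows = group, columns = response category.
rows = ["male", "female"]
cols = ["yes", "no", "unsure"]
joint = [
    [20, 15, 5],   # male
    [30, 25, 5],   # female
]

row_totals = [sum(r) for r in joint]         # add across each row
col_totals = [sum(c) for c in zip(*joint)]   # add down each column
grand_total = sum(row_totals)

# Marginal distributions as proportions of the grand total.
row_marginal = {r: t / grand_total for r, t in zip(rows, row_totals)}
col_marginal = {c: t / grand_total for c, t in zip(cols, col_totals)}

print(row_marginal)
print(col_marginal)
```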
Conditional distributions are obtained by scaling the rows (or columns) of a contingency table to make them all sum to 1.0.
Be careful to distinguish between the conditional distribution of Y given X, and that of X given Y.
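The difference between the two kinds of conditional distribution can be seen by scaling the same table in both directions. This is only a sketch: the table is hypothetical, with X as the row variable and Y as the column variable, and `scale_rows` is an illustrative helper name.

```python
# Hypothetical contingency table: rows = X (male, female), columns = Y (yes, no, unsure).
joint = [
    [20, 15, 5],
    [30, 25, 5],
]

def scale_rows(table):
    """Divide each row by its own total so that every row sums to 1.0."""
    return [[v / sum(row) for v in row] for row in table]

# Conditional distribution of Y given X: scale the rows.
y_given_x = scale_rows(joint)

# Conditional distribution of X given Y: transpose, then scale,
# so x_given_y[j][i] is the proportion with X = i among those with Y = j.
x_given_y = scale_rows([list(col) for col in zip(*joint)])

print(y_given_x[0][0])  # proportion of 'yes' among males
print(x_given_y[0][0])  # proportion of males among 'yes'
```

With these numbers, half of the males answered 'yes' (20 of 40), but only two-fifths of those answering 'yes' were male (20 of 50).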
Proportional Venn diagrams are a useful way to display joint distributions and to understand the relationships between conditional and marginal distributions.
Never use gridlines to box all values in a table. In large multi-column tables, reading across rows is easier with occasional hairlines or light shading, but otherwise consider using white space to separate associated groups of rows or columns.
Use white space to group related rows and columns. Rearranging rows or columns may bring values that should be compared closer. Summarise and interpret in the body of a report but do not simply repeat values.
The meaningful information is 'signal'. Information that does not help understanding of the data is 'noise'. Noise includes data noise and unnecessary embellishments to the table. Reducing the number of significant digits displayed often decreases data noise.
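Rounding table values to a small number of significant digits can be sketched as below. The helper `round_sig` and the example values are hypothetical, and two digits is just one possible choice.

```python
from math import floor, log10

def round_sig(x, digits=2):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0.0
    # Position of the leading digit decides how many decimal places to keep.
    return round(x, digits - 1 - floor(log10(abs(x))))

# Hypothetical table values before and after rounding.
raw = [123456.0, 47.83, 0.0123456]
for value in raw:
    print(value, "->", round_sig(value, 2))
```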
Showing proportions in a multi-column table instead of frequencies makes it easier to compare groups. Ratios of variables can be easier to interpret than their raw values.
It is easier to compare values down columns than across rows. Interchanging the rows and columns of a table can make it easier to make comparisons.
Rearranging the rows (or columns) may make the information in large tables stand out better.
An example shows a published table whose presentation can be improved in many ways.
With a categorical response and a numerical explanatory variable, stacked bar charts at each value of X are an effective display.
Using a straight line to describe how the proportion in a category depends on X is not appropriate. A curve is required.
A 'logistic' curve can be used to model how a proportion depends on X.
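A common form of the logistic curve is p = 1 / (1 + exp(-(b0 + b1 X))). The sketch below evaluates it alongside a straight line to show why a curve is needed: the logistic values always stay between 0 and 1, while a line eventually leaves that range. The parameter names `b0` and `b1` and their values are hypothetical, not fitted to any data set.

```python
import math

def logistic(x, b0, b1):
    """Logistic curve: proportion as a function of x, always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def straight_line(x, a, b):
    """Straight line a + b*x, which is not restricted to the range 0 to 1."""
    return a + b * x

# Hypothetical parameters chosen so both pass through 0.5 at x = 40.
b0, b1 = -4.0, 0.1
a, b = -0.5, 0.025

for x in range(0, 101, 20):
    print(f"x={x:>3}  logistic={logistic(x, b0, b1):.3f}  line={straight_line(x, a, b):.3f}")
```

With these values the straight line gives a negative 'proportion' at x = 0 and a value above 1 at x = 100, whereas the logistic curve levels off towards 0 and 1.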
A logistic curve is fitted to an example data set.