Numerical and categorical data

In a data set, a numerical variable records a number for each individual. A categorical variable classifies each individual into one of several groups. For example, an investigation of the religions with which a group of 100 individuals identify might result in 100 values such as,

catholic, anglican, atheist, anglican, muslim, ...

In many data sets, the values are not ordered in any meaningful way. For example, the 100 individuals above were not surveyed in any particular order. (If the data were collected in order, time series methods should be used to analyse them.) We only consider unordered categorical data in this chapter.

Frequency tables

An unordered numerical data set holds much detailed information about the distribution of values. (A dot plot shows full information about the distribution, though we may choose to summarise with a histogram or summary statistics.)

In contrast, an unordered categorical data set contains much less information. The frequencies for the distinct categories are the numbers of times that each category occurs in the data set.

Since the order of the values carries no meaning, the frequencies capture all the information about the distribution of values.

These frequencies are usually presented as a frequency table.
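As a minimal sketch of this idea, the Python snippet below counts the categories in the five religion values quoted earlier (the remaining 95 values are not listed in the text, so they are left out). The variable names are illustrative choices, and `Counter` from the standard library is just one convenient way to do the counting.

```python
from collections import Counter

# The first five values quoted in the text; the other 95 are not listed.
values = ["catholic", "anglican", "atheist", "anglican", "muslim"]

# Counter maps each distinct category to the number of times it occurs.
frequencies = Counter(values)

# Print a simple two-column frequency table, most frequent category first.
for category, count in frequencies.most_common():
    print(f"{category:<10}{count:>5}")

# Reordering the values leaves the frequencies unchanged, which is why
# nothing is lost by summarising unordered categorical data this way.
same_after_reordering = Counter(reversed(values)) == frequencies
```

The final comparison illustrates why the frequencies are a complete summary here: any rearrangement of the same values gives exactly the same table.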

Student degrees

As part of a survey of students graduating at a university, 36 students were randomly selected from four degree programmes. For each graduating student, the class of degree was recorded (1st, 2nd or 3rd class). The 36 resulting categorical values are shown on the left of the diagram below.

To calculate the frequencies for each of the three classes of degree by hand, you would work through the table of values, drawing a line against the appropriate category name for each student (a tally). These tallies would finally be counted to give the frequencies.

Click on each of the categorical values in turn to illustrate how the tallies and frequencies are obtained.

The final table of frequencies on the right summarises the classes of degrees obtained by the sampled students. The frequency table contains all information about the distribution of degree classes.
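The hand-tallying procedure described above can be sketched in Python as follows. The eight degree classes used here are hypothetical values chosen purely for illustration, since the 36 survey values are not reproduced in the text, and the `tally` helper is an assumed name rather than anything from the survey.

```python
# Hypothetical degree classes for eight students (illustrative only; these
# are NOT the 36 values recorded in the survey).
degree_classes = ["2nd", "1st", "2nd", "3rd", "2nd", "1st", "3rd", "2nd"]


def tally(values, categories):
    """Return a dict mapping each category to a string of tally marks."""
    marks = {category: "" for category in categories}
    for value in values:
        # Draw one line against the category that matches this student.
        marks[value] += "|"
    return marks


marks = tally(degree_classes, ["1st", "2nd", "3rd"])

# Counting the marks against each category gives the frequencies.
frequencies = {category: len(line) for category, line in marks.items()}

for category in marks:
    print(f"{category}  {marks[category]:<6}{frequencies[category]:>3}")
```

Each pass through the loop corresponds to working through one student in the table of values; the frequencies are only read off once every value has been tallied.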


Examining one variable from many

In surveys like the student degree survey above, several measurements are often recorded from each participant. Although an in-depth analysis of the data would investigate the relationships between the variables, it is often useful to examine the distributions of the variables one at a time.

Student degrees

In the student survey that was described above, five variables were measured from each student.

Frequency tables could be used to summarise the categorical variables, whereas dot plots could summarise the distributions of the three numerical variables. The diagram below shows the data in tabular form, and we will again build up the frequency distribution of the classes of degree.

Click on each row (student) in turn to build up the frequency table.
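Examining one variable from a table of several can be sketched as below. The records, field names (`programme`, `degree_class`, `age`) and values are all hypothetical assumptions made for illustration; the text does not list the survey's five variables or its data.

```python
from collections import Counter

# Hypothetical records, one dict per student. The field names and values are
# illustrative assumptions, not the survey's actual variables or data.
students = [
    {"programme": "A", "degree_class": "2nd", "age": 22},
    {"programme": "B", "degree_class": "1st", "age": 23},
    {"programme": "A", "degree_class": "2nd", "age": 21},
    {"programme": "C", "degree_class": "3rd", "age": 24},
    {"programme": "B", "degree_class": "2nd", "age": 22},
]

# Pull out a single variable (one column of the table), ignoring the rest.
degree_classes = [student["degree_class"] for student in students]

# Build the frequency table for that one variable.
frequencies = Counter(degree_classes)

for category in sorted(frequencies):
    print(f"{category:<5}{frequencies[category]:>3}")
```

The same extraction step could be repeated for any other categorical variable in the table, whereas a numerical column such as the assumed `age` would be better summarised with a dot plot.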