Most variables in a data set can be classified into one of two major types.

Numerical variables

The values of a numerical variable are numbers. Numerical variables can be further classified into discrete and continuous variables.

Discrete numerical variable
A variable whose values are whole numbers (counts) is called discrete. For example, the number of days with rain in a year is discrete.
Continuous numerical variable
A variable that may take any value within some range is called continuous. For example, the total annual rainfall is continuous.

Statistical methods that can be used for continuous variables are not always appropriate for discrete variables.

The distinction between discrete and continuous variables is important.

A discrete variable can usually be identified as being a count, whereas a continuous variable is some other kind of measurement.
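The rule of thumb above can be sketched as a small check in Python. This is only a rough heuristic, not a definitive test: the function name looks_discrete and the sample values are made up for illustration, and a continuous measurement that happens to be recorded in whole units would also pass the check.

```python
def looks_discrete(values):
    """Return True if every value is a whole number (a hint that it may be a count)."""
    return all(float(v).is_integer() for v in values)


# Hypothetical data, invented for this example.
rain_days = [112, 98, 130, 121]                 # days with rain in each year (counts)
annual_rainfall_mm = [812.4, 655.0, 930.7]      # total rainfall in each year (measurements)

print(looks_discrete(rain_days))
print(looks_discrete(annual_rainfall_mm))
```

The first call reports whole numbers only, which is consistent with a count; the second does not. The final decision still depends on knowing what the variable measures.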

Categorical variables

The values of a categorical variable are selected from a small group of categories. Examples are gender (male or female), marital status (never married, married, divorced or widowed) and weather pattern in a year (El Nino, La Nina or Ordinary).

Categorical variables can be further categorised into ordinal and nominal variables.

Ordinal categorical variable
A categorical variable whose categories can be meaningfully ordered is called ordinal. For example, a student's grade in an exam (A, B, C or Fail) is ordinal.
Nominal categorical variable
A categorical variable whose categories have no meaningful order is called nominal. It does not matter which way the categories are ordered in tabular or graphical displays of the data; all orderings are equally meaningful. For example, a student's religion (Atheist, Christian, Muslim, Hindu, ...) is nominal.
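As a minimal sketch of the difference, the Python below sorts exam grades by their meaningful order and contrasts this with religion, where any order is a matter of convenience. The name GRADE_ORDER and the sample values are assumptions made for this example.

```python
# Category order for the ordinal variable, from lowest to highest.
GRADE_ORDER = ["Fail", "C", "B", "A"]

# Hypothetical observations, invented for this example.
grades = ["B", "A", "Fail", "C", "B"]
religions = ["Hindu", "Atheist", "Muslim", "Christian"]

# Ordinal: sort by position in the meaningful category order.
sorted_grades = sorted(grades, key=GRADE_ORDER.index)
print(sorted_grades)

# Nominal: alphabetical order is only a display convention;
# any other ordering would be equally meaningful.
sorted_religions = sorted(religions)
print(sorted_religions)
```

Sorting the grades alphabetically instead would place A before B before C before Fail only by coincidence of spelling, which is why the order is stated explicitly.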

Most statistical methods for categorical data can be applied to both ordinal and nominal variables.

We rarely distinguish between ordinal and nominal variables in CAST.

Labels

In some data sets, each individual has a unique 'name' that can be used to identify it. We call such a variable a label variable. The labels may help us to identify unusual observations in the data set.


Warning!

Sometimes categorical variables are coded as numbers when the data are recorded (e.g. gender may be coded as 0 for males and 1 for females). The variable is still categorical, despite the use of numbers.

In a similar way, the individuals in a survey may be coded with a number that uniquely identifies them (perhaps to avoid storing names in the data for confidentiality). This is really a label variable and may be simply the row number in the data matrix.

When you see a column of numbers in your data matrix, do not assume that it is a numerical variable.
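The warning can be illustrated with a short Python sketch. The gender coding (0 for males, 1 for females) follows the example above; the data values and the variable names are made up for illustration.

```python
from collections import Counter
from statistics import mean

# Coding taken from the example in the text: 0 = male, 1 = female.
GENDER_LABELS = {0: "male", 1: "female"}

# Hypothetical survey data, invented for this example.
respondent_ids = [1, 2, 3, 4, 5]    # label variable stored as numbers
gender_codes = [0, 1, 1, 0, 1]      # categorical variable stored as numbers

# Decode the numbers back into categories before summarising.
gender = [GENDER_LABELS[code] for code in gender_codes]
counts = Counter(gender)
print(counts)

# Python will happily average the identifiers,
# but the result says nothing about the respondents.
meaningless = mean(respondent_ids)
print(meaningless)
```

A table of counts is a sensible summary of the decoded categories, whereas the mean of the identifiers can be computed but has no interpretation.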


Characteristics of hospital patients

Consider the following data set that describes characteristics of a group of patients with prostate cancer on admission to hospital.

Name              Age  Marital status  PSA in blood  Smoking         Tumor size
John Smith        54   married         45            never smoked    small
Mark Brown        65   widowed         37            current smoker  medium
Adam Jones        52   divorced        66            former smoker   medium
Stuart Robertson  71   married         97            never smoked    large
...               ...  ...             ...           ...             ...
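One way to record the type of each column in the table above is a simple mapping from variable name to type. The classification below is one reading based on the definitions given earlier, not something stated with the data set, and the names variable_types and by_type are chosen only for this sketch.

```python
from collections import defaultdict

# Assumed classification of the columns, following the definitions in the text.
variable_types = {
    "Name": "label",                  # identifies each patient
    "Age": "numerical",
    "Marital status": "categorical",
    "PSA in blood": "numerical",
    "Smoking": "categorical",
    "Tumor size": "categorical",      # small, medium, large can be ordered
}

# Group the variable names by type.
by_type = defaultdict(list)
for name, kind in variable_types.items():
    by_type[kind].append(name)

for kind, names in by_type.items():
    print(f"{kind}: {', '.join(names)}")
```

Tumor size could be refined to ordinal, since its categories can be meaningfully ordered, while marital status has no such order.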

African countries

The diagram below shows some data about African countries.

European         European power in control of the country in 1945
Calories         Calories per capita per day in 1998
Life expectancy  Male life expectancy in 2003
AIDS/HIV         Percentage of adults (15-49) with AIDS/HIV in 2003

The first of these variables is a nominal categorical variable and the others are continuous numerical ones. A map of Africa is coloured to represent the values of the variables.

Use the pop-up menu to select the variable to display on the map and investigate its distribution through Africa.

Click on a row of the data matrix or a country on the map to highlight it in both parts of the diagram.

Note that some values are unknown (shaded in grey on the map).