Most variables in a data set can be classified into one of two major types.
Numerical variables
The values of a numerical variable are numbers. They can be further classified into discrete and continuous variables.
The distinction between discrete and continuous variables is important because statistical methods that can be used for continuous variables are not always appropriate for discrete variables.
A discrete variable can usually be identified as a count (for example, a number of children), whereas a continuous variable is some other kind of measurement (for example, a height or a weight).
Categorical variables
The values of a categorical variable are selected from a small group of categories. Examples are gender (male or female) and marital status (never married, married, divorced or widowed).
Categorical variables can be further classified into ordinal variables, whose categories have a natural order, and nominal variables, whose categories do not.
Most statistical methods for categorical data can be applied to both ordinal and nominal variables.
We rarely distinguish between ordinal and nominal variables in CAST.
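The practical difference between the two kinds of categorical variable can be illustrated with a short Python sketch. The satisfaction ratings and the helper function names are hypothetical and do not come from any CAST data set; marital status is the nominal example used above. Counting categories is meaningful for both kinds of variable, whereas a summary that relies on ordering, such as a median category, only makes sense for the ordinal one.

```python
from collections import Counter

# Hypothetical ordinal variable: the categories have a natural order.
SATISFACTION_LEVELS = ["low", "medium", "high"]
satisfaction = ["medium", "low", "high", "medium", "high"]

# Nominal variable: marital status categories have no natural order.
marital_status = ["married", "divorced", "married", "never married", "widowed"]


def most_common_category(values):
    """Return the most frequent category (valid for ordinal and nominal data)."""
    return Counter(values).most_common(1)[0][0]


def median_category(values, ordered_levels):
    """Return the middle value after sorting by category order.

    Only meaningful for ordinal data; assumes an odd number of values.
    """
    ranked = sorted(values, key=ordered_levels.index)
    return ranked[len(ranked) // 2]


print(most_common_category(marital_status))
print(median_category(satisfaction, SATISFACTION_LEVELS))
```

Calling `median_category` on marital status would require inventing an order for its categories, which is exactly what a nominal variable lacks.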
Labels
In some data sets, each individual has a unique 'name' that can be used to identify it. We call such a variable a label variable. The labels may help us to identify unusual observations in the data set.
Warning!
Sometimes categorical variables are coded as numbers when the data are recorded (e.g. gender may be coded as 0 for males and 1 for females). The variable is still categorical, despite the use of numbers.
In a similar way, the individuals in a survey may be coded with a number that uniquely identifies them (perhaps to avoid storing names in the data for confidentiality). This is really a label variable and may be simply the row number in the data matrix.
When you see a column of numbers in your data matrix, do not assume that it is a numerical variable.
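As a minimal sketch of this warning, the Python below uses a hypothetical numeric coding of marital status (the codes 1 to 4 are an assumption made for illustration, not a coding used in CAST). Software will average the codes without complaint, but the result depends entirely on the arbitrary choice of codes. Decoding the values and counting categories is the appropriate summary.

```python
from collections import Counter
from statistics import mean

# Hypothetical coding of marital status (an assumption for illustration).
MARITAL_LABELS = {1: "never married", 2: "married", 3: "divorced", 4: "widowed"}
coded_marital = [2, 1, 3, 2, 4]

# The codes can be averaged, but the result has no meaning for a
# categorical variable: relabelling the categories would change it.
misleading_mean = mean(coded_marital)

# Decoding to category names makes the variable's type explicit,
# and category counts are a sensible summary.
marital = [MARITAL_LABELS[code] for code in coded_marital]
marital_counts = Counter(marital)

print(misleading_mean)
print(marital_counts)
```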
Characteristics of employees
Consider the following data set that describes characteristics of the employees of a company.
Name | Sex | Age | Marital status | No of children | Income | Smoking |
---|---|---|---|---|---|---|
John Smith | male | 24 | single | 0 | $25,000 | never smoked |
Mary Brown | female | 35 | married | 3 | $45,000 | current smoker |
Adam Jones | male | 42 | divorced | 1 | $40,000 | former smoker |
Jane Robertson | female | 29 | divorced | 0 | $42,000 | never smoked |
... | ... | ... | ... | ... | ... | ... |
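The Python sketch below records one plausible classification of the variables in this table, using the definitions given earlier. The classification itself is an assumption rather than something stated in the source, and `parse_income` is a hypothetical helper written for this example. It also shows that a numerical variable stored as formatted text must be converted before it can be summarised.

```python
# The four rows shown in the table, stored as plain dictionaries.
employees = [
    {"Name": "John Smith", "Sex": "male", "Age": 24, "Marital status": "single",
     "No of children": 0, "Income": "$25,000", "Smoking": "never smoked"},
    {"Name": "Mary Brown", "Sex": "female", "Age": 35, "Marital status": "married",
     "No of children": 3, "Income": "$45,000", "Smoking": "current smoker"},
    {"Name": "Adam Jones", "Sex": "male", "Age": 42, "Marital status": "divorced",
     "No of children": 1, "Income": "$40,000", "Smoking": "former smoker"},
    {"Name": "Jane Robertson", "Sex": "female", "Age": 29, "Marital status": "divorced",
     "No of children": 0, "Income": "$42,000", "Smoking": "never smoked"},
]

# One plausible classification (an assumption, not given in the source).
VARIABLE_TYPES = {
    "Name": "label",
    "Sex": "categorical",
    "Age": "numerical (continuous, recorded in whole years)",
    "Marital status": "categorical",
    "No of children": "numerical (discrete count)",
    "Income": "numerical (continuous)",
    "Smoking": "categorical",
}


def parse_income(text):
    """Convert a string such as '$25,000' to the integer 25000."""
    return int(text.replace("$", "").replace(",", ""))


# Numerical summaries apply only to the numerical variables.
incomes = [parse_income(row["Income"]) for row in employees]
mean_income = sum(incomes) / len(incomes)
total_children = sum(row["No of children"] for row in employees)

print(mean_income, total_children)
```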
European countries
The diagram below shows some data about countries in Europe.
Variable | Description |
---|---|
Membership of EU | Distinguishes between countries that joined the EU before 2000, those that joined from 2000 to 2004, those that joined between 2005 and 2014, those that were candidates in 2014, and all others. |
Lifetime | Male life expectancy in 2011. |
Alcohol | Alcohol consumption per person over 15 in 2005, measured in litres of pure alcohol. |
Internet | Internet users per 100 people in 2012. |
Energy | Energy use (kg of oil equivalent per capita) in 2011. |
The data were obtained from the World Bank (http://www.worldbank.org/data) and the World Health Organisation (http://www3.who.int/whosis/menu.cfm).
The first of these variables is a nominal categorical variable and the others are continuous numerical ones. A map of Europe is coloured to represent the values of the variables.
Use the pop-up menu to select the variable to display on the map and investigate its distribution through Europe.
Click on a row of the data matrix or a country on the map to highlight it in both parts of the diagram.
Note that some values are unknown (shaded in grey on the map).