Columns in a vector spreadsheet can be of three types, variate, text or factor. A variate is a numeric column, while text can contain letters or a mixture of letters and numbers.
A factor column provides a classification into a number of distinct groups or classes. Two individuals with the same factor level belong to the same group. A factor is sometimes called a “categorical” or “class” variable within other statistical applications.
The columns Sex, Age, Severity and Treatment in the following spreadsheet are factors. The factor Sex has two levels with the labels ‘Female’ and ‘Male’. The factor Age has 3 levels, 10, 11 and 12. The factor Severity has four levels 0, 1, 2, 3, 4 with the labels ‘None’, ‘Mild’, ‘Moderate’ and ‘Acute’. The factor Treatment has four levels 0, 1, 2, 10 with the labels ‘Control’, ‘Normal’, ‘Medium’ and ‘High’.
Although age is a continuous measurement (as we continually grow older), the individuals have been assigned to a discrete age group based on their age at their last birthday. Severity is an assessment of the degree of a disease in an individual, and a doctor will have assigned a level to each person, based on his clinical judgement. The levels of severity have a natural ordering, with 0 as an obvious starting value, so the numerical levels 0-4 have been assigned to these, but labels have also been assigned to describe what each level means. The factor Treatment is the intended amount of a drug that has been prescribed to each individual, so has a natural numerical value (dose in milligrams), but also a label to give extra information on these doses. In an experimental trial, a small set of fixed values is often used for a treatment factor that could vary continuously (like the amount of drug given in this case), so that a number of replicates of the levels used occur. The variation between the replicates is then used to assess the consistency of the response to the treatment.
Details of the Severity factor are shown below, using the Edit Factor Levels and Labels dialog. This shows that any factor column has three components, that Genstat calls Ordinals, Levels and Labels. The ordinals are numbered from 1 upwards, and they dictate the order that the levels of a factor will be displayed. If both levels and labels are present, you have a choice of which is displayed in the spreadsheet column. The column Severity has labels displayed, whilst Treatment has levels displayed. You can also choose to display the ordinals of a factor – this is the default if no levels or labels have been assigned to the factor.
Each group can be assigned some text, called a label, to allow clearer identification of the group. Each group can also have a numerical value (also called a level) assigned to it. A factor can have both levels and labels defined at the same time. The levels are distinct from the ordinals, and the factor’s levels need not be in numerical order. If no user-defined levels are given for a factor, then the default levels are the same as the ordinal values of the factor. The availability of ordinals, levels and labels for factors gives you great flexibility in how you present results that are in tables and graphs. For example, the order of rows in a table is always dictated by the ordinals, but what you will see is either the levels, or the labels. When doing a numerical calculation with a factor, the values of the factor levels are used. When sorting on a factor, the values of the ordinals are used.
A factor is indicated in a spreadsheet by a red exclamation mark (!) at the start of the column name. A column imported from another data source, such as Excel, can be marked as containing grouping information by ending the column name with an exclamation mark (e.g. Chemical!).
The factor, Treatment, shown in the Edit Factor Levels and Labels dialog below, has 4 groups each with both a numerical level and a label.
The groups in a factor can be reordered: initially the groups in Treatment have been sorted numerically on the levels. The groups in the Treatment could be sorted alphabetically (as shown below), and the joint membership of individuals to a particular group would remain the same. However, the order that the groups would appear in an analysis or table would change.
The ordinals are always numbered 1,2,3… even when the factor levels have reordered. When importing data from other file formats, Genstat will by default order the levels of any factor either numerically or alphabetically or by the order they are first met in the data. This behaviour is controllable via the spreadsheet options or in the dialogs used to load data. The Spreadsheet Factor Menu has a number of menus to allow factor levels and labels to be manipulated.
The maximum length of a factor label is 60 characters.
As spreadsheet packages do not have a factor cell type (they only have numeric or text cells), Genstat lets you specify a factor in one of these files by appending a ! to the end of the column name (e.g. Sex! or Age!).
Also Excel does not allow both numerical and text values in one cell. To enable both to be specified for a factor (as in Severity or Treatment), you can add a comment to the column name as described below.
The factor labels or levels must start with a ! on a new line in the comment. If you are providing levels use the format !(100,90,50,10). If you are providing labels use the format !T(Control,A,B,C) or !t(‘Control’,’A’,’B’,’C’). The order of the items in the comment will define the order of the levels or labels in the factor. If a column in Excel just contains ordinal values (i.e. 1…n), the comment can still be used to assign labels or levels to these groups. The first item in the comment will define the level or label for group 1 in the factor etc.
The column description information can also be given in the comment, as a line not starting with an exclamation mark.
The following image shows two comments entered into Excel (use the Excel menu item Insert | Comment to do this, making sure you don’t put in line breaks in the middle of the list).
A factor must be used in many places (e.g. ANOVA) where group membership is specified. Variates and texts can be converted to a factor and back again (most easily with the right mouse menu in the spreadsheet) with the Convert dialog.