Forms a factor (or grouping variable) from a variate or text, together with the set of distinct values that occur.
|Printed output required (
||Number of groups to form when
||Defines how to form the levels variate if the setting of the
||Number of decimal places to which to round the
||Whether to interpret the
||Whether to allow a structure in the
||Whether the case of letters (small and capital) in text should be regarded as significant or ignored (
||How to define the levels (for a variate
||Whether to omit the (unbounded) group that occurs below the lowest limit when
||Vectors whose values are to define the groups|
||Structures to be defined as factors to save details of the groups; default
||Limits to define the groups|
||Variate to define the levels of each
||Text to define the labels of each
The GROUPS directive is designed to form factors from variates or texts. The variates and texts are specified by the VECTOR parameter, and the factors by the FACTOR parameter. In the simplest use of GROUPS you need specify no more than that, and each factor is defined to have a level for every distinct value of its corresponding variate or text. You need not have declared the factor already; it will be declared automatically if necessary.
Alternatively, you can divide the values of the variate or text into groups to be represented by the factor. You can use the LIMITS parameter to specify the range of values for each group. The limits vector is a variate or a text, according to whether the factor is being defined from a variate or a text; its values specify boundaries for the ranges. The BOUNDARIES option controls whether these are regarded as upper or lower boundaries; by default BOUNDARIES=lower. You can also ask GROUPS itself to set limits that will partition the units into groups of nearly equal size. You should then specify the NGROUPS option and leave the LIMITS parameter unset. (If you give both NGROUPS and LIMITS, NGROUPS is ignored.)
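As a minimal sketch of the two ways of forming groups described above, the following statements first supply explicit limits, treated as upper boundaries, and then ask GROUPS to choose limits for four groups of nearly equal size. The identifiers Income, Incband and Incquart are hypothetical names chosen for this illustration.

```genstat
VARIATE [VALUES=3000,17500,5000,20000,7000,4500,12000,18000,15500,17500] Income
" explicit limits, each regarded as the upper boundary of its group "
GROUPS [BOUNDARIES=upper] Income; FACTOR=Incband; LIMITS=!(5000,10000,15000,20000)
" no LIMITS supplied: GROUPS sets limits to give 4 groups of nearly equal size "
GROUPS [NGROUPS=4] Income; FACTOR=Incquart
PRINT Income,Incband,Incquart
```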
If you are defining a factor from a variate VECTOR, the LMETHOD option controls how the levels vector is formed, with the following settings:
|median|forms the levels from the median of the units in each group (default);|
|minimum|forms them from the minimum value in each group;|
|maximum|forms them from the maximum value in each group;|
|limits|uses the values in the LIMITS variate;|
|given|uses the values supplied (in a variate) by the LEVELS parameter.|
With any of the settings median, minimum, maximum or limit, you can use the LEVELS parameter to specify a variate to store the levels that are produced; this can be done even if no factor is being formed, that is, if no identifier is supplied for the factor by the FACTOR list. Finally, if you set LMETHOD=*, no levels are formed and any existing levels of the factor will be retained if they are still appropriate; otherwise the levels will be the integers 1 upwards. With any of these settings, you can use the LABELS parameter to specify labels for the factor.
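The use of LEVELS and LABELS described above might look like the sketch below. The first GROUPS statement saves the levels that are formed (here the minimum age in each group) in a variate; the second forms no levels and supplies labels instead. The identifiers Age, Agegrp, Agelev and Alab are hypothetical, and three labels are given on the assumption that two limits define three groups once the unbounded group is counted.

```genstat
VARIATE [VALUES=21,50,24,49,29,42,32,42,36,40] Age
" levels are the minimum age in each group; Agelev stores them "
GROUPS [LMETHOD=minimum] Age; FACTOR=Agegrp; LIMITS=!(30,40); LEVELS=Agelev
PRINT Age,Agegrp
PRINT Agelev
" no levels formed (integers 1 upwards); labels identify the groups "
GROUPS [LMETHOD=*] Age; FACTOR=Alab; LIMITS=!(30,40); LABELS=!T(young,mid,old)
PRINT Age,Alab
```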
Similar rules apply if you have a text VECTOR, except that LMETHOD then governs how the labels are defined for the factor, and LEVELS can be used to specify its levels. The CASE option controls whether the case of the letters in the text strings is important. So, for example, if you set CASE=ignored the strings 'April' and 'april' will be put into the same group. With the default, CASE=significant, they would form different groups.
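To illustrate the CASE option, the sketch below forms two factors from the same text, whose name Month and factor names Mignore and Mexact are hypothetical. Following the description above, Mignore should have two groups and Mexact four; which spelling is used for the label of a merged group is not stated here, so the sketch makes no assumption about it.

```genstat
TEXT [VALUES=April,april,May,may,April] Month
" ignore case: April and april fall into the same group "
GROUPS [CASE=ignored] Month; FACTOR=Mignore
" default CASE=significant: April and april form different groups "
GROUPS Month; FACTOR=Mexact
PRINT Month,Mignore,Mexact
```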
When the levels are formed from a LIMITS variate, there will be one group with no corresponding limit. If BOUNDARIES=upper, the extra group is above the final limit. The level assigned to that group is then the value that is the same distance above the final limit as the distance between the final limit and the last but one limit. If BOUNDARIES=lower, the extra group is below the first limit, and its level is given the value that is the same distance below the first limit as the distance between the first and second limits. The situation is similar with a LIMITS text, but the label for the extra group is always the single-character string '-'. If you would prefer to have an exact correspondence between the levels and the limits, you can set option OMITUNBOUNDED=yes to omit the “unbounded” extra group. Any units beyond the final upper limit, or below the initial lower limit, are then given missing values.
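As a worked illustration of the rule for the extra group, take the limits 5000, 10000, 15000 and 20000. With BOUNDARIES=lower the extra group lies below 5000 and its level is 5000 - (10000 - 5000) = 0; with BOUNDARIES=upper it would lie above 20000 with level 20000 + (20000 - 15000) = 25000. The sketch below uses the default lower boundaries, with and without the unbounded group. The identifiers Income, Lim, Fextra and Fomit are hypothetical, and the LMETHOD setting is written as limit, the form used in the text above.

```genstat
VARIATE [VALUES=3000,17500,5000,20000,7000,4500,12000,18000,15500,17500] Income
VARIATE [VALUES=5000,10000,15000,20000] Lim
" levels taken from the limits; incomes below 5000 go into the extra group "
GROUPS [LMETHOD=limit] Income; FACTOR=Fextra; LIMITS=Lim
" omit the unbounded group: incomes below 5000 become missing "
GROUPS [LMETHOD=limit; OMITUNBOUNDED=yes] Income; FACTOR=Fomit; LIMITS=Lim
PRINT Income,Fextra,Fomit
```

In this data the incomes 3000 and 4500 lie below the first limit, so they are the units that should fall in the extra group of Fextra and be missing in Fomit.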
The LDIRECTION option controls the ordering of the levels (for a variate VECTOR) or the labels (for a text VECTOR); it applies for particular settings of LMETHOD, among them maximum. By default, they are sorted into ascending order, but you can set LDIRECTION=given to take them in the order in which they occur in the VECTOR. This may be useful, for example, if a text vector contains the names of days or of months in calendar order.
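A short sketch of the calendar-order case: the text below holds day names in the order in which they occur in the week, and LDIRECTION=given asks for the labels in order of first occurrence (Mon, Tue, Wed, Thu, Fri) rather than sorted alphabetically. The identifiers Wday and Wdayf are hypothetical.

```genstat
TEXT [VALUES=Mon,Tue,Wed,Mon,Thu,Fri,Tue] Wday
" keep the labels in the order in which they first occur "
GROUPS [LDIRECTION=given] Wday; FACTOR=Wdayf
PRINT Wday,Wdayf
```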
You can set the DECIMALS option to request that the values of a variate VECTOR be rounded to a particular number of decimal places before the groups are formed: for example, DECIMALS=0 would round each value to the nearest integer.
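For instance, in the sketch below the six values round to 1, 1, 2, 2, 3 and 3, so rounding before grouping should give three groups instead of six. The identifiers X and Xf are hypothetical, and the values were chosen to avoid exact halves, since the treatment of ties in rounding is not described here.

```genstat
VARIATE [VALUES=1.2,0.9,2.4,1.8,3.1,2.7] X
" round to the nearest integer before forming the groups "
GROUPS [DECIMALS=0] X; FACTOR=Xf
PRINT X,Xf
```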
You can redefine a VECTOR structure as a factor by setting option REDEFINE=yes and omitting the corresponding identifier from the FACTOR list. This can be very useful when you are unable to define in advance which levels will occur in a set of data.
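A minimal sketch of redefinition, assuming a hypothetical text called Surname: no FACTOR identifier is supplied, so with REDEFINE=yes the text itself should become a factor with one level for each distinct string.

```genstat
TEXT [VALUES=Clarke,Irving,Adams,Clarke,Adams,Irving] Surname
" no FACTOR given: Surname itself is redefined as a factor "
GROUPS [REDEFINE=yes] Surname
PRINT Surname
```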
You can set option PRINT=summary to print a summary of the contents of the FACTOR (numbers of values, missing values and levels).
GROUPS takes account of any restrictions on variates or texts in the VECTOR list, and will give missing values to the excluded units. If more than one vector is restricted, then each of their restrictions must be the same.
Commands for: Calculations and manipulation.
" Example GROU-1: Use of the GROUPS directive"

VARIATE [VALUES=21,50,24,49,29,42,32,42,36,40] A
& [VALUES=3000,17500,5000,20000,7000,4500,12000,18000,15500,17500] I
TEXT [VALUES=Clarke,Irving,Adams,Jones,Day,Good,Edwards,Baker,Hall,Field] N
FACTOR [LABELS=!T(male,female); VALUES=2,1,1,1,2,2,1,1,2,1] S
" put ages into a factor Agef, with a level for each distinct age "
GROUPS [PRINT=summary; LMETHOD=*] A; FACTOR=Agef
PRINT A,Agef
" form a factor Inclevel from variate I, according to 5000 (pound) levels "
GROUPS [LMETHOD=*] I; FACTOR=Inclevel; LIMITS=!(5000,10000,15000,20000)
PRINT I,Inclevel
" form a factor to define 3 (nearly) equal sized income groups; set levels to median group values "
GROUPS [NGROUP=3] I; FACTOR=Incgroup
PRINT I,Incgroup