Forms a factor (or grouping variable) from a variate or text, together with the set of distinct values that occur.
Options
PRINT = string token | Printed output required (summary); default * i.e. no printing |
---|---|
NGROUPS = scalar | Number of groups to form when LIMITS is not specified; if NGROUPS is also unspecified, each distinct value (allowing for rounding) defines a group; default * |
LMETHOD = string token | Defines how to form the levels variate if the setting of the VECTOR parameter is a variate, or the labels if it is a text; if LMETHOD=* no levels/labels are formed, and existing levels (for a variate VECTOR) or labels (for a text VECTOR) of an already declared FACTOR will be retained if still appropriate (given, minimum, median, maximum, limit); default medi |
DECIMALS = scalar | Number of decimal places to which to round the VECTOR before forming the groups; default * i.e. no rounding |
BOUNDARIES = string token | Whether to interpret the LIMITS as upper or lower boundaries (upper, lower); default lowe |
REDEFINE = string token | Whether to allow a structure in the FACTOR list that has already been declared (e.g. as a variate or text) to be redefined (yes, no); default no |
CASE = string token | Whether the case of letters (small and capital) in text should be regarded as significant or ignored (significant, ignored); default sign |
LDIRECTION = string token | How to define the levels (for a variate VECTOR) or labels (for a text VECTOR) when LMETHOD=minimum, median or maximum (ascending, given); default asce |
OMITUNBOUNDED = string token | Whether to omit the (unbounded) group that occurs below the lowest limit when BOUNDARIES=lower, or above the final limit when BOUNDARIES=upper (yes, no); default no |
Parameters
VECTOR = variates or texts | Vectors whose values are to define the groups |
---|---|
FACTOR = factors | Structures to be defined as factors to save details of the groups; default * will, if REDEFINE=yes, cause the corresponding VECTOR itself to be defined as a factor |
LIMITS = variates or texts | Limits to define the groups |
LEVELS = variates | Variate to define the levels of each FACTOR if LMETHOD=give, or to save them otherwise |
LABELS = texts | Text to define the labels of each FACTOR if LMETHOD=give, or to save them otherwise |
Description
The GROUPS directive is designed to form factors from variates or texts. The variates and texts are specified by the VECTOR parameter, and the factors by the FACTOR parameter. With the simplest use of GROUPS you need specify no more than that, and each factor is defined to have a level for every distinct value of its corresponding variate or text. You need not have declared the factor already; it will be declared automatically if necessary.
Alternatively, you can divide the values of the variate or text into groups to be represented by the factor. You can use the LIMITS parameter to specify the range of values for each group. The limits vector is a variate or a text, depending on whether the factor is being defined from a variate or a text; its values specify boundaries for the ranges. The BOUNDARIES option controls whether these are regarded as upper or lower boundaries; by default BOUNDARIES=lower. You can also ask GROUPS itself to set limits that will partition the units into groups of nearly equal size. You should then specify the NGROUPS option and leave the LIMITS parameter unset. (If you give both LIMITS and NGROUPS, then NGROUPS is ignored.)
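As a minimal sketch of the two approaches just described, the statements below first group a variate with explicit upper boundaries and then let GROUPS choose the limits itself. The identifiers Age, Ageband and Agethird and the limit values are hypothetical, chosen only for illustration.

```genstat
" hypothetical data: six ages "
VARIATE [VALUES=12,25,37,41,58,63] Age
" three limits, treated as upper boundaries, give four groups:
  the fourth is the unbounded group above the final limit "
GROUPS [BOUNDARIES=upper] Age; FACTOR=Ageband; LIMITS=!(20,40,60)
" no LIMITS: GROUPS sets limits to form 3 groups of nearly equal size "
GROUPS [NGROUPS=3] Age; FACTOR=Agethird
PRINT Age,Ageband,Agethird
```

Because LMETHOD is left at its default, the levels of both factors are the group medians rather than the limits themselves.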
If you are defining a factor from a variate VECTOR, the LMETHOD option controls how the levels vector is formed, with the following settings:
median | forms the levels from the median of the units in each group (default); |
---|---|
minimum | forms them from the minimum value in each group; |
maximum | forms them from the maximum value in each group; |
limit | uses the values in the LIMITS variate; |
given | uses the values supplied (in a variate) by the LEVELS parameter. |
With any of the settings median, minimum, maximum or limit, you can use the LEVELS parameter to specify a variate to store the levels that are produced; this can be done even if no factor is being formed, that is, if no identifier is supplied for the factor in the FACTOR list. Finally, if you set LMETHOD=*, no levels are formed and any existing levels of the factor will be retained if they are still appropriate; otherwise the levels will be the integers 1 upwards. With any of these settings, you can use the LABELS parameter to specify labels for the factor.
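The following sketch illustrates how LMETHOD and LEVELS might be combined for a variate VECTOR. The identifiers Income, Incfac, Inclev and Maxlev are hypothetical, and the data are invented for illustration.

```genstat
" hypothetical data: six incomes "
VARIATE [VALUES=3000,17500,5000,20000,7000,4500] Income
" two groups of nearly equal size; levels are the group minima,
  and are also saved in the variate Inclev "
GROUPS [NGROUPS=2; LMETHOD=minimum] Income; FACTOR=Incfac; LEVELS=Inclev
" no FACTOR supplied: only the levels (group maxima) are saved "
GROUPS [NGROUPS=2; LMETHOD=maximum] Income; LEVELS=Maxlev
PRINT Income,Incfac
PRINT Inclev,Maxlev
```

The second GROUPS statement relies on the rule above that the levels can be stored even when no factor is being formed.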
Similar rules apply if you have a text VECTOR, except that LMETHOD then governs how the labels are defined for the factor, and LEVELS can be used to specify its levels. The CASE option controls whether the case of the letters in the text strings is important. So, for example, if you set CASE=ignored the strings 'April' and 'april' will be put into the same group. With the default, CASE=significant, they would form different groups.
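A small sketch of the effect of CASE follows; the text Month and the factors Mignore and Mexact are hypothetical names used only for this illustration.

```genstat
" hypothetical text with the same names in different cases "
TEXT [VALUES='April','april','May','MAY','June'] Month
" case ignored: April/april share a group, as do May/MAY "
GROUPS [CASE=ignored] Month; FACTOR=Mignore
" default CASE=significant: every distinct string forms its own group "
GROUPS Month; FACTOR=Mexact
PRINT Month,Mignore,Mexact
```

Under the rules described above, Mignore should have three groups and Mexact five.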
When the levels are formed from a LIMITS variate, there will be one group with no corresponding limit. If BOUNDARIES=upper, the extra group is above the final limit. The level assigned to that group is then the value that is the same distance above the final limit as the distance between the final limit and the last but one limit. If BOUNDARIES=lower, the extra group is below the first limit, and its level is given the value that is the same distance below the first limit as the distance between the first and second limits. The situation is similar with a LIMITS text, but the label for the extra group is always the single-character string '-'. If you would prefer to have an exact correspondence between the levels and the limits, you can set option OMITUNBOUNDED=yes to omit the “unbounded” extra group. Any units beyond the final upper limit, or below the initial lower limit, are then given missing values.
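To make the rule for the extra level concrete, the sketch below uses hypothetical limits of 5000, 10000, 15000 and 20000 with LMETHOD=limit. Following the description above, the extra level would be expected to be 5000 - (10000 - 5000) = 0 with lower boundaries, and 20000 + (20000 - 15000) = 25000 with upper boundaries. The identifiers Lim, Inclow, Incupp, Incomit, Levlow and Levupp are invented for the example.

```genstat
VARIATE [VALUES=3000,17500,5000,20000,7000,4500,12000,18000,15500,17500] I
" hypothetical limits at 5000 intervals "
VARIATE [VALUES=5000,10000,15000,20000] Lim
" lower boundaries (default): extra group lies below 5000 "
GROUPS [LMETHOD=limit] I; FACTOR=Inclow; LIMITS=Lim; LEVELS=Levlow
" upper boundaries: extra group lies above 20000 "
GROUPS [LMETHOD=limit; BOUNDARIES=upper] I; FACTOR=Incupp; LIMITS=Lim; LEVELS=Levupp
" omit the unbounded group: units below 5000 become missing "
GROUPS [LMETHOD=limit; OMITUNBOUNDED=yes] I; FACTOR=Incomit; LIMITS=Lim
PRINT Levlow,Levupp
PRINT I,Inclow,Incupp,Incomit
```

Printing the saved levels alongside the factors is a simple way to check the extrapolated value against the arithmetic above.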
The LDIRECTION option controls the ordering of the levels (for a variate VECTOR) or the labels (for a text VECTOR) when LMETHOD is set to median, minimum or maximum. By default, they are sorted into ascending order, but you can set LDIRECTION=given to take them in the order in which they occur in the VECTOR. This may be useful, for example, if a text vector contains the names of days or of months in calendar order.
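As an illustration of the calendar-order case, the sketch below groups a hypothetical text Mon twice and saves the labels each time; the names Mon, Moncal, Monalph, Labcal and Labalph are assumptions made for the example.

```genstat
" hypothetical month names, first occurring in calendar order "
TEXT [VALUES='Jan','Feb','Mar','Jan','Mar','Feb'] Mon
" labels taken in order of first occurrence in the VECTOR "
GROUPS [LDIRECTION=given] Mon; FACTOR=Moncal; LABELS=Labcal
" default LDIRECTION=ascending: labels sorted into ascending order "
GROUPS Mon; FACTOR=Monalph; LABELS=Labalph
PRINT Labcal,Labalph
```

Comparing the two saved label texts shows whether the calendar order has been preserved.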
You can set the DECIMALS option to request that the values of a variate VECTOR be rounded to a particular number of decimal places before the groups are formed: for example DECIMALS=0 would round each value to the nearest integer.
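A brief sketch of rounding before grouping is shown below, using a hypothetical variate X and factor Xround.

```genstat
" hypothetical measurements "
VARIATE [VALUES=1.2,0.9,2.4,1.8,2.1] X
" round to the nearest integer before forming the groups,
  so 1.2 and 0.9 fall together, as do 2.4, 1.8 and 2.1 "
GROUPS [DECIMALS=0] X; FACTOR=Xround
PRINT X,Xround
```

Without DECIMALS, each of the five distinct values would define its own group.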
You can redefine a VECTOR structure as a factor by setting option REDEFINE=yes and omitting to specify any corresponding identifier in the FACTOR list. This can be very useful on occasions when you are unable to define in advance which levels will occur in a set of data.
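For instance, a variate read in before its levels are known might be converted in place as sketched below; Dose is a hypothetical identifier.

```genstat
" hypothetical variate whose distinct values are not known in advance "
VARIATE [VALUES=3,1,2,3,1,2,2] Dose
" no FACTOR identifier is given, so Dose itself is redefined as a factor "
GROUPS [REDEFINE=yes] Dose
PRINT Dose
```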
The PRINT option can be set to summary to print a summary of the contents of the FACTOR (numbers of values, missing values and levels).
Options: PRINT, NGROUPS, LMETHOD, DECIMALS, BOUNDARIES, REDEFINE, CASE, LDIRECTION, OMITUNBOUNDED.
Parameters: VECTOR, FACTOR, LIMITS, LEVELS, LABELS.
Action with RESTRICT
GROUPS takes account of any restrictions on variates or texts in the VECTOR list, and will give missing values to the excluded units. If more than one vector is restricted, then each of their restrictions must be the same.
See also
Directives: FACTOR, VARIATE, TEXT.
Procedures: FACAMEND, FACDIVIDE, FACPRODUCT, FACSORT, FACLEVSTANDARDIZE, FACUNIQUE, FMFACTORS, FFREERESPONSEFACTOR, QFACTOR.
Commands for: Calculations and manipulation.
Example
" Example GROU-1: Use of the GROUPS directive"
VARIATE [VALUES=21,50,24,49,29,42,32,42,36,40] A
& [VALUES=3000,17500,5000,20000,7000,4500,12000,18000,15500,17500] I
TEXT [VALUES=Clarke,Irving,Adams,Jones,Day,Good,Edwards,Baker,Hall,Field] N
FACTOR [LABELS=!T(male,female); VALUES=2,1,1,1,2,2,1,1,2,1] S
" put ages into a factor Agef, with a level for each distinct age "
GROUPS [PRINT=summary; LMETHOD=*] A; FACTOR=Agef
PRINT A,Agef
" form a factor Inclevel from variate I, according to 5000 (pound) levels "
GROUPS [LMETHOD=*] I; FACTOR=Inclevel; LIMITS=!(5000,10000,15000,20000)
PRINT I,Inclevel
" form a factor to define 3 (nearly) equal sized income groups;
  set levels to median group values "
GROUPS [NGROUP=3] I; FACTOR=Incgroup
PRINT I,Incgroup