GROUPS directive

Forms a factor (or grouping variable) from a variate or text, together with the set of distinct values that occur.

Options

PRINT = string token Printed output required (summary); default * i.e. no printing
NGROUPS = scalar Number of groups to form when LIMITS is not specified; if NGROUPS is also unspecified, each distinct value (allowing for rounding) defines a group; default *
LMETHOD = string token Defines how to form the levels variate if the setting of the VECTOR parameter is a variate, or the labels if it is a text; if LMETHOD=* no levels/labels are formed, and existing levels (for a variate VECTOR) or labels (for a text VECTOR) of an already declared FACTOR will be retained if still appropriate (given, minimum, median, maximum, limit); default medi
DECIMALS = scalar Number of decimal places to which to round the VECTOR before forming the groups; default * i.e. no rounding
BOUNDARIES = string token Whether to interpret the LIMITS as upper or lower boundaries (upper, lower); default lowe
REDEFINE = string token Whether to allow a structure in the FACTOR list that has already been declared (e.g. as a variate or text) to be redefined (yes, no); default no
CASE = string token Whether the case of letters (small and capital) in text should be regarded as significant or ignored (significant, ignored); default sign
LDIRECTION = string token How to define the levels (for a variate VECTOR) or labels (for a text VECTOR) when LMETHOD = minimum, median or maximum (ascending, given); default asce
OMITUNBOUNDED = string token Whether to omit the (unbounded) group that occurs below the lowest limit when BOUNDARIES=lower, or above the final limit when BOUNDARIES=upper (yes, no); default no

Parameters

VECTOR = variates or texts Vectors whose values are to define the groups
FACTOR = factors Structures to be defined as factors to save details of the groups; default * will, if REDEFINE=yes, cause the corresponding VECTOR itself to be defined as a factor
LIMITS = variates or texts Limits to define the groups
LEVELS = variates Variate to define the levels of each FACTOR if LMETHOD=give, or to save them otherwise
LABELS = texts Text to define the labels of each FACTOR if LMETHOD=give, or to save them otherwise

Description

The GROUPS directive is designed to form factors from variates or texts. The variates and texts are specified by the VECTOR parameter, and the factors by the FACTOR parameter. With the simplest use of GROUPS you need specify no more than that, and each factor is defined to have a level for every distinct value of its corresponding variate or text. You need not have declared the factor already; it will be declared automatically if necessary.

Alternatively, you can divide the values of the variate or text into groups to be represented by the factor. You can use the LIMITS parameter to specify the range of values for each group. The limits vector is a variate or a text, depending on whether the factor is being defined from a variate or a text; its values specify boundaries for the ranges. The BOUNDARIES option controls whether these are regarded as upper or lower boundaries; by default BOUNDARIES=lower. You can also ask GROUPS itself to set limits that will partition the units into groups of nearly equal size. To do this, specify the NGROUPS option and leave the LIMITS parameter unset. (If you give both LIMITS and NGROUPS, then NGROUPS is ignored.)
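
As a rough sketch of the two approaches just described, the statements below group a hypothetical variate Height (the data and the identifiers Height, Hclass and Hhalf are invented for illustration). The first GROUPS statement supplies explicit limits, treated as upper boundaries; the second leaves LIMITS unset and asks for two groups of nearly equal size.

```
" Hypothetical data, for illustration only "
VARIATE [VALUES=152,161,171,168,183,175,190,158] Height

" explicit limits, interpreted as upper boundaries of the groups "
GROUPS [BOUNDARIES=upper] Height; FACTOR=Hclass; LIMITS=!(160,170,180)
PRINT Height,Hclass

" no LIMITS: GROUPS chooses limits giving 2 groups of nearly equal size "
GROUPS [NGROUPS=2] Height; FACTOR=Hhalf
PRINT Height,Hhalf
```

None of the data values coincides with a limit here, so the sketch does not depend on how a value exactly equal to a boundary is assigned.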

If you are defining a factor from a variate VECTOR, the LMETHOD option controls how the levels vector is formed, with the following settings:

    median forms the levels from the median of the units in each group (default);
    minimum forms them from the minimum value in each group;
    maximum forms them from the maximum value in each group;
    limit uses the values in the LIMITS variate;
    given uses the values supplied (in a variate) by the LEVELS parameter.

With any of the settings median, minimum, maximum or limit, you can use the LEVELS parameter to specify a variate to store the levels that are produced; this can be done even if no factor is being formed, that is, if no identifier is supplied for the factor by the FACTOR list. Finally, if you set LMETHOD=*, no levels are formed and any existing levels of the factor will be retained if they are still appropriate; otherwise the levels will be the integers 1 upwards. With any of these settings, you can use the LABELS parameter to specify labels for the factor.
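
The following sketch illustrates saving the levels that GROUPS produces. It reuses the ages from the example at the end of this page; the identifiers Age, Agegrp and Agemin are chosen here for illustration only.

```
VARIATE [VALUES=21,50,24,49,29,42,32,42,36,40] Age

" three nearly equal-sized groups; the levels are the minimum age
  in each group, and are also saved in the variate Agemin "
GROUPS [NGROUPS=3; LMETHOD=minimum] Age; FACTOR=Agegrp; LEVELS=Agemin
PRINT Age,Agegrp
PRINT Agemin
```

Agemin is printed separately because it has one value per group rather than one per unit.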

Similar rules apply if you have a text VECTOR except that LMETHOD then governs how the labels are defined for the factor, and LEVELS can be used to specify its levels. The CASE option controls whether the case of the letters in the text strings is important. So, for example, if you set CASE=ignored the strings 'April' and 'april' will be put into the same group. With the default, CASE=significant, they would form different groups.
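
A minimal sketch of the effect of the CASE option, using a hypothetical text Month (the data and the identifiers Month, Mcase and Mnocase are assumptions made for this illustration):

```
" Hypothetical data: the same month names typed with different capitalization "
TEXT [VALUES=April,april,May,may,June] Month

" default CASE=significant: April and april form different groups "
GROUPS [PRINT=summary] Month; FACTOR=Mcase

" CASE=ignored: April and april are put into the same group "
GROUPS [PRINT=summary; CASE=ignored] Month; FACTOR=Mnocase
```

Comparing the two summaries should show fewer levels for Mnocase than for Mcase.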

When the levels are formed from a LIMITS variate, there will be one group with no corresponding limit. If BOUNDARIES=upper, the extra group is above the final limit. The level assigned to that group is then the value that is the same distance above the final limit as the distance between the final limit and the last but one limit. If BOUNDARIES=lower, the extra group is below the first limit, and its level is given the value that is the same distance below the first limit as the distance between the first and second limits. The situation is similar with a LIMITS text, but the label for the extra group is always the single-character string '-'. If you would prefer to have an exact correspondence between the levels and the limits, you can set option OMITUNBOUNDED=yes to omit the “unbounded” extra group. Any units beyond the final upper limit, or below the initial lower limit, are then given missing values.
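
To make the rule for the extra group concrete, here is a sketch using the incomes and limits from the example at the end of this page (the identifiers Income, Incband, Incband2 and Bandlev are illustrative). With BOUNDARIES=lower and limits 5000, 10000, 15000 and 20000, the rule above implies that the extra group below 5000 should be given the level 5000 - (10000 - 5000) = 0; with BOUNDARIES=upper the extra group above 20000 would instead get 20000 + (20000 - 15000) = 25000.

```
VARIATE [VALUES=3000,17500,5000,20000,7000,4500,12000,18000,15500,17500] Income

" levels taken from the limits; the extra group lies below 5000 and,
  by the rule described above, should be given the level 0 "
GROUPS [LMETHOD=limit] Income; FACTOR=Incband; \
   LIMITS=!(5000,10000,15000,20000); LEVELS=Bandlev
PRINT Bandlev

" omit the unbounded group: incomes below 5000 are then given missing values "
GROUPS [LMETHOD=limit; OMITUNBOUNDED=yes] Income; FACTOR=Incband2; \
   LIMITS=!(5000,10000,15000,20000)
PRINT Income,Incband2
```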

The LDIRECTION option controls the ordering of the levels (for a variate VECTOR) or the labels (for a text VECTOR) when LMETHOD is set to median, minimum or maximum. By default, they are sorted into ascending order, but you can set LDIRECTION=given to take them in the order in which they occur in the VECTOR. This may be useful, for example, if a text vector contains the names of days or of months in calendar order.
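
As an illustration of the calendar-order case, the sketch below forms two factors from a hypothetical text Day (the data and the identifiers Day, Dsort and Dcal are invented for this example):

```
" Hypothetical data: day names listed in the order they were recorded "
TEXT [VALUES=Mon,Tue,Wed,Mon,Thu,Tue,Fri] Day

" default LDIRECTION=ascending: labels sorted into ascending order "
GROUPS [PRINT=summary] Day; FACTOR=Dsort

" LDIRECTION=given: labels taken in the order in which they occur in Day "
GROUPS [PRINT=summary; LDIRECTION=given] Day; FACTOR=Dcal
```

Because each day name first occurs here in weekday order, the labels of Dcal should follow that order rather than the sorted order used for Dsort.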

You can set the DECIMALS option to request that the values of a variate VECTOR be rounded to a particular number of decimal places before the groups are formed: for example DECIMALS=0 would round each value to the nearest integer.
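
A short sketch of rounding before grouping, using a hypothetical variate X (data and identifiers are assumptions for illustration). The six values round to 1, 1, 2, 2, 3 and 3, so three groups should be formed instead of six.

```
" Hypothetical data, chosen to avoid values that lie exactly halfway "
VARIATE [VALUES=1.2,0.9,2.1,1.8,3.0,2.9] X

" round to the nearest integer before forming the groups "
GROUPS [PRINT=summary; DECIMALS=0] X; FACTOR=Xint
PRINT X,Xint
```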

You can redefine a VECTOR structure as a factor by setting option REDEFINE=yes and leaving the corresponding entry in the FACTOR list unset. This is useful when you cannot define in advance which levels will occur in a set of data.
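
For instance, the following sketch converts a hypothetical variate Dose into a factor in place (the data and the identifier Dose are invented for illustration):

```
" Hypothetical data, read initially as a variate "
VARIATE [VALUES=3,1,2,3,1,2,2] Dose

" no FACTOR is supplied, so Dose itself is redefined as a factor "
GROUPS [PRINT=summary; REDEFINE=yes] Dose
PRINT Dose
```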

The PRINT option can be set to summary to print a summary of the contents of the FACTOR (numbers of values, missing values and levels).

Options: PRINT, NGROUPS, LMETHOD, DECIMALS, BOUNDARIES, REDEFINE, CASE, LDIRECTION, OMITUNBOUNDED.

Parameters: VECTOR, FACTOR, LIMITS, LEVELS, LABELS.

Action with RESTRICT

GROUPS takes account of any restrictions on variates or texts in the VECTOR list, and will give missing values to the excluded units. If more than one vector is restricted, then each of their restrictions must be the same.
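
The sketch below shows one way this might look in practice. It relies on the separate RESTRICT directive, which is not described on this page, so its use here (a CONDITION expression to set the restriction, and a bare RESTRICT statement to remove it) is an assumption to check against the RESTRICT documentation. The identifiers Age and Agef are illustrative.

```
VARIATE [VALUES=21,50,24,49,29,42,32,42,36,40] Age

" restrict Age to the units with ages over 30 (assumed RESTRICT usage) "
RESTRICT Age; CONDITION=Age>30

" groups are formed from the included units only;
  the excluded units are given missing values in Agef "
GROUPS [PRINT=summary] Age; FACTOR=Agef

" remove the restriction again "
RESTRICT Age
```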

See also

Directives: FACTOR, VARIATE, TEXT.

Procedures: FACAMEND, FACDIVIDE, FACPRODUCT, FACSORT, FACLEVSTANDARDIZE, FACUNIQUE, FMFACTORS, FFREERESPONSEFACTOR, QFACTOR.

Commands for: Calculations and manipulation.

Example

" Example GROU-1: Use of the GROUPS directive"
VARIATE [VALUES=21,50,24,49,29,42,32,42,36,40] A
      & [VALUES=3000,17500,5000,20000,7000,4500,12000,18000,15500,17500]  I
TEXT [VALUES=Clarke,Irving,Adams,Jones,Day,Good,Edwards,Baker,Hall,Field] N
FACTOR [LABELS=!T(male,female); VALUES=2,1,1,1,2,2,1,1,2,1] S

" put ages into a factor Agef, with a level for each distinct age "
GROUPS [PRINT=summary; LMETHOD=*] A; FACTOR=Agef
PRINT A,Agef

" form a factor Inclevel from variate I, according to 5000 (pound) levels "
GROUPS [LMETHOD=*] I; FACTOR=Inclevel; LIMITS=!(5000,10000,15000,20000)
PRINT I,Inclevel

" form a factor to define 3 (nearly) equal sized income groups; 
  set levels to median group values "
GROUPS [NGROUPS=3] I; FACTOR=Incgroup
PRINT I,Incgroup
Updated on June 19, 2019