Forms a factor (or grouping variable) from a variate or text, together with the set of distinct values that occur.

### Options

| Option | Description |
|---|---|
| `PRINT` = string token | Printed output required (`summary`); default `*` i.e. no printing |
| `NGROUPS` = scalar | Number of groups to form when `LIMITS` is not specified; if `NGROUPS` is also unspecified, each distinct value (allowing for rounding) defines a group; default `*` |
| `LMETHOD` = string token | Defines how to form the levels variate if the setting of the `VECTOR` parameter is a variate, or the labels if it is a text; if `LMETHOD=*` no levels/labels are formed, and existing levels (for a variate `VECTOR`) or labels (for a text `VECTOR`) of an already declared `FACTOR` will be retained if still appropriate (`given`, `minimum`, `median`, `maximum`, `limit`); default `medi` |
| `DECIMALS` = scalar | Number of decimal places to which to round the `VECTOR` before forming the groups; default `*` i.e. no rounding |
| `BOUNDARIES` = string token | Whether to interpret the `LIMITS` as upper or lower boundaries (`upper`, `lower`); default `lowe` |
| `REDEFINE` = string token | Whether to allow a structure in the `FACTOR` list that has already been declared (e.g. as a variate or text) to be redefined (`yes`, `no`); default `no` |
| `CASE` = string token | Whether the case of letters (small and capital) in text should be regarded as significant or ignored (`significant`, `ignored`); default `sign` |
| `LDIRECTION` = string token | How to define the levels (for a variate `VECTOR`) or labels (for a text `VECTOR`) when `LMETHOD` = `minimum`, `median` or `maximum` (`ascending`, `given`); default `asce` |
| `OMITUNBOUNDED` = string token | Whether to omit the (unbounded) group that occurs below the lowest limit when `BOUNDARIES=lower`, or above the final limit when `BOUNDARIES=upper` (`yes`, `no`); default `no` |

### Parameters

| Parameter | Description |
|---|---|
| `VECTOR` = variates or texts | Vectors whose values are to define the groups |
| `FACTOR` = factors | Structures to be defined as factors to save details of the groups; default `*` will, if `REDEFINE=yes`, cause the corresponding `VECTOR` itself to be defined as a factor |
| `LIMITS` = variates or texts | Limits to define the groups |
| `LEVELS` = variates | Variate to define the levels of each `FACTOR` if `LMETHOD=give`, or to save them otherwise |
| `LABELS` = texts | Text to define the labels of each `FACTOR` if `LMETHOD=give`, or to save them otherwise |

### Description

The `GROUPS` directive is designed to form factors from variates or texts. The variates and texts are specified by the `VECTOR` parameter, and the factors by the `FACTOR` parameter. With the simplest use of `GROUPS` you need specify no more than that, and each factor is defined to have a level for every distinct value of its corresponding variate or text. You need not have declared the factor already; it will be declared automatically if necessary.

Alternatively, you can divide the values of the variate or text into groups to be represented by the factor. You can use the `LIMITS` parameter to specify the range of values for each group. The limits vector is a variate or a text, depending on whether the factor is being defined from a variate or a text; its values specify boundaries for the ranges. The `BOUNDARIES` option controls whether these are regarded as upper or lower boundaries; by default `BOUNDARIES=lower`. You can also ask `GROUPS` itself to set limits that will partition the units into groups of nearly equal size. You should then specify the `NGROUPS` option and leave the `LIMITS` parameter unset. (If you give both `LIMITS` and `NGROUPS`, then `NGROUPS` is ignored.)
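
The two ways of forming groups described above can be sketched as follows. This is a minimal illustration only: the identifiers `Age`, `Ageband` and `Agethird`, the data values and the limits 30 and 40 are all assumptions chosen for the example.

```
" Hypothetical data: ten ages to be divided into bands "
VARIATE [VALUES=21,50,24,49,29,42,32,42,36,40] Age

" Treat 30 and 40 as upper boundaries; the group above 40 has no limit of its own "
GROUPS [BOUNDARIES=upper] Age; FACTOR=Ageband; LIMITS=!(30,40)
PRINT Age,Ageband

" Leave LIMITS unset and let GROUPS choose limits giving three groups of nearly equal size "
GROUPS [NGROUPS=3] Age; FACTOR=Agethird
PRINT Age,Agethird
```

Printing the variate alongside each factor is a simple way to check which group every unit has been allocated to.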

If you are defining a factor from a variate `VECTOR`, the `LMETHOD` option controls how the levels vector is formed, with the following settings:

| Setting | Effect |
|---|---|
| `median` | forms the levels from the median of the units in each group (default); |
| `minimum` | forms them from the minimum value in each group; |
| `maximum` | forms them from the maximum value in each group; |
| `limit` | uses the values in the `LIMITS` variate; |
| `given` | uses the values supplied (in a variate) by the `LEVELS` parameter. |

With any of the settings `median`, `minimum`, `maximum` or `limit`, you can use the `LEVELS` parameter to specify a variate to store the levels that are produced; this can be done even if no factor is being formed, that is, if no identifier is supplied for the factor by the `FACTOR` list. Finally, if you set `LMETHOD=*`, no levels are formed and any existing levels of the factor will be retained if they are still appropriate; otherwise the levels will be the integers 1 upwards. With any of these settings, you can use the `LABELS` parameter to specify labels for the factor.
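
As a hedged sketch of how `LMETHOD` and `LEVELS` might be combined, the statements below form three income groups whose levels are the group minima, and save those levels. The identifiers `Income`, `Incgroup` and `Minlev` and the data values are assumptions made for illustration.

```
" Hypothetical incomes "
VARIATE [VALUES=3000,17500,5000,20000,7000,4500,12000,18000,15500,17500] Income

" Three nearly equal-sized groups; each level is the smallest income in its group, "
" and the levels that are formed are saved in the variate Minlev "
GROUPS [NGROUPS=3; LMETHOD=minimum] Income; FACTOR=Incgroup; LEVELS=Minlev
PRINT Minlev
PRINT Income,Incgroup
```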

Similar rules apply if you have a text `VECTOR` except that `LMETHOD` then governs how the labels are defined for the factor, and `LEVELS` can be used to specify its levels. The `CASE` option controls whether the case of the letters in the text strings is important. So, for example, if you set `CASE=ignored` the strings `'April'` and `'april'` will be put into the same group. With the default, `CASE=significant`, they would form different groups.
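
The effect of `CASE` can be tried with a small text such as the one below. The identifiers `Month`, `Monthi` and `Months` and the strings themselves are hypothetical.

```
" Hypothetical text with inconsistent capitalisation "
TEXT [VALUES=April,april,May,may,April] Month

" Ignoring case: April and april fall into the same group "
GROUPS [PRINT=summary; CASE=ignored] Month; FACTOR=Monthi

" Default CASE=significant: April and april form different groups "
GROUPS [PRINT=summary] Month; FACTOR=Months
```

Comparing the number of levels reported in the two summaries shows the difference between the settings.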

When the levels are formed from a `LIMITS` variate, there will be one group with no corresponding limit. If `BOUNDARIES=upper`, the extra group is above the final limit. The level assigned to that group is then the value that is the same distance above the final limit as the distance between the final limit and the last but one limit. If `BOUNDARIES=lower`, the extra group is below the first limit, and its level is given the value that is the same distance below the first limit as the distance between the first and second limits. The situation is similar with a `LIMITS` text, but the label for the extra group is always the single-character string `'-'`. If you would prefer to have an exact correspondence between the level and the limits, you can set option `OMITUNBOUNDED=yes` to omit the “unbounded” extra group. Any units beyond the final upper limit, or below the initial lower limit, are then given missing values.
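
To make the rule for the unbounded group concrete, take the limits 5000, 10000, 15000 and 20000. With `BOUNDARIES=lower`, the rule above gives the extra group the level 5000 − (10000 − 5000) = 0; with `BOUNDARIES=upper` it would instead be 20000 + (20000 − 15000) = 25000. The sketch below, which uses the hypothetical identifiers `Income`, `Incl`, `Lev` and `Inclomit`, saves the levels so that they can be inspected, and then repeats the grouping without the unbounded group.

```
" Hypothetical incomes "
VARIATE [VALUES=3000,17500,5000,20000,7000,4500,12000,18000,15500,17500] Income

" Levels taken from the limits; the unbounded group lies below 5000 "
GROUPS [LMETHOD=limit; BOUNDARIES=lower] Income; FACTOR=Incl; \
  LIMITS=!(5000,10000,15000,20000); LEVELS=Lev
PRINT Lev

" Omit the unbounded group, so that incomes below 5000 are given missing values "
GROUPS [LMETHOD=limit; BOUNDARIES=lower; OMITUNBOUNDED=yes] Income; \
  FACTOR=Inclomit; LIMITS=!(5000,10000,15000,20000)
PRINT Income,Inclomit
```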

The `LDIRECTION` option controls the ordering of the levels (for a variate `VECTOR`) or the labels (for a text `VECTOR`) when `LMETHOD` is set to `median`, `minimum` or `maximum`. By default, they are sorted into ascending order, but you can set `LDIRECTION=given` to take them in the order in which they occur in the `VECTOR`. This may be useful, for example, if a text vector contains the names of days or of months in calendar order.
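
A short sketch of the day-name case, assuming a hypothetical text `Day` whose strings first occur in calendar order:

```
" Hypothetical day names, first occurring in calendar order "
TEXT [VALUES=Mon,Tue,Wed,Mon,Thu,Fri,Tue] Day

" Take the labels in order of occurrence rather than sorting them alphabetically "
GROUPS [LDIRECTION=given] Day; FACTOR=Dayf
PRINT Day,Dayf
```

With the default `LDIRECTION=ascending`, the labels would instead be sorted, which for these strings would put `Fri` first.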

You can set the `DECIMALS` option to request that the values of a variate `VECTOR` be rounded to a particular number of decimal places before the groups are formed: for example `DECIMALS=0` would round each value to the nearest integer.
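
For instance, in the sketch below (with the hypothetical identifiers `X` and `Xf`) the six values round to 1, 1, 2, 2, 3 and 3, so three groups would be expected rather than six.

```
" Hypothetical measurements, all distinct before rounding "
VARIATE [VALUES=1.2,0.8,2.1,1.9,3.4,2.6] X

" Round to the nearest integer before forming the groups "
GROUPS [PRINT=summary; DECIMALS=0] X; FACTOR=Xf
PRINT X,Xf
```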

You can redefine a `VECTOR` structure as a factor by setting option `REDEFINE=yes` and omitting the corresponding identifier from the `FACTOR` list. This can be very useful when you cannot specify in advance which levels will occur in a set of data.
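
A minimal sketch of redefinition, assuming a hypothetical variate `Block` that holds group codes read in as numbers:

```
" Hypothetical codes, initially stored in a variate "
VARIATE [VALUES=2,1,1,3,2,3,1] Block

" No FACTOR identifier is given, so Block itself is redefined as a factor "
GROUPS [PRINT=summary; REDEFINE=yes] Block
PRINT Block
```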

The `PRINT` option can be set to `summary` to print a summary of the contents of the `FACTOR` (numbers of values, missing values and levels).

Options: `PRINT`, `NGROUPS`, `LMETHOD`, `DECIMALS`, `BOUNDARIES`, `REDEFINE`, `CASE`, `LDIRECTION`, `OMITUNBOUNDED`.

Parameters: `VECTOR`, `FACTOR`, `LIMITS`, `LEVELS`, `LABELS`.

### Action with `RESTRICT`

`GROUPS` takes account of any restrictions on variates or texts in the `VECTOR` list, and will give missing values to the excluded units. If more than one vector is restricted, then each of their restrictions must be the same.
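
The behaviour with a restricted vector can be explored with a sketch like the following. The identifiers `Age` and `Agef` and the condition are assumptions; the final `RESTRICT` statement, given without a condition, removes any restriction so that all the units are printed.

```
" Hypothetical ages, restricted to the units with ages above 30 "
VARIATE [VALUES=21,50,24,49,29,42,32,42,36,40] Age
RESTRICT Age; CONDITION=Age.GT.30

" Only the included units are grouped; the excluded units are given missing values "
GROUPS Age; FACTOR=Agef

" Remove the restrictions before printing all the units "
RESTRICT Age,Agef
PRINT Age,Agef
```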

### See also

Directives: `FACTOR`, `VARIATE`, `TEXT`.

Procedures: `FACAMEND`, `FACDIVIDE`, `FACPRODUCT`, `FACSORT`, `FACLEVSTANDARDIZE`, `FACUNIQUE`, `FMFACTORS`, `FFREERESPONSEFACTOR`, `QFACTOR`.

Commands for: Calculations and manipulation.

### Example

```
" Example GROU-1: Use of the GROUPS directive"

VARIATE [VALUES=21,50,24,49,29,42,32,42,36,40] A
& [VALUES=3000,17500,5000,20000,7000,4500,12000,18000,15500,17500] I
TEXT [VALUES=Clarke,Irving,Adams,Jones,Day,Good,Edwards,Baker,Hall,Field] N
FACTOR [LABELS=!T(male,female); VALUES=2,1,1,1,2,2,1,1,2,1] S

" put ages into a factor Agef, with a level for each distinct age "
GROUPS [PRINT=summary; LMETHOD=*] A; FACTOR=Agef
PRINT A,Agef

" form a factor Inclevel from variate I, according to 5000 (pound) levels "
GROUPS [LMETHOD=*] I; FACTOR=Inclevel; LIMITS=!(5000,10000,15000,20000)
PRINT I,Inclevel

" form a factor to define 3 (nearly) equal sized income groups;
  set levels to median group values "
GROUPS [NGROUP=3] I; FACTOR=Incgroup
PRINT I,Incgroup
```