Forms a classification set for each term in a formula, breaks a formula up into separate formulae (one for each term), and applies a limit to the number of factors and variates in the terms of a formula.

### Options

| `FACTORIAL` = scalar | Limit on the number of factors and variates in each term; default `*` i.e. no limit |
|---|---|
| `NTERMS` = scalar | Outputs the number of terms in the formula |
| `CLASSIFICATION` = pointer | Saves a list of all the factors and variates in the `TERMS` formula |
| `OUTFORMULA` = formula structure | Identifier of a formula to store a new formula, omitting terms with too many factors and variates |
| `INCLUDEFUNCTIONS` = string token | Whether or not to include functions in the formulae saved by the `OUTFORMULA` option or the `OUTTERMS` parameter (`yes`, `no`); default `no` |
| `REORDER` = string token | When to reorder the terms in the model (`always`, `standard`, `never`); default `stan` |
| `DROPTERMS` = string token | Whether to include only terms that can be dropped individually from the formula (`yes`, `no`); default `no` |
| `CHECKFUNCTIONS` = scalar | Indicator, set to one if the `TERMS` formula contains any functions, and zero if it contains none |
| `FUNCTIONDEFINITIONS` = pointer | Saves details of the functions defined for each factor and variate in the `TERMS` formula |

### Parameters

| `TERMS` = formula | Formula from which the classification sets, individual model terms and so on are to be formed |
|---|---|
| `CLASSIFICATION` = pointers | Identifiers giving a pointer to store the factors and variates composing each model term of the `TERMS` formula |
| `OUTTERMS` = formula structures | Identifiers giving a formula to store each individual term of the `TERMS` formula |
| `MAINTERMS` = formula structures | Identifiers giving a formula to store the main term for each individual term of the `TERMS` formula |

### Description

If you are writing procedures, for example for statistical analyses, the model to be fitted will often be specified by a Genstat formula structure. Unless the algorithm within the procedure merely involves straightforward use of one of Genstat’s statistical directives, you may wish to know more about the formula: how many model terms does it contain, which factors do they involve, and so on. The `FCLASSIFICATION` directive is designed to provide the answers to these questions. The formula is specified using the `TERMS` parameter.

When Genstat uses a formula in a statistical analysis, it is expanded into a series of model terms, linked by the operator `+`. `FCLASSIFICATION` allows you to save this expanded form, in another formula, using the `OUTFORMULA` option.

You can use the `FACTORIAL` option to apply a limit to the number of factors and variates in the resulting terms, similarly to the `FACTORIAL` option in the `ANOVA`, `FIT` or `REML` directives. The number of terms in the formula can be saved (in a scalar) using the `NTERMS` option, and a list of the factors and variates that occur in the formula can be saved (in a pointer) using the `CLASSIFICATION` option.
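As a minimal sketch of these options (the identifiers `N` and `Vars` are hypothetical names chosen for illustration), the following statements limit `A*B*C` to terms with at most two factors, and save the number of terms and the list of factors:

```
FACTOR A,B,C
" limit terms to at most two factors; save the number of terms in N
  and the list of all factors in the formula in the pointer Vars "
FCLASSIFICATION [FACTORIAL=2; NTERMS=N; CLASSIFICATION=Vars] A*B*C
PRINT N
PRINT Vars
```

The formula `A*B*C` expands to the seven terms `A + B + C + A.B + A.C + B.C + A.B.C`, and the limit of two removes `A.B.C`, so `N` should be 6.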

The other parameters allow you to save information about the individual model terms in the formula. The identifiers in the lists that they specify are taken in parallel with the model terms in the expanded form of the formula. For each model term, the corresponding identifier in the list for the `CLASSIFICATION` parameter is defined as a pointer storing the factors and variates that occur in the term; and the identifier in the `OUTTERMS` list is defined as a formula containing just that model term.

The `MAINTERMS` parameter is useful if the formula contains pseudo-factors. Its identifiers save formula structures containing the “main term” for each of the model terms. If the term is a pseudo-term, this will be the model term to which the pseudo-term is linked. Otherwise, it will be the term itself. For example, in the model `Variety//(A+B)` in Example 4.7.3c in the *Guide to Genstat, Part 2 Statistics*, there are two pseudo-terms, `A` and `B`, with `Variety` as their main term.
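A possible way to inspect the main terms for this model is sketched below. It follows the two-step pattern of the example at the end of this page: first count the terms, then save one formula per term. The identifiers `N`, `T` and `M` are hypothetical names, and the order in which the terms are listed is not assumed here.

```
FACTOR Variety,A,B
" count the terms in the expanded formula "
FCLASSIFICATION [NTERMS=N] Variety//(A+B)
" save each term in T[1...N] and its main term in M[1...N] "
FCLASSIFICATION Variety//(A+B); OUTTERMS=T[1...N]; MAINTERMS=M[1...N]
FOR Ti=T[1...N]; Mi=M[1...N]
  PRINT Ti,Mi
ENDFOR
```

According to the description above, the pseudo-terms `A` and `B` should be paired with the main term `Variety`, and `Variety` should be paired with itself.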

By default any functions such as `POL` or `REG` are omitted from the formulae saved by `OUTFORMULA` or `OUTTERMS`, but these will be included if you set option `INCLUDEFUNCTIONS=yes`. The `CHECKFUNCTIONS` option allows you to save a scalar containing one if the `TERMS` formula contains any functions, and zero if it does not.
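The effect of these two options can be compared with a short sketch such as the one below. The factors `Dose` and `Variety`, their numbers of levels, and the identifiers `Anyfun`, `Fplain` and `Ffunc` are all assumptions made for illustration.

```
FACTOR [LEVELS=4] Dose
FACTOR [LEVELS=3] Variety
" default: functions are omitted from the saved formula;
  Anyfun records whether the TERMS formula contains any functions "
FCLASSIFICATION [OUTFORMULA=Fplain; CHECKFUNCTIONS=Anyfun] POL(Dose;2)*Variety
" keep the functions in the saved formula "
FCLASSIFICATION [OUTFORMULA=Ffunc; INCLUDEFUNCTIONS=yes] POL(Dose;2)*Variety
PRINT Anyfun
PRINT Fplain
PRINT Ffunc
```

Because the formula contains `POL`, `Anyfun` should be set to one. `Fplain` should refer to `Dose` without the function, and `Ffunc` should retain `POL(Dose;2)`.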

The `FUNCTIONDEFINITIONS` option allows you to obtain details of the functions. This saves a pointer which contains a pointer for each factor and variate in the formula (in the same order as in the `CLASSIFICATION` pointer). If the factor or variate has no function, its pointer contains just a text with a single missing value (`''`). Otherwise the first element of the pointer is a text containing the name of the function (either `'POL'`, `'POLND'`, `'REG'`, `'REGND'`, `'COMP'`, `'SSPLINE'` or `'LOESS'`). It then contains elements to store the second and subsequent arguments of the function (if any).
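As a rough sketch of how the saved details might be accessed, the statements below use a formula with a single factor, so that the first element of the pointer must correspond to that factor. The factor `Dose` and the identifiers `Vars` and `Fdef` are hypothetical.

```
FACTOR [LEVELS=4] Dose
FCLASSIFICATION [CLASSIFICATION=Vars; FUNCTIONDEFINITIONS=Fdef] POL(Dose;2)
" Fdef[1] is the pointer for Dose, the only identifier in the formula;
  its first element is a text holding the name of the function "
PRINT Fdef[1][1]
```

From the description above, the text printed should be `'POL'`, with the later elements of `Fdef[1]` holding the remaining arguments of the function.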

Model terms involving several factors are regarded by Genstat as representing all the joint effects of these factors that are not removed by earlier terms in the formula. So, in the formula

`A + B + A.B`

`A.B` is the interaction of factors `A` and `B`, as both main effects occur earlier in the formula. Alternatively, in the formula

`A.B + A + B`

`A.B` still represents all the joint effects of factors `A` and `B`, and the later terms `A` and `B` are redundant as they are now “contained” in `A.B`. Thus `FCLASSIFICATION` usually deletes any term in the model that is contained in an earlier term. However, if you set option `REORDER=always`, the model is reordered after applying any operator (including plus). The reordering arranges the terms so that they contain increasing numbers of identifiers. Terms with the same number of identifiers are then put into lexicographical order with respect to the order in which the identifiers first occurred in the formula itself. Each term will therefore come before any term that would contain it. So the model would again be

`A + B + A.B`

The default setting, `REORDER=standard`, applies the standard Genstat rules, which reorder the terms only after a dot, slash or star operator. The final setting `REORDER=never` specifies that no reordering should take place. (Before Release 19.2, the `ORTHOGONAL` option had settings `no` and `yes`, corresponding to `standard` and `always`. Options and parameters with settings `yes` and `no` should not have any other settings. So these were renamed in Release 19.2, when the setting `never` was added. However, `no` and `yes` are retained as synonyms, so that earlier programs will still run.)
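The difference between the settings can be checked with a small sketch like the following, where `Fstandard` and `Falways` are hypothetical identifiers for the saved formulae:

```
FACTOR A,B
" standard rules: the plus operator does not trigger reordering "
FCLASSIFICATION [REORDER=standard; OUTFORMULA=Fstandard] A.B + A + B
" reorder after every operator, including plus "
FCLASSIFICATION [REORDER=always; OUTFORMULA=Falways] A.B + A + B
PRINT Fstandard
PRINT Falways
```

Following the rules described above, `Fstandard` should lose the terms `A` and `B`, as they are contained in the earlier term `A.B`, whereas `Falways` should contain `A + B + A.B`.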

The rules about terms that contain other terms are also relevant when you are dropping terms from a model, for example in a regression analysis. You cannot drop a term, for example using the `DROP` directive, until all the terms that contain it have been dropped. To simplify the process, if you set option `DROPTERMS=yes`, the formulae saved by `OUTFORMULA` or `OUTTERMS` will contain only terms that are not contained in any other terms (i.e. only the terms that can be dropped).
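For instance, the droppable terms of a small model might be listed as in the sketch below. The model `A*B + C` and the identifier `Droppable` are assumptions chosen for illustration.

```
FACTOR A,B,C
" save only the terms that are not contained in any other term "
FCLASSIFICATION [DROPTERMS=yes; OUTFORMULA=Droppable] A*B + C
PRINT Droppable
```

Here `A*B + C` contains the terms `A`, `B`, `A.B` and `C`. As `A` and `B` are both contained in `A.B`, only `A.B` and `C` should remain in `Droppable`.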

Options: `FACTORIAL`, `NTERMS`, `CLASSIFICATION`, `OUTFORMULA`, `INCLUDEFUNCTIONS`, `REORDER`, `DROPTERMS`, `CHECKFUNCTIONS`, `FUNCTIONDEFINITIONS`.

Parameters: `TERMS`, `CLASSIFICATION`, `OUTTERMS`, `MAINTERMS`.

### See also

Directives: `FORMULA`, `FARGUMENTS`, `REFORMULATE`, `SETCALCULATE`, `SETRELATE`, `SET2FORMULA`.

Commands for: Calculations and manipulation.

### Example

" Example FCLA-1: Examples of the FCLASSIFICATION directive FCLASSIFICATION expands a formula and allows the following to be saved: 1) the expanded version, 2) any individual term of the formula, 3) the sets of variates and factors classifying the individual terms." FACTOR A,B,C FORMULA [VALUE=A*B] AstarB " expand AstarB " FCLASSIFICATION [OUTFORMULA=Expanded] #AstarB PRINT AstarB, Expanded " expand A*B*C imposing a limit of 2 on the number of factors or variates in the resulting terms (default for FACTORIAL is 3) " FCLASSIFICATION [FACTORIAL=2; OUTFORMULA=Expanded] A*B*C PRINT Expanded " calculate the number of terms N in the expanded formula, then save the terms in separate formulae T[1...N] and their classification sets in pointers S[1...N] " FCLASSIFICATION [FACTORIAL=2; NTERMS=N] A*B*C & A*B*C; CLASSIFICATION=S[1...N]; OUTTERMS=T[1...N] FOR Si=S[1...N]; Ti=T[1...N] PRINT Si,Ti ENDFOR