GENERATE directive

Generates factor values for designed experiments: with no options set, factor values are generated in standard order; the options allow treatment factors to be generated using the design-key method, or pseudo-factors to be generated to describe the confounding in a partially balanced experimental design.

Options

`TREATMENTS` = formula	Model term for which pseudo-factors are to be generated; default `*`
`REPLICATES` = formula	Factors defining replicates of the design; default `*`
`BLOCKS` = formula	Block formula (for design-key generation) or term (for generation of pseudo-factors); default `*`
`KEY` = matrix	Key matrix (number of factors in the parameter list by number of factors in the `BLOCKS` formula) to generate the factors by the design key method; default `*`
`BASEVECTOR` = variate	Base vector for design key generation; default `*`

Parameter

factors	Factors whose values are to be generated

Description

GENERATE is invaluable when you have a set of data that is to be read in a systematic order: for example, you may want to take all the observations within one group, then the same number of observations within the next group, and so on until an equal number of observations has been read for every group. You can then define values of the grouping factor or factors by GENERATE; so the only values that you need to read are the observed data. Designed experiments are the obvious instance where the data are structured in this way: for example, you might have all the data from the first block, then all those from the second block, and so on.

The best way to understand GENERATE is to look at some examples. The values of a set of factors that you have defined by GENERATE are said to be in standard order: that is, their units are arranged so that the levels of the first factor occur in the same order as in its levels vector; then, within each level of the first factor, the levels of the second factor are arranged similarly, and so on. For example

FACTOR [NVALUES=24; LEVELS=2] A
& [LEVELS=!(4,1,2)] B
& [LEVELS=4] C
GENERATE A,B,C

gives A, B and C the values

A: 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
B: 4 4 4 4 1 1 1 1 2 2 2 2 4 4 4 4 1 1 1 1 2 2 2 2
C: 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

Placing a number or a scalar in the parameter list has the same effect as if a factor with that number of levels had been listed. Thus to generate values only for A and C, all that you require is

GENERATE A,3,C

To generate values for just B and C is even simpler since the cycling process is itself recycled until all the units have been covered. Omitting A therefore causes all combinations of a level of B with a level of C to be used twice, in the same pattern as displayed above; so you need specify only

GENERATE B,C

You get a warning if one of the cycles is incomplete, as would happen for example if B and C had 18 values instead of 24.
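As an informal model of standard order (a Python sketch, not Genstat code), the pattern can be reproduced by taking every combination of levels with the last factor changing fastest, and recycling that list until all the units are filled. The function name `standard_order` and the warning text are assumptions made for illustration; Genstat's own message will differ.

```python
from itertools import product
import warnings

def standard_order(levels_per_factor, nvalues):
    """Return one list of values per factor, in standard order.

    The last factor cycles fastest, and the whole pattern is recycled
    until nvalues units have been filled.
    """
    combos = list(product(*levels_per_factor))
    if nvalues % len(combos) != 0:
        # Illustrative message only; not Genstat's actual diagnostic.
        warnings.warn("final cycle is incomplete")
    units = [combos[u % len(combos)] for u in range(nvalues)]
    return [list(column) for column in zip(*units)]

# Same levels vectors as the FACTOR declarations: A, B and C.
a, b, c = standard_order([[1, 2], [4, 1, 2], [1, 2, 3, 4]], 24)

# Omitting A: the 12 combinations of B and C are simply used twice.
b_only, c_only = standard_order([[4, 1, 2], [1, 2, 3, 4]], 24)
```

Calling `standard_order([[4, 1, 2], [1, 2, 3, 4]], 18)` would stop half-way through the second cycle and trigger the warning, mirroring the incomplete-cycle case with 18 values.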

This first use of GENERATE, then, is particularly appropriate for generating the blocking factors in an experimental design.

Another use, obtained by setting the BLOCKS, KEY and BASEVECTOR options, is to form values of treatment factors using the design-key method. This method, described by Patterson (1976) and Patterson & Bailey (1978), provides a very flexible way of specifying the allocation of treatments in an experimental design. The method assumes that the units are identified by a set of what are called “plot” factors. In Genstat terms, these will often be the same as the factors that occur in the block formula of the design (see the BLOCKSTRUCTURE directive), and they are specified by the BLOCKS option of GENERATE. The setting is a formula, but remember this can be just a list of factors if you do not wish to indicate their inter-relationships; if the setting is more than just a list, Genstat forms the set of plot factors by taking the factors from the block formula in the order in which they occur there. Of course, the factors need not be identical to those in the block formula. For example, if one of these factors has a non-prime number of levels, it may need to be specified instead as the combination of two or more (pseudo) factors: in a block design with blocks of size eight, the plots might need to be indexed by three factors with two levels each.

The treatment factors to be generated are again specified by the parameter of GENERATE.

The KEY option specifies a matrix known as the design key, which indicates how the values of each treatment factor are to be calculated from the plot factors. The matrix has a row for each treatment factor and a column for each plot factor; below k_ij represents the element in row i and column j. (This is the transpose of the form used by Patterson 1976, but in Genstat it seems more convenient to specify the treatments by rows.) There is also an option called BASEVECTOR, which can specify a variate with an element b_i for each treatment factor to allow the levels of the factor to be shifted cyclically; if this is unset, Genstat assumes b_i=0.

The calculation assumes that the values of the plot factors are represented by the integers zero upwards (and GENERATE will perform this mapping automatically if necessary). The value q[i]_u in unit u of treatment factor i is then given by

q[i]_u = (b_i + k_i1 × p[1]_u + k_i2 × p[2]_u + … + k_in × p[n]_u) modulo t_i

where p[1]_u … p[n]_u are the values of the plot factors in unit u, and t_i is the number of levels of treatment factor i. The calculated values are integers in the range 0, 1 … t_i-1, but GENERATE will again map these to the defined levels if necessary.
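The calculation can be traced by hand with a short Python sketch (an illustration of the arithmetic only, not Genstat code). The function name `design_key_values`, the two three-level plot factors, the key row (1, 2) and the base value 1 are all assumptions chosen for the example.

```python
def design_key_values(key_row, plot_values, n_levels, base=0):
    """Values of one treatment factor in every unit.

    key_row     : the elements k_i1 ... k_in for this treatment factor
    plot_values : one tuple (p[1]_u, ..., p[n]_u) per unit, coded from 0
    n_levels    : t_i, the number of levels of the treatment factor
    base        : b_i, the cyclic shift (0 when BASEVECTOR is unset)
    """
    return [
        (base + sum(k * p for k, p in zip(key_row, unit))) % n_levels
        for unit in plot_values
    ]

# Two plot factors with 3 levels each, in standard order, coded 0..2.
plots = [(p1, p2) for p1 in range(3) for p2 in range(3)]

unshifted = design_key_values([1, 2], plots, n_levels=3)
shifted = design_key_values([1, 2], plots, n_levels=3, base=1)

print(unshifted)  # [0, 2, 1, 1, 0, 2, 2, 1, 0]
print(shifted)    # [1, 0, 2, 2, 1, 0, 0, 2, 1]
```

The shifted result is the unshifted one with every level moved on by one place cyclically, which is the effect attributed to the base vector.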

To illustrate the process, the treatments to be allocated (before randomization) to the plots of an n × n Latin Square may be calculated as

Latin-factor-value = Row-factor-value + Column-factor-value modulo n

The values of the extra factor in a Graeco-Latin square can then be formed as

Graeco-factor-value = Row-factor-value + 2 × Column-factor-value modulo n

So the design key has rows (1,1) and (1,2).
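As a check on this key, the following Python sketch (not Genstat code) builds both factors for an assumed square of size n = 5 and confirms that each is a Latin square and that the pair is orthogonal. The choice n = 5 and the helper `is_latin` are assumptions for illustration; with an even n the multiplier 2 would repeat levels within a row, so the second factor would no longer form a Latin square.

```python
n = 5  # assumed odd, so that the multiplier 2 still gives a Latin square
key = [(1, 1), (1, 2)]  # rows of the design key: Latin factor, Graeco factor

# Row and column factor values for every plot, coded 0..n-1.
plots = [(row, col) for row in range(n) for col in range(n)]

latin, graeco = (
    [(k_row * row + k_col * col) % n for row, col in plots]
    for k_row, k_col in key
)

def is_latin(values):
    """True if each symbol occurs once in every row and every column."""
    grid = [values[r * n:(r + 1) * n] for r in range(n)]
    rows_ok = all(len(set(line)) == n for line in grid)
    cols_ok = all(len(set(line)) == n for line in zip(*grid))
    return rows_ok and cols_ok

# Orthogonal: every (Latin, Graeco) pair of levels occurs exactly once.
orthogonal = len(set(zip(latin, graeco))) == n * n

print(is_latin(latin), is_latin(graeco), orthogonal)  # True True True
```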

The design key thus provides a very convenient way of defining treatment factors. Essentially, the key identifies each factor i with the set of contrasts (in the usual terminology)

P[1]**K_i1 P[2]**K_i2 ... P[n]**K_in

and the skill when forming a design is in selecting the best set for each factor. Further keys are presented by Patterson & Bailey (1978), and these are used in the example of procedure AKEY; this procedure extends the GENERATE facilities by allowing the block factors to be generated automatically, and the design to be printed after the factors have been generated. The Genstat design system has a repertoire of keys, used by procedures DESIGN and AGDESIGN to generate a range of designs including factorials, fractional factorials, Latin squares and Lattices. You can form your own keys for designs not covered by the repertoire, using the FKEY directive.

GENERATE can also be used to form the values of pseudo-factors in partially balanced designs. The treatment term to which the pseudo-factors are to be linked is specified by the TREATMENTS option. The factors that identify the replicates are specified by the REPLICATES option, and those that identify the blocks within each replicate are specified by the BLOCKS option. The settings of these two options are model formulae, but Genstat merely scans them to find which factors they contain; so you may again find it easiest simply to give the factors as a list. The parameter of GENERATE lists the pseudo-factors. These have as many levels as there are blocks within each replicate. The blocks in the first replicate are used to determine which combinations of the factors in the treatment term correspond to each level of the first pseudo-factor, those in the second replicate are used for the second pseudo-factor, and so on. If a treatment combination occurs in more than one block within the same replicate, the level of the corresponding pseudo-factor is not determined uniquely and Genstat will report an error.
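The rule for deriving pseudo-factor levels can be modelled roughly as follows. This is a Python sketch of the description above, not Genstat's implementation: the function `pseudo_factors`, the use of a single treatment code in place of a combination of treatment factors, and the 3 × 3 lattice data are all hypothetical.

```python
def pseudo_factors(replicate, block, treatment):
    """Return {replicate: {treatment: block}}, one mapping per pseudo-factor.

    Raises ValueError if a treatment occurs in more than one block of the
    same replicate, since its pseudo-factor level is then ambiguous.
    """
    levels = {}
    for rep, blk, trt in zip(replicate, block, treatment):
        mapping = levels.setdefault(rep, {})
        # setdefault records the first block seen for this treatment.
        if mapping.setdefault(trt, blk) != blk:
            raise ValueError(
                f"treatment {trt} is in more than one block of replicate {rep}"
            )
    return levels

# Hypothetical 3 x 3 simple lattice: 9 treatments, 2 replicates, blocks of 3.
replicate = [1] * 9 + [2] * 9
block = [1, 1, 1, 2, 2, 2, 3, 3, 3] * 2
treatment = [1, 2, 3, 4, 5, 6, 7, 8, 9,   # replicate 1: rows of the square
             1, 4, 7, 2, 5, 8, 3, 6, 9]   # replicate 2: columns of the square

pseudo = pseudo_factors(replicate, block, treatment)
```

Here `pseudo[1]` plays the role of the first pseudo-factor (level = block within replicate 1) and `pseudo[2]` the second. Each has three levels, matching the three blocks in each replicate.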

Options: TREATMENTS, REPLICATES, BLOCKS, KEY, BASEVECTOR.

Parameter: unnamed.

Action with `RESTRICT`

Any of the factors may be restricted to generate values for only a subset of the units.

References

Patterson, H.D. (1976). Generation of factorial designs. Journal of the Royal Statistical Society, Series B, 38, 175-179.

Patterson, H.D. & Bailey, R.A. (1978). Design keys for factorial experiments. Applied Statistics, 27, 335-343.

Example

" Example 2:3.5.1 "
" Analysis of the damage caused by waves to forward sections of
  cargo-carrying ships. The data, from McCullagh & Nelder (1989) p.204,
  are counts of damage incidents for each combination of three risk
  factors: the type of ship, the year of construction, and the 
  period of operation."
UNITS [NVALUES=40]
FACTOR [LABELS=!T(A,B,C,D,E)] Type
& [LABELS=!T('1960-64','1965-69','1970-74','1975-79')] Construction
& [LABELS=!T('1960-74','1975-79')] Operation
GENERATE Type,Construction,Operation
" Read the number of months service and number of damage incidents."
OPEN '%GENDIR%/Examples/GuidePart2/Ship.dat'; CHANNEL=2
READ [CHANNEL=2] Service,Damage
CLOSE 2
" Use the log of the number of months of service as an offset in the
  model; CALCULATE turns zeroes into missing values, which will then
  be excluded by TERMS as required for a correct analysis."
CALCULATE Logservice = LOG(Service)
MODEL [DISTRIBUTION=poisson; LINK=log; OFFSET=Logservice] Damage
TERMS [FACTORIAL=2] Type * Construction * Operation
" Fit the main effects."
FIT [FPROB=yes; TPROB=yes] Type + Construction + Operation
" Try adding the two-factor interactions."
TRY [PRINT=accumulated; FPROB=yes]\
    Type.Construction + Type.Operation +  Construction.Operation
" Perform screening tests for the terms in the model."
RSCREEN [FACTORIAL=2] Type * Construction * Operation

Updated on June 19, 2019
