Generates factor values for designed experiments: with no options set, factor values are generated in standard order; the options allow treatment factors to be generated using the design-key method, or pseudo-factors to be generated to describe the confounding in a partially balanced experimental design.
|TREATMENTS||Model term for which pseudo-factors are to be generated; default|
|REPLICATES||Factors defining replicates of the design; default|
|BLOCKS||Block formula (for design-key generation) or term (for generation of pseudo-factors); default|
|KEY||Key matrix (number of factors in the parameter list by number of factors in the BLOCKS option)|
|BASEVECTOR||Base vector for design key generation; default|
|factors||Factors whose values are to be generated|
GENERATE is invaluable when you have a set of data that is to be read in a systematic order: for example, you may want to take all the observations within one group, then the same number of observations within the next group, and so on until an equal number of observations has been read for every group. You can then define values of the grouping factor or factors by GENERATE; so the only values that you need to read are the observed data. Designed experiments are the obvious instance where the data are structured in this way: for example, you might have all the data from the first block, then all those from the second block, and so on.
The best way to understand GENERATE is to look at some examples. The values of a set of factors that you have defined by GENERATE are said to be in standard order: that is, their units are arranged so that the levels of the first factor occur in the same order as in its levels vector; then, within each level of the first factor, the levels of the second factor are arranged similarly, and so on. For example
FACTOR [NVALUES=24; LEVELS=2] A
& [LEVELS=!(4,1,2)] B
& [LEVELS=4] C
GENERATE A,B,C
gives A, B and C the values
A: 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
B: 4 4 4 4 1 1 1 1 2 2 2 2 4 4 4 4 1 1 1 1 2 2 2 2
C: 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
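The ordering shown above can be reproduced outside Genstat. The following Python sketch is illustrative only and is not how Genstat itself is implemented; the function name `standard_order` is hypothetical. It treats the factors as nested loops in which the last factor cycles fastest.

```python
from itertools import product

def standard_order(levels_by_factor):
    """Return one list of values per factor, in standard order.

    levels_by_factor: a list of levels vectors, first factor first.
    The last factor cycles fastest, as in nested loops.
    """
    rows = list(product(*levels_by_factor))        # one tuple per unit
    return [list(column) for column in zip(*rows)]

# Levels vectors taken from the FACTOR declarations above.
a, b, c = standard_order([[1, 2], [4, 1, 2], [1, 2, 3, 4]])
print("A:", *a)
print("B:", *b)
print("C:", *c)
```

Note that B follows the order of its levels vector (4, 1, 2), not the numerical order of the levels.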
Placing a number or a scalar in the parameter list has the same effect as if a factor with that number of levels had been listed. Thus to generate values only for C, all that you require is
GENERATE 6,C
To generate values for just B and C is even simpler, since the cycling process is itself recycled until all the units have been covered. Omitting A therefore causes all combinations of a level of B with a level of C to be used twice, in the same pattern as displayed above; so you need specify only
GENERATE B,C
You get a warning if one of the cycles is incomplete, as would happen for example if C had 18 values instead of 24.
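The recycling behaviour and the incomplete-cycle warning can be sketched in Python as follows. This is an approximation written for illustration: the function `generate` is hypothetical, and the exact condition under which Genstat issues its warning is assumed here to be that the number of units is not a multiple of the cycle length.

```python
import warnings
from itertools import cycle, islice, product

def generate(nvalues, *factors):
    """Fill nvalues units in standard order, recycling the pattern.

    Each entry of factors is either a levels vector or an integer n,
    which stands for an unnamed factor with levels 1..n.
    Returns one list of values per entry.
    """
    levels = [list(range(1, f + 1)) if isinstance(f, int) else list(f)
              for f in factors]
    pattern = list(product(*levels))               # one full cycle
    if nvalues % len(pattern) != 0:
        warnings.warn("final cycle is incomplete")
    units = list(islice(cycle(pattern), nvalues))  # repeat until units are covered
    return [list(column) for column in zip(*units)]

# A number in place of a factor: 6 cycles of the 4 levels of C.
_, c = generate(24, 6, [1, 2, 3, 4])

# Omitting A: the 12 combinations of B and C are used twice.
b, c2 = generate(24, [4, 1, 2], [1, 2, 3, 4])
```

Calling `generate(18, [4, 1, 2], [1, 2, 3, 4])` in this sketch emits the warning, because 18 units do not hold a whole number of 12-unit cycles.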
This first use of GENERATE, then, is particularly appropriate for generating the blocking factors in an experimental design.
Another use, obtained by setting the BLOCKS, KEY and BASEVECTOR options, is to form values of treatment factors using the design-key method. This method, described by Patterson (1976) and Patterson & Bailey (1978), provides a very flexible way of specifying the allocation of treatments in an experimental design. The method assumes that the units are identified by a set of what are called “plot” factors. In Genstat terms, these will often be the same as the factors that occur in the block formula of the design (see the BLOCKSTRUCTURE directive), and they are specified by the BLOCKS option of GENERATE. The setting is a formula, but remember that this can be just a list of factors if you do not wish to indicate their inter-relationships; if the setting is more than just a list, Genstat forms the set of plot factors by taking the factors from the block formula in the order in which they occur there. Of course, the factors need not be identical to those in the block formula. For example, if one of these factors has a non-prime number of levels, it may need to be specified instead as the combination of two or more (pseudo) factors: in a block design with blocks of size eight, the plots might need to be indexed by three factors with two levels.
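As a small illustration of the last point, the Python sketch below indexes the eight plots of a block by three two-level factors. The names `p1`, `p2` and `p3` are hypothetical; the sketch only shows that the three factors, taken in standard order, identify each plot uniquely.

```python
from itertools import product

# Three two-level (pseudo) plot factors, coded 0 and 1, in standard order.
plots = list(product(range(2), repeat=3))   # 8 plots

for plot_number, (p1, p2, p3) in enumerate(plots):
    # The three values act as binary digits of the plot number.
    assert plot_number == 4 * p1 + 2 * p2 + p3
    print(plot_number, p1, p2, p3)
```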
The treatment factors to be generated are again specified by the parameter of GENERATE. The KEY option specifies a matrix known as the design key, which indicates how the values of each treatment factor are to be calculated from the plot factors. The matrix has a row for each treatment factor and a column for each plot factor; below, kij represents the element in row i and column j. (This is the transpose of the form used by Patterson 1976, but in Genstat it seems more convenient to specify the treatments by rows.) There is also an option called BASEVECTOR, which can specify a variate with an element bi for each treatment factor to allow the levels of the factor to be shifted cyclically; if this is unset, Genstat assumes bi=0.
The calculation assumes that the values of the plot factors are represented by the integers zero upwards (and GENERATE will perform this mapping automatically if necessary). The value q[i]u in unit u of treatment factor i is then given by
q[i]u = bi + ki1 × p[1]u + ki2 × p[2]u + … + kin × p[n]u modulo ti
where p[1]u … p[n]u are the values of the plot factors in unit u, and ti is the number of levels of treatment factor i. The calculated values are integers in the range 0, 1 … ti-1, but GENERATE will again map these to the defined levels if necessary.
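The calculation can be written out directly. The Python sketch below follows the formula above under the stated coding (plot-factor values 0, 1, 2, …); the function `design_key` and its argument names are hypothetical and do not correspond to any Genstat interface.

```python
from itertools import product

def design_key(key, plot_values, n_levels, base=None):
    """Apply a design key to plot-factor values coded 0, 1, 2, ...

    key[i][j]      : element kij (row i = treatment factor, column j = plot factor)
    plot_values[u] : tuple of plot-factor values for unit u
    n_levels[i]    : ti, the number of levels of treatment factor i
    base[i]        : bi, the cyclic shift (taken as 0 when omitted)
    Returns q[i][u], coded 0 .. ti-1.
    """
    if base is None:
        base = [0] * len(key)
    return [
        [(b + sum(k * p for k, p in zip(row, unit))) % t for unit in plot_values]
        for row, b, t in zip(key, base, n_levels)
    ]

# Illustrative data: two three-level plot factors and one treatment factor.
units = list(product(range(3), repeat=2))
unshifted = design_key([[1, 2]], units, n_levels=[3])
shifted = design_key([[1, 2]], units, n_levels=[3], base=[1])
print(unshifted[0])
print(shifted[0])
```

Setting the base element to 1 moves every value of the treatment factor on by one level, cyclically, which is the shift that BASEVECTOR is described as providing.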
To illustrate the process, the treatments to be allocated (before randomization) to the plots of an n × n Latin Square may be calculated as
Latin-factor-value = Row-factor-value + Column-factor-value modulo n
The values of the extra factor in a Graeco-Latin square can then be formed as
Graeco-factor-value = Row-factor-value + 2 × Column-factor-value modulo n
So the design key has rows (1,1) and (1,2).
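These two rows of the key can be checked numerically. The Python sketch below assumes n = 5, a value chosen here for illustration; with an even n the second row would not give a Latin square, because 2 × Column repeats values modulo n.

```python
n = 5  # assumed value; odd, so that 2 * column takes every value modulo n

# Rows and columns coded 0 .. n-1, as in the design-key calculation.
latin = [[(r + c) % n for c in range(n)] for r in range(n)]
graeco = [[(r + 2 * c) % n for c in range(n)] for r in range(n)]

def is_latin(square):
    """True if every symbol occurs once in each row and each column."""
    symbols = set(range(len(square)))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok

# Orthogonality: every (Latin, Graeco) pair occurs exactly once.
pairs = {(latin[r][c], graeco[r][c]) for r in range(n) for c in range(n)}
print(is_latin(latin), is_latin(graeco), len(pairs) == n * n)
```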
The design key thus provides a very convenient way of defining treatment factors. Essentially, the key identifies each factor i with the set of contrasts (in the usual terminology)
P[1]**ki1 P[2]**ki2 … P[n]**kin
and the skill when forming a design is in selecting the best set for each factor. Further keys are presented by Patterson & Bailey (1978), and these are used in the example of procedure AKEY; this procedure extends the GENERATE facilities by allowing the block factors to be generated automatically, and the design to be printed after the factors have been generated. The Genstat design system has a repertoire of keys, used by procedures such as AGDESIGN to generate a range of designs including factorials, fractional factorials, Latin squares and lattices. You can form your own keys for designs not covered by the repertoire, using the
GENERATE can also be used to form the values of pseudo-factors in partially balanced designs. The treatment term to which the pseudo-factors are to be linked is specified by the TREATMENTS option. The factors that identify the replicates are specified by the REPLICATES option, and those that identify the blocks within each replicate are specified by the BLOCKS option. The settings of these two options are model formulae, but Genstat merely scans them to find which factors they contain; so you may again find it easiest simply to give the factors as a list. The parameter of GENERATE lists the pseudo-factors. These have as many levels as there are blocks within each replicate. The blocks in the first replicate are used to determine which combinations of the factors in the treatment term correspond to each level of the first pseudo-factor, those in the second replicate are used for the second pseudo-factor, and so on. If a treatment combination occurs in more than one block within the same replicate, the level of the corresponding pseudo-factor is not determined uniquely and Genstat will report an error.
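The rule for deriving pseudo-factor levels can be sketched as follows. This Python code is an illustration under assumptions, not Genstat's implementation: the function `pseudo_factors` is hypothetical, the treatment term is represented by a single treatment number, and every treatment is assumed to occur in every replicate. The data are a made-up layout of nine treatments in two replicates of three blocks.

```python
def pseudo_factors(replicate, block, treatment):
    """Derive one pseudo-factor per replicate from the block layout.

    replicate, block, treatment: equal-length lists, one entry per unit;
    block holds the block number within the replicate.
    Returns {replicate level: list of pseudo-factor levels, one per unit}.
    """
    lookup = {}  # (replicate, treatment) -> block containing that treatment
    for r, b, t in zip(replicate, block, treatment):
        if lookup.setdefault((r, t), b) != b:
            raise ValueError(
                f"treatment {t} occurs in more than one block of replicate {r}")
    # Each unit takes the level of the block that held its treatment
    # in the replicate that defines the pseudo-factor.
    return {r: [lookup[(r, t)] for t in treatment]
            for r in sorted(set(replicate))}

replicate = [1] * 9 + [2] * 9
block = [1, 1, 1, 2, 2, 2, 3, 3, 3] * 2
treatment = [1, 2, 3, 4, 5, 6, 7, 8, 9,
             1, 4, 7, 2, 5, 8, 3, 6, 9]

pf = pseudo_factors(replicate, block, treatment)
print(pf[1])
print(pf[2])
```

The `ValueError` in the sketch corresponds to the case described above, in which a treatment combination occurs in more than one block of the same replicate and the pseudo-factor level is not uniquely determined.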
Any of the factors may be restricted to generate values for only a subset of the units.
Patterson, H.D. (1976). Generation of factorial designs. Journal of the Royal Statistical Society, Series B, 38, 175-179.
Patterson, H.D. & Bailey, R.A. (1978). Design keys for factorial experiments. Applied Statistics, 27, 335-343.
" Example 2:3.5.1 "
" Analysis of the damage caused by waves to forward sections of
  cargo-carrying ships. The data, from McCullagh & Nelder (1989) p.204,
  are counts of damage incidents for each combination of three risk
  factors: the type of ship, the year of construction, and the period
  of operation."
UNITS [NVALUES=40]
FACTOR [LABELS=!T(A,B,C,D,E)] Type
& [LABELS=!T('1960-64','1965-69','1970-74','1975-79')] Construction
& [LABELS=!T('1960-74','1975-79')] Operation
GENERATE Type,Construction,Operation
" Read the number of months service and number of damage incidents."
OPEN '%GENDIR%/Examples/GuidePart2/Ship.dat'; CHANNEL=2
READ [CHANNEL=2] Service,Damage
CLOSE 2
" Use the log of the number of months of service as an offset in the
  model; CALCULATE turns zeroes into missing values, which will then be
  excluded by TERMS as required for a correct analysis."
CALCULATE Logservice = LOG(Service)
MODEL [DISTRIBUTION=poisson; LINK=log; OFFSET=Logservice] Damage
TERMS [FACTORIAL=2] Type * Construction * Operation
" Fit the main effects."
FIT [FPROB=yes; TPROB=yes] Type + Construction + Operation
" Try adding the two-factor interactions."
TRY [PRINT=accumulated; FPROB=yes]\
  Type.Construction + Type.Operation + Construction.Operation
" Perform screening tests for the terms in the model."
RSCREEN [FACTORIAL=2] Type * Construction * Operation