Release 6: new features

1. Highlights

● produced in 2002

● 7 new directives, 50 new procedures and 40 new functions

● the limitation of no more than 31 factors or variates in analysis of variance and in model formulae in regression has been removed

● Boolean calculations on sets (SETCALCULATE, SETRELATE)

● operations on a new tree data structure (BCONSTRUCT, BGRAPH, BPRINT, BPRUNE and BIDENTIFY)

● classification trees (BCLASSIFICATION, BCDISPLAY, BCIDENTIFY and BCVALUES), identification keys (BKEY, BKDISPLAY and BKIDENTIFY) and regression trees (BREGRESSION, BRDISPLAY, BRPREDICT and BRVALUES)

● hierarchical generalized linear models (HGANALYSE, HGDISPLAY, HGFIXEDMODEL, HGKEEP, HGPLOT and HGRANDOMMODEL)

● estimation of the aggregation parameter for the negative binomial distribution in a generalized linear model (RNEGBINOMIAL)

● Latin squares balanced for carry-over effects (AFCARRYOVER, AGCROSSOVERLATIN)

● stacking and unstacking of variates and factors (STACK, UNSTACK)

● plots of probability distributions (DPROBABILITY)

2. What’s new

2.1 Directives

BCUT cuts a tree at a defined node, discarding nodes and information below it.

BJOIN extends a tree by joining another tree to a terminal node.

BGROW adds new branches to a node of a tree.

SETCALCULATE performs Boolean set calculations on the contents of vectors.

SETRELATE compares two sets of values in two data structures.

SET2FORMULA forms a model formula using structures supplied in a pointer.

TREE declares a tree, and initializes it to have a single node known as the root.

2.2 Procedures

AFCARRYOVER forms factors to represent carry-over effects in cross-over trials.

AFIELDRESIDUALS displays residuals in field layout.

AGCROSSOVERLATIN generates Latin squares balanced for carry-over effects.

ALLPAIRWISE performs a range of all pairwise multiple comparison tests.

AMMI allows exploratory analysis of genotype × environment interactions.

AUKEEP saves output from analysis of an unbalanced design (by AUNBALANCED).

BCDISPLAY displays a classification tree.

BCIDENTIFY identifies specimens using a classification tree.

BCLASSIFICATION constructs a classification tree.

BCONSTRUCT constructs a tree.

BCVALUES forms values for nodes of a classification tree.

BGRAPH plots a tree.

BKDISPLAY displays an identification key.

BKEY constructs an identification key.

BKIDENTIFY identifies specimens using a key.

BPRINT displays a tree.

BPRUNE prunes a tree using minimal cost complexity.

BRDISPLAY displays a regression tree.

BREGRESSION constructs a regression tree.

BRPREDICT makes predictions using a regression tree.

BRVALUES forms values for nodes of a regression tree.

DCOMPOSITIONAL plots 3-part compositional data within a barycentric triangle.

DMASS plots discrete data such as mass spectra or discrete probability functions.

DPROBABILITY creates a probability distribution plot of the values in a variate.

FACDIVIDE represents a factor by factorial combinations of a set of factors.

FBASICCONTRASTS breaks a model term down into its basic contrasts.

FFRAME forms multiple windows in a plot-matrix for high-resolution graphics.

FHADAMARDMATRIX forms Hadamard matrices.

FITINDIVIDUALLY fits regression models one term at a time.

FMFACTORS forms a pointer of factors representing a multiple-response.

FPROJECTIONMATRIX forms a projection matrix for a set of model terms.

GSTATISTIC calculates the gamma statistic of agreement for ordinal data.

HGANALYSE analyses data using hierarchical generalized linear models.

HGDISPLAY displays a hierarchical generalized linear model analysis.

HGFIXEDMODEL defines the fixed model for a hierarchical generalized linear model.

HGKEEP saves information from a hierarchical generalized linear model analysis.

HGPLOT produces model-checking plots for a hierarchical generalized linear model analysis.

HGRANDOMMODEL defines the random model for a hierarchical generalized linear model.

JOIN joins or merges two sets of vectors together, based on classifying keys.

KERNELDENSITY uses kernel density estimation to estimate a sample density.

MTABULATE forms tables classified by multiple-response factors.

MVFILL replaces missing values in a vector with the previous non-missing value.

PRMANNWHITNEYU calculates probabilities for the Mann-Whitney U statistic.

QLIST gets the user to select a response interactively from a list.

REPPERIODOGRAM gives periodogram-based analyses for replicated time series.

RNEGBINOMIAL fits a negative binomial GLM estimating the aggregation parameter.

RSEARCH helps search through models for a regression or generalized linear model.

STACK combines several data sets by “stacking” the corresponding vectors.

UNSTACK splits vectors into individual vectors according to levels of a factor.

XOEFFICIENCY calculates efficiency of estimating effects in cross-over designs.
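
As an illustration of one of the data-manipulation procedures listed above, MVFILL is described as replacing each missing value with the previous non-missing value. The Python sketch below mimics that behaviour on a plain list, using `None` for a missing value. It is not GenStat code: the name `mv_fill` is hypothetical, and leaving leading missing values as missing is an assumption rather than documented behaviour.

```python
def mv_fill(values):
    """Forward-fill: replace each None with the most recent non-missing value."""
    filled = []
    last = None  # no non-missing value has been seen yet
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


print(mv_fill([None, 3, None, None, 7, None]))  # [None, 3, 3, 3, 7, 7]
```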

2.3 Functions

Summary functions

`KURTOSIS(x)`	Kurtosis of the non-missing values in `x`.
`SD(x)`	Standard deviation of the non-missing values in `x`.
`SEMEAN(x)`	Standard error of the mean of the non-missing values in `x`.
`SKEWNESS(x)`	Skewness of the non-missing values in `x`.
`PAREA(y;x)`	Area of a polygon with vertices specified by `y` and `x`.

Transformations

`BETA(a;b;x)`	Beta function (incomplete if `x` set, otherwise complete).
`COSH(x)`	Hyperbolic cosine of `x`.
`FRACTION(x)`	Fractional part of `x`, i.e. `x-SIGN(x)*INTEGER(x)`.
`RANK(x)`	Ranks of the values in `x`.
`SIGN(x)`	Sign of `x` (-1, 0 or 1 for `x`<0, `x`==0 or `x`>0 respectively).
`SINH(x)`	Hyperbolic sine of `x`.
`TANH(x)`	Hyperbolic tangent of `x`.

Matrix functions

`COLCENTRE(x)`	Centres the columns of matrix `x` by subtracting their means.
`COLMEANS(x)`	Mean of the non-missing elements of each column of matrix `x`.
`COLNOBSERVATIONS(x)`	Number of non-missing elements in each column of matrix `x`.
`COLSUMS(x)`	Sum of the non-missing elements of each column of matrix `x`.
`EVALUES(x)`	Eigenvalues of `x` (as a diagonal matrix).
`EVECTORS(x)`	Eigenvectors of `x` (as a rectangular matrix).
`GINVERSE(x)`	Moore-Penrose generalized inverse of `x`.
`LSVECTORS(x)`	Matrix of vectors from the left-hand side of a singular-value decomposition of `x`.
`MAT0`	Synonym of `MZERO`.
`ROWCENTRE(x)`	Centres the rows of matrix `x` by subtracting their means.
`ROWMEANS(x)`	Mean of the non-missing elements of each row of matrix `x`.
`ROWNOBSERVATIONS(x)`	Number of non-missing elements in each row of matrix `x`.
`ROWSUMS(x)`	Sum of the non-missing elements of each row of matrix `x`.
`RSVECTORS(x)`	Matrix of vectors from the right-hand side of a singular-value decomposition of `x`.
`SVALUES(x)`	Singular values of `x` (as a diagonal matrix).
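
The column functions in the table above all work on the non-missing elements of each column. The Python sketch below illustrates that idea for `COLMEANS` and `COLCENTRE`, with a matrix held as a list of rows and `None` standing for a missing value. The names `col_means` and `col_centre` are hypothetical, and leaving missing elements missing after centring is an assumption.

```python
def col_means(matrix):
    """Mean of the non-missing (non-None) elements of each column."""
    means = []
    for column in zip(*matrix):
        present = [v for v in column if v is not None]
        means.append(sum(present) / len(present) if present else None)
    return means


def col_centre(matrix):
    """Subtract each column's mean from its non-missing elements."""
    means = col_means(matrix)
    return [
        [None if v is None else v - m for v, m in zip(row, means)]
        for row in matrix
    ]


x = [[1.0, 10.0],
     [3.0, None],
     [5.0, 30.0]]
print(col_means(x))   # [3.0, 20.0]
print(col_centre(x))  # [[-2.0, -10.0], [0.0, None], [2.0, 10.0]]
```

The row functions (`ROWMEANS`, `ROWCENTRE` and so on) would follow the same pattern with rows and columns exchanged.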

Probability functions

`CLINVNORMAL(x;m;v)`	Cumulative lower probability for an inverse Normal (or inverse Gaussian) distribution with mean `m` and variance `v`.
`CUINVNORMAL(x;m;v)`	Cumulative upper probability for an inverse Normal (or inverse Gaussian) distribution with mean `m` and variance `v`.
`EDINVNORMAL(p;m;v)`	Equivalent deviate corresponding to cumulative lower probability `p` for an inverse Normal (or inverse Gaussian) distribution with mean `m` and variance `v`.
`PRINVNORMAL(x;m;v)`	Probability density function for an inverse Normal (or inverse Gaussian) distribution with mean `m` and variance `v`.
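
To show how the density and the two cumulative probabilities relate, here is a Python sketch of the standard inverse Gaussian formulas. It assumes the mean `m` and variance `v` map onto the usual (mean, shape) parameterization through shape = m³/v, which follows from the textbook identity variance = mean³/shape. Whether GenStat evaluates the functions this way is not stated in these notes, and the Python function names are hypothetical.

```python
import math


def _phi(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def pr_inv_normal(x, m, v):
    """Inverse Gaussian density at x > 0, for mean m and variance v."""
    lam = m ** 3 / v  # shape parameter implied by the mean and variance
    return (math.sqrt(lam / (2.0 * math.pi * x ** 3))
            * math.exp(-lam * (x - m) ** 2 / (2.0 * m ** 2 * x)))


def cl_inv_normal(x, m, v):
    """Cumulative lower probability P(X <= x)."""
    lam = m ** 3 / v
    root = math.sqrt(lam / x)
    return (_phi(root * (x / m - 1.0))
            + math.exp(2.0 * lam / m) * _phi(-root * (x / m + 1.0)))


def cu_inv_normal(x, m, v):
    """Cumulative upper probability P(X > x)."""
    return 1.0 - cl_inv_normal(x, m, v)


print(cl_inv_normal(1.0, 1.0, 1.0), cu_inv_normal(1.0, 1.0, 1.0))
```

The exponential term can overflow when the shape is very large relative to the mean, so this direct form is only suitable for moderate parameter values.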

Vector functions

`VKURTOSIS(p)`	Kurtosis of the non-missing values in each unit of the variates (or scalars) in pointer `p`.
`VPOSITIONS(x;p)`	Gives the suffix of the first vector in the pointer `p` containing the value in each unit of the variate or text `x`.
`VSD(p)`	Standard deviation of the non-missing values in each unit of the variates (or scalars) in pointer `p`.
`VSEMEANS(p)`	Standard error of the mean of non-missing values in each unit of the variates (or scalars) in pointer `p`.
`VSKEWNESS(p)`	Skewness of the non-missing values in each unit of the variates (or scalars) in pointer `p`.
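
The vector functions differ from the summary functions in that they work across the structures in a pointer, producing one result per unit. The Python sketch below illustrates this for `VSD`, treating the pointer as a list of equal-length lists with `None` for a missing value. The name `v_sd`, the use of the n-1 divisor, and returning a missing value when fewer than two values are present are all assumptions.

```python
import math


def v_sd(pointer):
    """Per-unit standard deviation across the variates in `pointer`."""
    result = []
    for unit in zip(*pointer):  # one tuple of values per unit
        present = [v for v in unit if v is not None]
        if len(present) < 2:
            result.append(None)  # too few values to estimate a deviation
            continue
        mean = sum(present) / len(present)
        ss = sum((v - mean) ** 2 for v in present)
        result.append(math.sqrt(ss / (len(present) - 1)))
    return result


p = [[1.0, 2.0, None],
     [3.0, 2.0, 5.0],
     [5.0, None, 7.0]]
print(v_sd(p))
```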

Table functions

`TKURTOSIS(t)`	Forms margins containing the kurtosis of the cells in table `t`.
`TSD(t)`	Forms margins of between-cell standard deviations for table `t`.
`TSEMEANS(t)`	Forms margins of standard errors for between-cell means of table `t`.
`TSKEWNESS(t)`	Forms margins containing the skewness of the cells in table `t`.

Tree functions

`BBELOW(t;n;m)`	provides a variate containing numbers of all the nodes below node `n` of tree `t`; if `m`=1 this gives only the terminal nodes below `n`, otherwise it includes internal nodes as well.
`BBRANCHES(t;n)`	provides a variate containing the numbers of the branches taken on the path to node `n` in tree `t` (the result is of the same length as the results of the `BPATH` function, and includes a missing value as the final element, corresponding to `n` itself).
`BDEPTH(t;x)`	calculates the depths of nodes `x` in tree `t`.
`BMAXNODE(t)`	provides the maximum node number in tree `t`.
`BNBRANCHES(t;x)`	provides the number of branches below nodes `x` in tree `t` (0 for any that are terminal nodes).
`BNEXT(t;x;y)`	finds the numbers of the nodes on branches `y` from nodes `x` in tree `t` (returning a missing value for any terminal node).
`BNNODES(t)`	provides the number of nodes in tree `t`.
`BPATH(t;n)`	provides a variate containing the numbers of the nodes on the branch to node `n` in tree `t` (includes `n` itself as the final element).
`BPREVIOUS(t;x)`	finds the numbers of the nodes immediately above nodes `x` in tree `t` (or a missing value if a node is the root of the tree).
`BSCAN(t;x)`	finds the numbers of the nodes immediately after nodes `x` in tree `t` in a standard branch-by-branch order that visits each node once (or a missing value for the node that is the last one in the tree).
`BTERMINAL(t;x)`	finds the next terminal nodes after nodes `x` in tree `t` (or a missing value for the node that is the last terminal node).
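
The navigation performed by several of the tree functions can be pictured with a small Python sketch. It uses a hypothetical five-node tree stored as a dictionary from node number to child node numbers; the storage scheme, the Python function names and the order in which `b_below` lists nodes are assumptions made for illustration, not a description of GenStat's internals.

```python
# Hypothetical tree: node number -> child node numbers, in branch order.
# Node 1 is the root; nodes 3, 4 and 5 are terminal.
children = {1: [2, 3], 2: [4, 5], 3: [], 4: [], 5: []}
parent = {c: p for p, cs in children.items() for c in cs}


def b_previous(n):
    """Node immediately above n (None, i.e. missing, for the root)."""
    return parent.get(n)


def b_path(n):
    """Nodes on the path from the root to n, with n as the final element."""
    path = [n]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


def b_nbranches(n):
    """Number of branches below n (0 for a terminal node)."""
    return len(children[n])


def b_below(n, terminal_only=False):
    """All nodes below n, or only the terminal ones if requested."""
    found = []
    stack = list(reversed(children[n]))
    while stack:
        node = stack.pop()
        if not terminal_only or not children[node]:
            found.append(node)
        stack.extend(reversed(children[node]))
    return found


print(b_path(5))                       # [1, 2, 5]
print(b_previous(1))                   # None
print(b_below(1))                      # [2, 4, 5, 3]
print(b_below(1, terminal_only=True))  # [4, 5, 3]
```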

3. What’s changed

Most of the changes are compatible with Release 4.2, the previous release. There are a few commands, however, where new options or parameters have been inserted into the existing lists. These may cause problems in statements where option or parameter names have been omitted or abbreviated. To avoid any difficulty, the name of the option/parameter after the new option/parameter should be given explicitly, and not abbreviated to fewer than four characters.

Any command, where changes in Release 6 may cause incompatibilities in existing programs, is marked in Sections 3.1 and 3.2 by the symbol ^†. The full details are given in Section 3.4.

3.1 Directives

^†AKEEP directive has a new option RMETHOD to control the type of residual that is saved. It also has seven new parameters. SEDMEANS saves a symmetric matrix containing standard errors for comparisons between every pair of entries in the table of means. VCMEANS saves a symmetric matrix containing variances and covariances of means. SECBMEANS saves a table of standard errors for combined means, usable for calculating standard errors for differences between means in the table, at equal levels of the factors specified by the EQMEANS option. VCCBMEANS saves a symmetric matrix with variances and covariances of combined means. SEDCBMEANS saves a symmetric matrix with standard errors for comparisons between every pair of entries in the table of combined means. DFMEANS saves a symmetric matrix with degrees of freedom for comparisons between every pair of entries in the table of means. Finally, RTERM saves a formula defining the residual term corresponding to a treatment term. A further change is that, if the replications of a term are all equal, they can be saved in a scalar instead of a table by the REPLICATION parameter. Indeed, if the structure to save the replications has not yet been defined and the replications are equal, it will now be defined as a scalar rather than as a table.

ASSIGN has a new default of zero for the NSUBSTITUTE option, but the effect remains the same (i.e. no substitution).

DELETE has a new option NSUBSTITUTE for use when working with dummies. The default value, *, substitutes the dummy (and any dummy to which it points) as now, so the deleted structure is the structure to which the dummy (eventually) points. NSUBSTITUTE controls the number of times to substitute, as in ASSIGN, so for example setting NSUBSTITUTE=0 would delete the dummy itself.

DUMP option PRINT has a new setting, space, to provide information about the current use of workspace within the GenStat server.

DUPLICATE has a new option REDEFINE to allow the type of a data structure to be redefined if required for the duplication.

^†FCLASSIFICATION has a new default * for the FACTORIAL option, meaning no limitation on the number of factors and variates in the terms that are generated. It also has a new option INCLUDEFUNCTIONS to specify whether or not functions like POL or SSPLINE are to be included in the output formula or terms. Previously these were always omitted.

GET has three new options. SEEDS saves a pointer containing three variates that save the current seeds for GenStat’s three random-number generators (that is, for random number functions, for the RANDOMIZE directive, and for internal use by directives). FIELDWIDTH saves the current default fieldwidth for printing, and SIGNIFICANTFIGURES saves the current default precision. The options have also been added to the SET directive, to enable these items to be modified by users.

GROUPS has an option CASE which allows the case of letters in text to be ignored. It also has an option LDIRECTION which can be set to 'given' to request that the levels and labels be left in the order in which they are encountered in the data rather than being sorted into ascending order (the default). Note: GROUPS also has a PRINT option which was accidentally omitted from the R4.2 manuals!

^†HELP has been revised to present a more appropriate interface for each particular type of computer. On PCs running Windows, for example, it loads the contents screen of the Windows-based help to the command language.

^†PEN has two new parameters CSYMBOLS and CLINE to control the colour of symbols and lines drawn by the pen concerned, and the FILLCOLOUR parameter is renamed to CFILL.

PRINT parameter JUSTIFICATION has new settings 'centre' or 'center' to request that the output is centred within the specified field width.

READ has a new option CASE which allows the case of letters in texts to be ignored when these are converted to factors. It also has an option LDIRECTION which can be set to ‘given’ to request that, when levels or labels are defined by READ, they are left in the order in which they are encountered in the data rather than being sorted into ascending order (the default).
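
The effect of the CASE and LDIRECTION options described for GROUPS and READ can be pictured with a short Python sketch that forms factor labels from a list of strings. The function and argument names are hypothetical, keeping the first spelling encountered when case is ignored is an assumption, and Python's default string ordering stands in for whatever collating order GenStat uses.

```python
def form_labels(values, ldirection="ascending", ignore_case=False):
    """Distinct labels, either sorted or in order of first appearance."""
    first_seen = {}  # comparison key -> first spelling encountered
    for v in values:
        key = v.lower() if ignore_case else v
        first_seen.setdefault(key, v)
    labels = list(first_seen.values())  # dicts preserve insertion order
    return sorted(labels) if ldirection == "ascending" else labels


data = ["low", "High", "low", "HIGH", "Medium"]
print(form_labels(data, ldirection="given"))
# ['low', 'High', 'HIGH', 'Medium']
print(form_labels(data, ldirection="given", ignore_case=True))
# ['low', 'High', 'Medium']
```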

^†PREDICT can save and print standard errors for differences between predictions and, for models with the Normal distribution, least significant differences of predictions. The PRINT option has three new settings: 'sed', 'lsd' and 'vcovariance'. PREDICT also has three new options SED, LSD and LSDLEVEL (inserted between SE and VCOVARIANCE): SED and LSD save matrices of standard errors of differences and least significant differences respectively, and LSDLEVEL sets the significance level (in %) to use in the calculation of lsd’s.

PROCEDURE option RESTORE has two new settings: 'seeds' restores the seeds for random number generation on exit from the procedure; and 'all' has the same effect as listing all the available settings of RESTORE.

RESUME has a CLOSE option, which allows you to close the file afterwards.

RKEEP has three new parameters: SUMMARY and ACCUMULATED save the summary and accumulated analysis-of-variance (or deviance) tables respectively, and the STATISTICS parameter allows statistics to be saved for any current y-variate (rather than only the first, as with the existing STATISTICS option).

^†TRY can now provide a more succinct summary of the potential changes. This is requested by the new 'changes' setting of the PRINT option, which is now its default.

^†VRESIDUAL directive has a new option CONSTRAINT which allows the residual variance to be fixed at its initial value.

VRESIDUAL and VSTRUCTURE directives have a new parameter EQUALITYCONSTRAINTS that can constrain parameters in the variance model to have equal values.

3.2 Procedures

^†AONEWAY has been rewritten to provide customized facilities for one-way analysis of variance. For example, if the treatments have unequal replication, a standard error is printed for each mean, rather than the summary for comparisons of means with minimum and maximum replication as given by ANOVA. Similarly, any missing values are excluded from the analysis by AONEWAY. In ANOVA they need to be included, to ensure balance in the more general situations that it covers, and are estimated as part of the analysis.

^†APLOT now provides index and absolute-residual plots, and the choice of line-printer or high-resolution graphics (default is high resolution). There are new options INDEX and GRAPHICS, and a new parameter PEN.

AREPMEASURES now customizes the ANOVA output to take account of the correction factor on the degrees of freedom. It also has new options FPROBABILITY, PSE and LSDLEVEL which operate as in ANOVA, and an option EPSILON to save the correction factor.

DESCRIBE procedure now calculates the standard error of the variance.

DSCATTER can now plot factors (as well as variates), and procedure TRELLIS can now plot medians.

FACPRODUCT has a new option LMETHOD to control whether levels are formed only for combinations of the factors that are present in the data, or for all the combinations.

GLMM has a new option CADJUST controlling centring of covariates, and a new parameter ITERATIVEWEIGHTS to save the iterative weights.

^†MANNWHITNEY now provides exact probabilities (using new procedure PRMANNWHITNEYU). The NORMAL option is now replaced by a PROBABILITY option (saving the probability rather than the Normal approximation).
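
To indicate what an exact probability for the Mann-Whitney U statistic involves, the Python sketch below counts, for sample sizes `n1` and `n2` with no ties, how many of the equally likely orderings give each value of U, using a standard recurrence on whether the largest observation comes from the first or the second sample. These notes do not say which algorithm PRMANNWHITNEYU uses, so this is only an illustration, and the function names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_u(u, n1, n2):
    """Number of orderings of n1 + n2 untied values that give U == u."""
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    # Largest value in sample 1 adds n2 to U; in sample 2 it adds nothing.
    return count_u(u - n2, n1 - 1, n2) + count_u(u, n1, n2 - 1)


def prob_u_le(u, n1, n2):
    """Exact lower-tail probability P(U <= u) under the null hypothesis."""
    total = comb(n1 + n2, n1)
    return sum(count_u(k, n1, n2) for k in range(u + 1)) / total


print(prob_u_le(0, 3, 3))  # 0.05
```

With three observations in each sample, only one of the 20 possible orderings gives U = 0, which is why the smallest attainable one-sided probability there is 0.05.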

^†PROBITANALYSIS now provides a choice of methods, selected using the FITMETHOD option. When FITMETHOD=generalizednonlinear, the model is fitted as a generalized nonlinear model, using the FIT directive. The alternative setting, 'nonlinear', fits it as a nonlinear model using FITNONLINEAR. Apart from minor numerical differences, the two methods should generate the same results. Generalized nonlinear models allow a confidence region to be generated for lethal doses, and these are used as default for all situations except Wadley’s problem. The nonlinear method is more accurate, and is thus used as the default for the more difficult situation presented by Wadley’s problem. There is a new option LOGBASE (between LD and DISPERSION) which can be used to specify the base of antilog transformation (if any) to be applied to the lethal doses, and there is a new MAXCYCLE option to control the maximum number of iterations for fitting the model.

^†RCHECK can now plot confidence envelopes around Normal and half-Normal plots. These are controlled by new options ENVELOPE, PROBABILITY, NSIMULATIONS and SHADE, which are inserted between INDEX and RESIDUALS.

REPLICATION now takes account of the detection probability (= one minus the type II error rate), and has an option PRDETECTION for specifying it.

RPROPORTIONAL and RSURVIVAL have a new setting ‘loglikelihood’ for their PRINT options to print -2 times the log likelihood.

TTEST has revised headings and an extended summary, which now includes number of observations, mean, variance, standard deviation and standard error of mean.

^†VPLOT procedure can now produce composite plots like those from DAPLOT and RCHECK, as well as absolute-residual and index plots. There is a new parameter PEN and a new option INDEX.

3.3 Functions

In regression, the POL and REG functions have been extended to work on factors, and they can now be included in interactions. The meaning of the REG function has been clarified, so that now its contrasts are always orthogonalized for the main effects of the variate or factor (even if the matrix third argument is set). Unorthogonalized contrasts are now fitted using the COMPARISON function (previously available only in ANOVA), which has an identical syntax to REG.

The GAMMA function has been extended to provide the incomplete gamma function (by setting an optional second argument).

3.4 Incompatibilities

`AKEEP` directive	new option `RMETHOD` inserted before `SAVE`; new parameters `SEDMEANS` and `VCMEANS` inserted between `SEMEANS` and `EFFECTS`; `DFMEANS` inserted between `DF` and `SS`; `RTERM` between `VARIANCE` and `CEFFICIENCY`; and `SECBMEANS`, `SEDCBMEANS` and `VCCBMEANS` between `CBMEANS` and `CBEFFECTS`.
`AONEWAY` procedure	completely rewritten: `GROUPS` option must now be set to a factor; `HOMOGENEITY` option replaced by a `homogeneity` setting of the `PRINT` option; `EXPLAIN` option deleted.
`APLOT` procedure	new options `INDEX` and `GRAPHICS` before `SAVE`; default is now to give high-resolution graphics.
`FCLASSIFICATION` directive	default for the `FACTORIAL` option now `*` (no limit).
`HELP` directive	revised syntax, which may depend on the type of computer (for details type `HELP` alone on a line).
`MANNWHITNEY` procedure	option `NORMAL` replaced by option `PROBABILITY` (saving the probability rather than the Normal approximation).
`PEN` directive	`FILLCOLOUR` renamed to `CFILL`, and preceded by new parameters `CSYMBOLS` and `CLINE`.
`PREDICT` directive	new options `SED`, `LSD` and `LSDLEVEL` inserted between `SE` and `VCOVARIANCE`.
`PROBITANALYSIS` procedure	option `LOGBASE` inserted between `LD` and `DISPERSION`.
`RCHECK` procedure	options `ENVELOPE`, `PROBABILITY`, `NSIMULATIONS` and `SHADE` inserted between `INDEX` and `RESIDUALS`.
`REG` function	in regression and generalized linear models these contrasts are now always orthogonalized for the main effects of the variate or factor, even if the matrix third argument is set (unorthogonalized contrasts can be fitted instead using the `COMP` function).
`TRY` directive	default for the `PRINT` option now `'changes'`.
`VPLOT` procedure	option `INDEX` added before `GRAPHICS`.
`VRESIDUAL` directive	new option `CONSTRAINT` between `VARIANCE` and `COORDINATES`.

Also, procedures DAYCOUNT, GETDATA, SAVEDATA, INVNORMAL and EDINVNORMAL are now obsolete. (You should use the date/time functions, the Save-Session menus and/or RECORD and RESUME, and the functions CLINVNORMAL, CUINVNORMAL, EDINVNORMAL and PRINVNORMAL instead.)

Updated on June 19, 2019
