1. Highlights
● produced in 2002
● 7 new directives, 50 new procedures and 40 new functions
● the limitation of no more than 31 factors or variates in analysis of variance and in model formulae in regression has been removed
● Boolean calculations on sets (SETCALCULATE, SETRELATE)
● operations on a new tree data structure (BCONSTRUCT, BGRAPH, BPRINT, BPRUNE and BIDENTIFY)
● classification trees (BCLASSIFICATION, BCDISPLAY, BCIDENTIFY and BCVALUES), identification keys (BKEY, BKIDENTIFY and BKDISPLAY) and regression trees (BREGRESSION, BRDISPLAY, BRPREDICT and BRVALUES)
● hierarchical generalized linear models (HGANALYSE, HGDISPLAY, HGFIXEDMODEL, HGKEEP, HGPLOT and HGRANDOMMODEL)
● estimation of the aggregation parameter for the negative binomial distribution in a generalized linear model (RNEGBINOMIAL)
● Latin squares balanced for carry-over effects (AFCARRYOVER, AGCROSSOVERLATIN)
● stacking and unstacking of variates and factors (STACK, UNSTACK)
● plots of probability distributions (DPROBABILITY)
2. What’s new
2.1 Directives
BCUT
cuts a tree at a defined node, discarding nodes and information below it.
BJOIN
extends a tree by joining another tree to a terminal node.
BGROW
adds new branches to a node of a tree.
SETCALCULATE
performs Boolean set calculations on the contents of vectors.
SETRELATE
compares two sets of values in two data structures.
SET2FORMULA
forms a model formula using structures supplied in a pointer.
TREE
declares a tree, and initializes it to have a single node known as the root.
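As a rough illustration of the kind of Boolean set calculations and comparisons described for SETCALCULATE and SETRELATE, the following Python sketch treats the distinct values of two vectors as sets. It is not GenStat syntax: the function names `set_calculate` and `set_relate`, and the particular operations and relations shown, are assumptions made for this example only.

```python
def set_calculate(left, right, operation):
    """Combine the distinct values of two vectors with a Boolean set operation."""
    a, b = set(left), set(right)
    operations = {
        "union": a | b,
        "intersection": a & b,
        "difference": a - b,  # values in left that are not in right
    }
    return sorted(operations[operation])


def set_relate(left, right):
    """Report how the distinct values of two vectors relate to each other."""
    a, b = set(left), set(right)
    return {
        "equal": a == b,
        "subset": a <= b,  # every value of left also occurs in right
        "disjoint": a.isdisjoint(b),
    }


x = [1, 2, 2, 3]
y = [2, 3, 4]
print(set_calculate(x, y, "union"))         # [1, 2, 3, 4]
print(set_calculate(x, y, "intersection"))  # [2, 3]
print(set_relate([2, 3], y))                # {'equal': False, 'subset': True, 'disjoint': False}
```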
2.2 Procedures
AFCARRYOVER
forms factors to represent carry-over effects in cross-over trials.
AFIELDRESIDUALS
displays residuals in field layout.
AGCROSSOVERLATIN
generates Latin squares balanced for carry-over effects.
ALLPAIRWISE
performs a range of all pairwise multiple comparison tests.
AMMI
allows exploratory analysis of genotype × environment interactions.
AUKEEP
saves output from analysis of an unbalanced design (by AUNBALANCED).
BCDISPLAY
displays a classification tree.
BCIDENTIFY
identifies specimens using a classification tree.
BCLASSIFICATION
constructs a classification tree.
BCONSTRUCT
constructs a tree.
BCVALUES
forms values for nodes of a classification tree.
BGRAPH
plots a tree.
BKDISPLAY
displays an identification key.
BKEY
constructs an identification key.
BKIDENTIFY
identifies specimens using a key.
BPRINT
displays a tree.
BPRUNE
prunes a tree using minimal cost complexity.
BRDISPLAY
displays a regression tree.
BREGRESSION
constructs a regression tree.
BRPREDICT
makes predictions using a regression tree.
BRVALUES
forms values for nodes of a regression tree.
DCOMPOSITIONAL
plots 3-part compositional data within a barycentric triangle.
DMASS
plots discrete data such as mass spectra or discrete probability functions.
DPROBABILITY
creates a probability distribution plot of the values in a variate.
FACDIVIDE
represents a factor by factorial combinations of a set of factors.
FBASICCONTRASTS
breaks a model term down into its basic contrasts.
FFRAME
forms multiple windows in a plot-matrix for high-resolution graphics.
FHADAMARDMATRIX
forms Hadamard matrices.
FITINDIVIDUALLY
fits regression models one term at a time.
FMFACTORS
forms a pointer of factors representing a multiple-response.
FPROJECTIONMATRIX
forms a projection matrix for a set of model terms.
GSTATISTIC
calculates the gamma statistic of agreement for ordinal data.
HGANALYSE
analyses data using hierarchical generalized linear models.
HGDISPLAY
displays a hierarchical generalized linear model analysis.
HGFIXEDMODEL
defines the fixed model for a hierarchical generalized linear model.
HGKEEP
saves information from a hierarchical generalized linear model analysis.
HGPLOT
produces model-checking plots for a hierarchical generalized linear model analysis.
HGRANDOMMODEL
defines the random model for a hierarchical generalized linear model.
JOIN
joins or merges two sets of vectors together, based on classifying keys.
KERNELDENSITY
uses kernel density estimation to estimate a sample density.
MTABULATE
forms tables classified by multiple-response factors.
MVFILL
replaces missing values in a vector with the previous non-missing value.
PRMANNWHITNEYU
calculates probabilities for the Mann-Whitney U statistic.
QLIST
gets the user to select a response interactively from a list.
REPPERIODOGRAM
gives periodogram-based analyses for replicated time series.
RNEGBINOMIAL
fits a negative binomial GLM estimating the aggregation parameter.
RSEARCH
helps search through models for a regression or generalized linear model.
STACK
combines several data sets by “stacking” the corresponding vectors.
UNSTACK
splits vectors into individual vectors according to levels of a factor.
XOEFFICIENCY
calculates efficiency of estimating effects in cross-over designs.
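Two of the data-handling procedures above, MVFILL and UNSTACK, have simple enough descriptions to sketch in a few lines. The Python below only illustrates the behaviour described (it is not GenStat code); representing missing values as `None`, leaving leading missing values unfilled, and the function names are all assumptions of this sketch.

```python
def mvfill(values):
    """Replace each missing value (None) with the previous non-missing value."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)  # stays None until a first value has been seen
    return filled


def unstack(values, groups):
    """Split one vector into separate vectors, one per level of a grouping factor."""
    split = {}
    for value, level in zip(values, groups):
        split.setdefault(level, []).append(value)
    return split


print(mvfill([None, 3, None, None, 7, None]))         # [None, 3, 3, 3, 7, 7]
print(unstack([10, 20, 30, 40], ["a", "b", "a", "b"]))  # {'a': [10, 30], 'b': [20, 40]}
```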
2.3 Functions
Summary functions
| Function | Description |
|---|---|
| KURTOSIS(x) | Kurtosis of the non-missing values in x. |
| SD(x) | Standard deviation of the non-missing values in x. |
| SEMEAN(x) | Standard error of the mean of the non-missing values in x. |
| SKEWNESS(x) | Skewness of the non-missing values in x. |
| PAREA(y;x) | Area of a polygon with vertices specified by y and x. |
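The Python sketch below shows one way the quantities behind SD, SEMEAN and PAREA can be computed. It is an illustration rather than GenStat's implementation: the use of `None` for missing values, the n-1 divisor for the standard deviation, and the shoelace formula for the polygon area are assumptions of this example.

```python
import math


def non_missing(x):
    """Drop missing values, represented here as None."""
    return [v for v in x if v is not None]


def sd(x):
    """Sample standard deviation (n - 1 divisor) of the non-missing values."""
    values = non_missing(x)
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def semean(x):
    """Standard error of the mean of the non-missing values."""
    return sd(x) / math.sqrt(len(non_missing(x)))


def parea(y, x):
    """Area of a polygon from its vertex coordinates, using the shoelace formula."""
    n = len(x)
    twice_area = sum(x[i] * y[(i + 1) % n] - x[(i + 1) % n] * y[i] for i in range(n))
    return abs(twice_area) / 2


print(sd([1, 2, 3, None]))               # 1.0
print(parea([0, 0, 1, 1], [0, 1, 1, 0]))  # 1.0 (unit square)
```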
Transformations
| Function | Description |
|---|---|
| BETA(a;b;x) | Beta function (incomplete if x set, otherwise complete). |
| COSH(x) | Hyperbolic cosine of x. |
| FRACTION(x) | Fractional part of x, i.e. x-SIGN(x)*INTEGER(x). |
| RANK(x) | Ranks of the values in x. |
| SIGN(x) | Sign of x (-1, 0 or 1 for x<0, x==0 or x>0 respectively). |
| SINH(x) | Hyperbolic sine of x. |
| TANH(x) | Hyperbolic tangent of x. |
Matrix functions
| Function | Description |
|---|---|
| COLCENTRE(x) | Centres the columns of matrix x by subtracting their means. |
| COLMEANS(x) | Mean of the non-missing elements of each column of matrix x. |
| COLNOBSERVATIONS(x) | Number of non-missing elements in each column of matrix x. |
| COLSUMS(x) | Sum of the non-missing elements of each column of matrix x. |
| EVALUES(x) | Eigenvalues of x (as a diagonal matrix). |
| EVECTORS(x) | Eigenvectors of x (as a rectangular matrix). |
| GINVERSE(x) | Moore-Penrose generalized inverse of x. |
| LSVECTORS(x) | Matrix of vectors from the left-hand side of a singular-value decomposition of x. |
| MAT0 | Synonym of MZERO. |
| ROWCENTRE(x) | Centres the rows of matrix x by subtracting their means. |
| ROWMEANS(x) | Mean of the non-missing elements of each row of matrix x. |
| ROWNOBSERVATIONS(x) | Number of non-missing elements in each row of matrix x. |
| ROWSUMS(x) | Sum of the non-missing elements of each row of matrix x. |
| RSVECTORS(x) | Matrix of vectors from the right-hand side of a singular-value decomposition of x. |
| SVALUES(x) | Singular values of x (as a diagonal matrix). |
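To make the "non-missing elements" wording concrete, here is a small Python sketch of column means, row means and column centring on a matrix that contains a missing value. It is an illustration only, not GenStat code; storing the matrix as a list of rows, using `None` for missing values, and keeping missing cells missing after centring are assumptions of this example.

```python
def colmeans(matrix):
    """Mean of the non-missing elements of each column (matrix is a list of rows)."""
    means = []
    for column in zip(*matrix):
        present = [v for v in column if v is not None]
        means.append(sum(present) / len(present) if present else None)
    return means


def rowmeans(matrix):
    """Mean of the non-missing elements of each row, via the transposed matrix."""
    return colmeans([list(column) for column in zip(*matrix)])


def colcentre(matrix):
    """Subtract each column's mean from its elements; missing cells stay missing."""
    means = colmeans(matrix)
    return [
        [None if v is None else v - m for v, m in zip(row, means)]
        for row in matrix
    ]


x = [[1, 10], [3, None], [5, 30]]
print(colmeans(x))   # [3.0, 20.0]
print(rowmeans(x))   # [5.5, 3.0, 17.5]
print(colcentre(x))  # [[-2.0, -10.0], [0.0, None], [2.0, 10.0]]
```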
Probability functions
| Function | Description |
|---|---|
| CLINVNORMAL(x;m;v) | Cumulative lower probability for an inverse Normal (or inverse Gaussian) distribution with mean m and variance v. |
| CUINVNORMAL(x;m;v) | Cumulative upper probability for an inverse Normal (or inverse Gaussian) distribution with mean m and variance v. |
| EDINVNORMAL(p;m;v) | Equivalent deviate corresponding to cumulative lower probability p for an inverse Normal (or inverse Gaussian) distribution with mean m and variance v. |
| PRINVNORMAL(x;m;v) | Probability density function for an inverse Normal (or inverse Gaussian) distribution with mean m and variance v. |
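For readers more familiar with the (mean, shape) form of the inverse Gaussian distribution, the Python sketch below shows how the four quantities can be computed from a mean m and variance v. It is not GenStat's implementation. It assumes the standard identity variance = m³/λ to obtain the shape λ, uses the textbook closed form for the cumulative probability, and finds the equivalent deviate by simple bisection; the function names are hypothetical, and the exponential term may overflow for very large λ/m.

```python
import math


def _phi(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def _shape(m, v):
    """Shape parameter lambda, from variance = m**3 / lambda."""
    return m ** 3 / v


def pr_inv_normal(x, m, v):
    """Probability density at x > 0."""
    lam = _shape(m, v)
    return math.sqrt(lam / (2.0 * math.pi * x ** 3)) * math.exp(
        -lam * (x - m) ** 2 / (2.0 * m ** 2 * x)
    )


def cl_inv_normal(x, m, v):
    """Cumulative lower probability P(X <= x) for x > 0."""
    lam = _shape(m, v)
    root = math.sqrt(lam / x)
    return _phi(root * (x / m - 1.0)) + math.exp(2.0 * lam / m) * _phi(
        -root * (x / m + 1.0)
    )


def cu_inv_normal(x, m, v):
    """Cumulative upper probability P(X > x)."""
    return 1.0 - cl_inv_normal(x, m, v)


def ed_inv_normal(p, m, v, tol=1e-10):
    """Value x with cumulative lower probability p (0 < p < 1), by bisection."""
    low, high = 0.0, m
    while cl_inv_normal(high, m, v) < p:  # expand until the root is bracketed
        high *= 2.0
    while high - low > tol * high:
        mid = 0.5 * (low + high)
        if cl_inv_normal(mid, m, v) < p:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


p = cl_inv_normal(2.0, 1.0, 1.0)
print(p, ed_inv_normal(p, 1.0, 1.0))  # the second value should be close to 2.0
```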
Vector functions
| Function | Description |
|---|---|
| VKURTOSIS(p) | Kurtosis of the non-missing values in each unit of the variates (or scalars) in pointer p. |
| VPOSITIONS(x;p) | Gives the suffix of the first vector in the pointer p containing the value in each unit of the variate or text x. |
| VSD(p) | Standard deviation of the non-missing values in each unit of the variates (or scalars) in pointer p. |
| VSEMEANS(p) | Standard error of the mean of non-missing values in each unit of the variates (or scalars) in pointer p. |
| VSKEWNESS(p) | Skewness of the non-missing values in each unit of the variates (or scalars) in pointer p. |
Table functions
| Function | Description |
|---|---|
| TKURTOSIS(t) | Forms margins containing the kurtosis of the cells in table t. |
| TSD(t) | Forms margins of between-cell standard deviations for table t. |
| TSEMEANS(t) | Forms margins of standard errors for between-cell means of table t. |
| TSKEWNESS(t) | Forms margins containing the skewness of the cells in table t. |
Tree functions
| Function | Description |
|---|---|
| BBELOW(t;n;m) | provides a variate containing numbers of all the nodes below node n of tree t; if m=1 this gives only the terminal nodes below n, otherwise it includes internal nodes as well. |
| BBRANCHES(t;n) | provides a variate containing the numbers of the branches taken on the path to node n in tree t (the result is of the same length as the results of the BPATH function, and includes a missing value as the final element, corresponding to n itself). |
| BDEPTH(t;x) | calculates the depths of nodes x in tree t. |
| BMAXNODE(t) | provides the maximum node number in tree t. |
| BNBRANCHES(t;x) | provides the number of branches below nodes x in tree t (0 for any that are terminal nodes). |
| BNEXT(t;x;y) | finds the numbers of the nodes on branches y from nodes x in tree t (returning a missing value for any terminal node). |
| BNNODES(t) | provides the number of nodes in tree t. |
| BPATH(t;n) | provides a variate containing the numbers of the nodes on the branch to node n in tree t (includes n itself as the final element). |
| BPREVIOUS(t;x) | finds the numbers of the nodes immediately above nodes x in tree t (or a missing value if a node is the root of the tree). |
| BSCAN(t;x) | finds the numbers of the nodes immediately after nodes x in tree t in a standard branch-by-branch order that visits each node once (or a missing value for the node that is the last one in the tree). |
| BTERMINAL(t;x) | finds the next terminal nodes after nodes x in tree t (or a missing value for the node that is the last terminal node). |
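The navigation functions above are easier to picture on a small example. The Python sketch below mimics the described behaviour of BPREVIOUS, BPATH, BDEPTH and BBELOW on a five-node tree. It is not GenStat code: storing the tree as a dictionary of child lists, numbering the root as node 1, counting the root as depth 0, and the order in which nodes below a node are listed are all assumptions of this example.

```python
# Hypothetical tree: node 1 is the root, nodes 3, 4 and 5 are terminal.
tree = {1: [2, 3], 2: [4, 5], 3: [], 4: [], 5: []}


def bprevious(tree, node):
    """Node immediately above `node`, or None (missing) for the root."""
    for parent, children in tree.items():
        if node in children:
            return parent
    return None


def bpath(tree, node):
    """Nodes on the path from the root to `node`, ending with `node` itself."""
    path = [node]
    parent = bprevious(tree, node)
    while parent is not None:
        path.insert(0, parent)
        parent = bprevious(tree, parent)
    return path


def bdepth(tree, node):
    """Number of branches between the root and `node` (root counted as 0 here)."""
    return len(bpath(tree, node)) - 1


def bbelow(tree, node, terminal_only=False):
    """All nodes below `node`, optionally restricted to terminal nodes."""
    below = []
    stack = list(reversed(tree[node]))
    while stack:
        current = stack.pop()
        if not terminal_only or not tree[current]:
            below.append(current)
        stack.extend(reversed(tree[current]))
    return below


print(bpath(tree, 5))                       # [1, 2, 5]
print(bbelow(tree, 1))                      # [2, 4, 5, 3]
print(bbelow(tree, 1, terminal_only=True))  # [4, 5, 3]
```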
3. What’s changed
Most of the changes are compatible with Release 4.2, the previous release. There are a few commands, however, where new options or parameters have been inserted into the existing lists. These may cause problems in statements where option or parameter names have been omitted or abbreviated. To avoid any difficulty, the name of the option or parameter that follows the newly inserted one should be given explicitly, and not abbreviated to fewer than four characters.
Any command where changes in Release 6 may cause incompatibilities in existing programs is marked in Sections 3.1 and 3.2 by the symbol †. The full details are given in Section 3.4.
3.1 Directives
†AKEEP
directive has a new option RMETHOD to control the type of residual that is saved. It also has seven new parameters. SEDMEANS saves a symmetric matrix containing standard errors for comparisons between every pair of entries in the table of means. VCMEANS saves a symmetric matrix containing variances and covariances of means. SECBMEANS saves a table of standard errors for combined means, usable for calculating standard errors for differences between means in the table, at equal levels of the factors specified by the EQMEANS option. VCCBMEANS saves a symmetric matrix with variances and covariances of combined means. SEDCBMEANS saves a symmetric matrix with standard errors for comparisons between every pair of entries in the table of combined means. DFMEANS saves a symmetric matrix with degrees of freedom for comparisons between every pair of entries in the table of means. Finally, RTERM saves a formula defining the residual term corresponding to a treatment term. A further change is that, if the replications of a term are all equal, they can be saved in a scalar instead of a table by the REPLICATION parameter. Indeed, if the structure to save the replications has not yet been defined and the replications are equal, it will now be defined as a scalar rather than as a table.
ASSIGN
has a new default of zero for the NSUBSTITUTE option, but the effect remains the same (i.e. no substitution).
DELETE
has a new option NSUBSTITUTE for use when working with dummies. The default value, *, substitutes the dummy (and any dummy to which it points) as now, so the deleted structure is the structure to which the dummy (eventually) points. NSUBSTITUTE controls the number of times to substitute, as in ASSIGN, so for example setting NSUBSTITUTE=0 would delete the dummy itself.
DUMP
option PRINT has a new setting, space, to provide information about the current use of workspace within the GenStat server.
DUPLICATE
has a new option REDEFINE to allow the type of a data structure to be redefined if required for the duplication.
†FCLASSIFICATION
has a new default * for the FACTORIAL option, meaning no limitation on the number of factors and variates in the terms that are generated. It also has a new option INCLUDEFUNCTIONS to specify whether or not functions like POL or SSPLINE are to be included in the output formula or terms. Previously these were always omitted.
GET
has three new options. SEEDS saves a pointer containing three variates that save the current seeds for GenStat’s three random-number generators (that is, for random number functions, for the RANDOMIZE directive, and for internal use by directives). FIELDWIDTH saves the current default fieldwidth for printing, and SIGNIFICANTFIGURES saves the current default precision. The options have also been added to the SET directive, to enable these items to be modified by users.
GROUPS
has an option CASE which allows the case of letters in text to be ignored. It also has an option LDIRECTION which can be set to 'given' to request that the levels and labels be left in the order in which they are encountered in the data rather than being sorted into ascending order (the default). Note: GROUPS also has a PRINT option which was accidentally omitted from the R4.2 manuals!
†HELP
has been revised to present a more appropriate interface for each particular type of computer. On PCs running Windows, for example, it loads the contents screen of the Windows-based help to the command language.
†PEN
has two new parameters CSYMBOLS and CLINE to control the colour of symbols and lines drawn by the pen concerned, and the FILLCOLOUR parameter is renamed to CFILL.
PRINT
parameter JUSTIFICATION has new settings 'centre' or 'center' to request that the output is centred within the specified field width.
READ
has a new option CASE which allows the case of letters in texts to be ignored when these are converted to factors. It also has an option LDIRECTION which can be set to 'given' to request that, when levels or labels are defined by READ, they are left in the order in which they are encountered in the data rather than being sorted into ascending order (the default).
†PREDICT
can save and print standard errors for differences between predictions and, for models with the Normal distribution, least significant differences of predictions. The PRINT option has three new settings: 'sed', 'lsd' and 'vcovariance'. PREDICT also has three new options SED, LSD and LSDLEVEL (inserted between SE and VCOVARIANCE): SED and LSD save matrices of standard errors of differences and least significant differences respectively, and LSDLEVEL sets the significance level (in %) to use in the calculation of lsd’s.
PROCEDURE
option RESTORE has two new settings: 'seeds' restores the seeds for random number generation on exit from the procedure; and 'all' has the same effect as listing all the available settings of RESTORE.
RESUME
has a CLOSE option, which allows you to close the file afterwards.
RKEEP
has three new parameters: SUMMARY and ACCUMULATED save the summary and accumulated analysis-of-variance (or deviance) tables respectively, and the STATISTICS parameter allows statistics to be saved for any current y-variate (rather than only the first, as with the existing STATISTICS option).
†TRY
can now provide a more succinct summary of the potential changes. This is requested by the new 'changes' setting of the PRINT option, which is now its default.
†VRESIDUAL
directive has a new option CONSTRAINT which allows the residual variance to be fixed at its initial value.
VRESIDUAL
and VSTRUCTURE directives have a new parameter EQUALITYCONSTRAINTS that can constrain parameters in the variance model to have equal values.
3.2 Procedures
†AONEWAY
has been rewritten to provide customized facilities for one-way analysis of variance. For example, if the treatments have unequal replication, a standard error is printed for each mean, rather than the summary for comparisons of means with minimum and maximum replication as given by ANOVA. Similarly, any missing values are excluded from the analysis by AONEWAY. In ANOVA they need to be included, to ensure balance in the more general situations that it covers, and are estimated as part of the analysis.
†APLOT
now provides index and absolute-residual plots, and the choice of line-printer or high-resolution graphics (default is high resolution). There are new options INDEX and GRAPHICS, and a new parameter PEN.
AREPMEASURES
now customizes the ANOVA output to take account of the correction factor on the degrees of freedom. It also has new options FPROBABILITY, PSE and LSDLEVEL which operate as in ANOVA, and an option EPSILON to save the correction factor.
DESCRIBE
procedure now calculates the standard error of the variance.
DSCATTER
can now plot factors (as well as variates), and procedure TRELLIS can now plot medians.
FACPRODUCT
has a new option LMETHOD to control whether levels are formed only for combinations of the factors that are present in the data, or for all the combinations.
GLMM
has a new option CADJUST controlling centring of covariates, and a new parameter ITERATIVEWEIGHTS to save the iterative weights.
†MANNWHITNEY
now provides exact probabilities (using new procedure PRMANNWHITNEYU). The NORMAL option is now replaced by a PROBABILITY option (saving the probability rather than the Normal approximation).
†PROBITANALYSIS
now provides a choice of methods, selected using the FITMETHOD option. When FITMETHOD=generalizednonlinear, the model is fitted as a generalized nonlinear model, using the FIT directive. The alternative setting, 'nonlinear', fits it as a nonlinear model using FITNONLINEAR. Apart from minor numerical differences, the two methods should generate the same results. Generalized nonlinear models allow a confidence region to be generated for lethal doses, so this method is the default for all situations except Wadley’s problem. The nonlinear method is more accurate, and is thus used as the default for the more difficult situation presented by Wadley’s problem. There is a new option LOGBASE (between LD and DISPERSION) which can be used to specify the base of the antilog transformation (if any) to be applied to the lethal doses, and there is a new MAXCYCLE option to control the maximum number of iterations for fitting the model.
†RCHECK
can now plot confidence envelopes around Normal and half-Normal plots. These are controlled by new options ENVELOPE, PROBABILITY, NSIMULATIONS and SHADE, which are inserted between INDEX and RESIDUALS.
REPLICATION
now takes account of the detection probability (= one minus the type II error rate), and has an option PRDETECTION for specifying it.
RPROPORTIONAL
and RSURVIVAL have a new setting 'loglikelihood' for their PRINT options to print -2 times the log likelihood.
TTEST
has revised headings and an extended summary, which now includes number of observations, mean, variance, standard deviation and standard error of mean.
†VPLOT
procedure can now produce composite plots like those from DAPLOT and RCHECK, as well as absolute-residual and index plots. There is a new parameter PEN and a new option INDEX.
3.3 Functions
In regression, the POL and REG functions have been extended to work on factors, and they can now be included in interactions. The meaning of the REG function has been clarified, so that now its contrasts are always orthogonalized for the main effects of the variate or factor (even if the matrix third argument is set). Unorthogonalized contrasts are now fitted using the COMPARISON function (previously available only in ANOVA), which has an identical syntax to REG.
The GAMMA function has been extended to provide the incomplete gamma function (by setting an optional second argument).
3.4 Incompatibilities
| Command | Change |
|---|---|
| AKEEP directive | new option RSAVE inserted before SAVE; new parameters SEDMEANS and VCMEANS inserted between SEMEANS and EFFECTS; DFMEANS inserted between DF and SS; RTERM between VARIANCE and CEFFICIENCY; and SECBMEANS, SEDCBMEANS and VCCBMEANS between CBMEANS and CBEFFECTS. |
| AONEWAY procedure | completely rewritten: GROUPS option must now be set to a factor; HOMOGENEITY option replaced by a homogeneity setting of the PRINT option; EXPLAIN option deleted. |
| APLOT procedure | new options INDEX and GRAPHICS before SAVE; default is now to give high-resolution graphics. |
| FCLASSIFICATION directive | default for the FACTORIAL option now 0. |
| HELP directive | revised syntax, which may depend on the type of computer (for details type HELP alone on a line). |
| MANNWHITNEY procedure | option NORMAL replaced by option PROBABILITY (saving the probability rather than the Normal approximation). |
| PEN directive | FILLCOLOUR renamed to CFILL, and preceded by new parameters CSYMBOLS and CLINE. |
| PREDICT directive | new options SED, LSD and LSDLEVEL inserted between SE and VCOVARIANCE. |
| PROBITANALYSIS procedure | option LOGBASE inserted between LD and DISPERSION. |
| RCHECK procedure | options ENVELOPE, PROBABILITY, NSIMULATIONS and SHADE inserted between INDEX and RESIDUALS. |
| REG function | in regression and generalized linear models these contrasts are now always orthogonalized for the main effects of the variate or factor, even if the matrix third argument is set (unorthogonalized contrasts can be fitted instead using the COMP function). |
| TRY directive | default for the PRINT option now 'changes'. |
| VPLOT procedure | option INDEX added before GRAPHICS. |
| VRESIDUAL directive | new option CONSTRAINT between VARIANCE and COORDINATES. |
Also procedures DAYCOUNT, GETDATA, SAVEDATA, INVNORMAL and EDINVNORMAL are now obsolete. (You should use the date/time functions, Save-Session menus and/or RECORD and RESUME, and functions CLINVNORMAL, CUINVNORMAL, EDINVNORMAL and PRINVNORMAL instead.)