1. Highlights
● produced in 2006
● 2 new directives, 64 new procedures and 2 new functions
● exact and permutation tests for regression and generalized linear models, analysis of variance and t-tests (see APERMTEST, RPERMTEST, AONEWAY and TTEST)
● two-straight-line, broken-stick or split-line models (see R2LINES)
● linear functional relationship models (see RLFUNCTIONAL)
● complex surveys (see SVCALIBRATE, SVREWEIGHT, SVTABULATE and SVWEIGHT)
● tabulation of standard deviations (see TABULATE)
● tables of modes (see TABMODE)
● analysis of multitiered (multi-phased) experiments (see AMTIER and AMTDISPLAY)
● exploration, analysis and visualization of microarray data from either two-colour or Affymetrix slides (see AFFYMETRIX, MA2CLUSTER, MABGCORRECT, MACALCULATE, MAEBAYES, MAESTIMATE, MAHISTOGRAM, MAPCLUSTER, MAPLOT, MARMA, MAROBUSTMEANS, MASCLUSTER, MASHADE, MAVDIFFERENCE, MAVOLCANO and MNORMALIZE)
● designs for two-colour microarray experiments: loop and reference-level designs and balanced-incomplete-block designs for any number of treatments in blocks of size 2 (see AGLOOP, AGREFERENCE, AGBIB and MADESIGN)
● screening tests for unbalanced designs with several error terms (see ASCREEN)
● Kendall’s rank correlation coefficient τ (see KTAU and PRKTAU)
● Cochran’s Q test for differences between related samples (see QCOCHRAN)
● diversity statistics (see ECDIVERSITY)
● rank/abundance, ABC and k-dominance plots (see ECABUNDANCEPLOT)
● 10 additional similarity measures, including Dice, Canberra, Bray-Curtis and Minkowski (see FSIMILARITY)
● analysis of similarities, i.e. ANOSIM (see ECANOSIM)
● modelling of species abundance data (see ECFIT, ECNICHE and ECRAREFACTION)
● Lorenz curves, Gini and asymmetry coefficients to assess the evenness of distributions (see LORENZ)
● analysis of the clustering of events in space and time (see DKSTPLOT, KSTHAT, KSTMCTEST, KSTSE and PTK3D)
● new, more efficient implementation of hierarchical generalized linear models, extending the facilities to allow random effects in the dispersion models (i.e. double hierarchical generalized linear models), predictions and correlation structures for random terms (see HGANALYSE, HGDRANDOMMODEL, HGPREDICT and HGRANDOMMODEL)
● Bayesian computing using the Differential Evolution Markov Chain algorithm (see DEMC)
● median polishing of two-way data (see MPOLISH)
● quantile normalization (see QNORMALIZE)
● Tukey biweight algorithm (see TUKEYBIWEIGHT)
● basis functions for natural cubic splines and thin-plate splines (see NCSPLINE and THINPLATE)
● formation of all partitions of a set of objects (see SETALLOCATIONS)
● ability to include “typesetting” commands in textual strings to define Greek letters and mathematical symbols to appear in the output (see PRINT)
● ability to print variates, matrices and tables with different numbers of decimals in different cells (see PRINT)
● ability to use alternative labels for data structures instead of their identifiers in output (see DUMMY, EXPRESSION, FACTOR, FORMULA, MATRIX, POINTER, SCALAR, SYMMETRICMATRIX, TABLE, TEXT and VARIATE)
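To make the idea behind the exact and permutation tests in the highlights concrete, here is a minimal Python sketch of an exact two-sample permutation test for a difference in means. It is an illustration only and does not reproduce the algorithm or options of TTEST, APERMTEST or RPERMTEST; the function name and the choice of test statistic are assumptions made for this example.

```python
from itertools import combinations


def exact_permutation_pvalue(sample_a, sample_b):
    """Two-sided exact permutation p-value for a difference in means.

    Hypothetical helper for illustration; not a GenStat routine.
    Every way of splitting the pooled data into groups of the original
    sizes is enumerated, so this is only feasible for small samples.
    """
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_difference(indices):
        sum_a = sum(pooled[i] for i in indices)
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_difference(range(n_a)))
    n_splits = 0
    n_extreme = 0
    for indices in combinations(range(len(pooled)), n_a):
        n_splits += 1
        # Small tolerance so that ties with the observed value are counted.
        if abs(mean_difference(indices)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_splits
```

With larger samples the full enumeration becomes too expensive, which is when a random subset of permutations would be drawn instead.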
2. What’s new
2.1 Directives
FAULT
checks whether to issue a diagnostic, i.e. a fault, warning or message.
SETALLOCATIONS
runs through all ways of allocating a set of objects to subsets.
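As a rough illustration of what "all ways of allocating a set of objects to subsets" means, the following Python sketch enumerates every partition of a small set. It is not the SETALLOCATIONS directive and ignores any restrictions (for example on subset sizes or numbers) that the directive may support; the generator name is an assumption for this example.

```python
def set_partitions(items):
    """Yield every partition of items as a list of subsets (lists).

    Hypothetical helper for illustration; not a GenStat routine.
    """
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place the first item into each existing subset in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a new subset of its own.
        yield [[first]] + partition
```

The number of partitions grows quickly with the size of the set (the Bell numbers: 1, 2, 5, 15, 52, ...), so exhaustive enumeration is practical only for small sets.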
2.2 Procedures
AFFYMETRIX
estimates expression values for Affymetrix slides.
AGLOOP
generates loop designs e.g. for time-course microarray experiments.
AGNATURALBLOCK
forms 1- and 2-dimensional designs with blocks of natural size.
AGREFERENCE
generates reference-level designs e.g. for microarray experiments.
AMTDISPLAY
displays further output for multitiered designs analysed by AMTIER.
AMTIER
analyses a multitiered design by analysis of variance specified by up to 3 model formulae.
APERMTEST
does random permutation tests for analysis-of-variance tables.
ASCREEN
performs screening tests for designs with orthogonal block structure.
A2DISPLAY
provides further output following an analysis of variance by A2WAY.
A2KEEP
copies information from an A2WAY analysis into GenStat data structures.
A2WAY
performs analysis of variance of a balanced or unbalanced design with up to two treatment factors.
DEMC
performs Bayesian computing using the Differential Evolution Markov Chain algorithm.
DKSTPLOT
produces diagnostic plots for space-time clustering.
DMADENSITY
plots the empirical CDF or PDF (kernel smoothed) by groups.
ECABUNDANCEPLOT
produces rank/abundance, ABC and k-dominance plots.
ECANOSIM
performs an analysis of similarities (ANOSIM).
ECDIVERSITY
calculates measures of diversity with jackknife or bootstrap estimates.
ECFIT
fits models to species abundance data.
ECNICHE
generates relative abundance of species for niche-based models.
ECRAREFACTION
calculates individual or sample-based rarefaction.
F2DRESIDUALVARIOGRAM
calculates and plots a 2-dimensional variogram from a 2-dimensional array of residuals.
HGDRANDOMMODEL
defines the random model in a hierarchical generalized linear model for the dispersion model of a double hierarchical generalized linear model.
HGPREDICT
forms predictions from hierarchical or double hierarchical generalized linear model analysis.
KCROSSVALIDATION
computes cross validation statistics for punctual kriging.
KSTHAT
calculates an estimate of the K function in space, time and space-time.
KSTMCTEST
performs a Monte-Carlo test for space-time interaction.
KSTSE
calculates the standard error for the space-time K function.
KTAU
calculates Kendall’s rank correlation coefficient τ.
LORENZ
plots the Lorenz curve and calculates the Gini and asymmetry coefficients.
MAANOVA
does analysis of variance for a single-channel microarray design.
MABGCORRECT
performs background correction of Affymetrix slides.
MACALCULATE
corrects and transforms two-colour microarray differential expressions.
MADESIGN
assesses the efficiency of a two-colour microarray design.
MAEBAYES
modifies t-values by an empirical Bayes method.
MAESTIMATE
estimates treatment effects from a two-colour microarray design.
MAHISTOGRAM
plots histograms of microarray data.
MAPCLUSTER
clusters probes or genes with microarray data.
MAPLOT
produces two-dimensional plots of microarray data.
MARMA
calculates Affymetrix expression values.
MAROBUSTMEANS
does a robust means analysis for Affymetrix slides.
MASCLUSTER
clusters microarray slides.
MASHADE
produces shade plots to display spatial variation of microarray data.
MAVDIFFERENCE
applies the average difference algorithm to Affymetrix data.
MAVOLCANO
produces volcano plots of microarray data.
MA2CLUSTER
performs a two-way clustering of microarray data by probes (or genes) and slides.
MNORMALIZE
normalizes two-colour microarray data.
MPOLISH
performs a median polish of two-way data.
NCSPLINE
calculates natural cubic spline basis functions (for use e.g. in REML).
PRKTAU
calculates probabilities for Kendall’s rank correlation coefficient τ.
PTK3D
performs kernel smoothing of space-time data.
QCOCHRAN
performs Cochran’s Q test for differences between related samples.
QFACTOR
allows the user to decide to convert texts or variates to factors.
QNORMALIZE
performs quantile normalization.
RLFUNCTIONAL
fits a linear functional relationship model.
RPERMTEST
does random permutation tests for regression or generalized-linear-model analyses.
RXGENSTAT
submits a set of commands externally to R and reads the output.
R2LINES
fits two-straight-line (broken-stick) models to data.
SVCALIBRATE
performs generalized calibration of survey data.
SVREWEIGHT
modifies survey weights, adjusting other weights to ensure that their overall sum remains unchanged.
SVTABULATE
tabulates data from random surveys, including multistage surveys and surveys with unequal probabilities of selection.
SVWEIGHT
forms survey weights.
TABMODE
forms summary tables of modes of values.
THINPLATE
calculates the basis functions for thin-plate splines.
TUKEYBIWEIGHT
estimates means using the Tukey biweight algorithm.
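For readers unfamiliar with median polishing (see MPOLISH above), the following Python sketch shows Tukey's usual row and column sweeps on a two-way table. It is a sketch under stated assumptions, not the MPOLISH implementation: the function name, the convergence rule and the default `max_iter` and `tol` values are choices made for this example.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-9):
    """Decompose table into overall + row effect + column effect + residual.

    Hypothetical helper for illustration; not a GenStat routine.
    """
    resid = [[float(v) for v in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Row sweep: move each row median from the residuals to the row effect.
        row_med = [median(row) for row in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            for j in range(n_cols):
                resid[i][j] -= row_med[i]
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Column sweep: the same operation applied to the columns.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        # Stop once a full pass leaves the residuals unchanged.
        if all(abs(m) <= tol for m in row_med + col_med):
            break
    return overall, row_eff, col_eff, resid
```

For a purely additive table such as `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` the residuals polish to zero, leaving an overall value of 5 with row effects -3, 0, 3 and column effects -1, 0, 1.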
The following spatial-analysis procedures, previously available only in a supplementary library, have also now been incorporated into the main library.
FHAT
calculates an estimate of the F nearest-neighbour distribution function.
FZERO
gives the F function expectation under complete spatial randomness.
GHAT
calculates an estimate of the G nearest-neighbour distribution function.
GRCSR
generates completely spatially random points in a polygon.
KCSRENVELOPES
simulates K function bounds under complete spatial randomness.
KHAT
calculates an estimate of the K function.
KLABENVELOPES
gives bounds for K function differences under random labelling.
KSED
calculates the standard error for K function differences under random labelling.
KTORENVELOPES
gives bounds for the bivariate K function under independence.
K12HAT
calculates an estimate of the bivariate K function.
MSEKERNEL2D
estimates the mean square error for a kernel smoothing.
PTAREAPOLYGON
calculates the area of a polygon.
PTGRID
generates a grid of points in a polygon.
PTINTENSITY
calculates the overall density for a spatial point pattern.
PTKERNEL2D
performs kernel smoothing of a spatial point pattern.
PTSINPOLYGON
returns points inside or outside a polygon.
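Several of these procedures work with polygons. As one small example, the area of a simple polygon can be computed from its vertices with the shoelace formula. The Python sketch below illustrates that formula only; whether PTAREAPOLYGON uses this method is not stated here, and the function name is an assumption.

```python
def polygon_area(vertices):
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula.

    vertices is a sequence of (x, y) pairs listed in order around the
    boundary, in either direction. Hypothetical helper for illustration.
    """
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0
```

For example, the triangle with vertices (0, 0), (4, 0) and (0, 3) has area 6.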
2.3 Functions
| Function | Description |
|---|---|
| GRSAMPLE | random sampling with replacement |
| GRSELECT | random sampling without replacement |
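The difference between the two kinds of sampling can be sketched in Python as follows. This only illustrates the concepts; the arguments of GRSAMPLE and GRSELECT are not described here, so the function names and signatures below are assumptions for the example.

```python
import random


def sample_with_replacement(values, size, seed=None):
    """Draw size values; the same element may be drawn more than once."""
    values = list(values)
    rng = random.Random(seed)
    return [rng.choice(values) for _ in range(size)]


def sample_without_replacement(values, size, seed=None):
    """Draw size distinct elements; size cannot exceed len(values)."""
    values = list(values)
    rng = random.Random(seed)
    return rng.sample(values, size)
```

Sampling with replacement can return more values than the population contains, whereas sampling without replacement raises an error in that case.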
3. What’s changed
Most of the changes are completely compatible with Release 8, the previous release. There are a few commands, however, where new options or parameters have been inserted into the existing lists. These may cause problems in statements where option or parameter names have been omitted or abbreviated (see Section 1.7.1 of Part 1 of the Guide to the GenStat Command Language for details). To avoid any difficulty, the name of the option/parameter after the new option/parameter should be given explicitly, and not abbreviated to fewer than four characters.
Any command where changes in Release 9 may cause incompatibilities in existing programs is marked in Sections 3.1 and 3.2 by the symbol †. The full details are given in Section 3.4.
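The following Python sketch illustrates why an inserted option can change the meaning of an abbreviated name. It assumes, for the purpose of illustration only, that an abbreviation is resolved to the first option in the command's list whose name begins with those letters; consult the Guide for the actual rules. The option lists shown are hypothetical and do not belong to any real command.

```python
def resolve_option(abbreviation, option_names):
    """Return the first option, in list order, that starts with abbreviation.

    Models an ASSUMED resolution rule; not taken from the GenStat Guide.
    """
    abbreviation = abbreviation.upper()
    for name in option_names:
        if name.startswith(abbreviation):
            return name
    raise KeyError(f"no option matches {abbreviation!r}")


# Hypothetical option lists before and after a new option is inserted.
old_options = ["PRINT", "TOLERANCE"]
new_options = ["PRINT", "TITLE", "TOLERANCE"]

# A one-letter abbreviation silently changes meaning ...
before = resolve_option("T", old_options)  # TOLERANCE
after = resolve_option("T", new_options)   # TITLE

# ... whereas four characters still identify the intended option.
safe = resolve_option("TOLE", new_options)  # TOLERANCE
```

Under this assumed rule, giving at least four characters of the name keeps existing statements unambiguous, which matches the advice above.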
3.1 Directives
CAPTION
can now print “notes”.
CLUSTER
can save the criterion values, the group means and the group predictors (from maximal predictive classification).
FIT
and the other regression model-fitting directives (ADD, DROP and SWITCH) have an additional setting, ignore, of the CONSTANT option to omit the constant but ignore it when assessing marginality constraints.
GET
can now save the current settings for the printing of captions and typesetting in output or graphics.
PROCEDURE
has an additional setting of the RESTORE option to restore the setting for the printing of captions.
SET
can now control which captions are printed and whether typesetting takes place in output or graphics.
TERMS
has a new option RIDGE to supply a constant to add to the diagonal of the sums-of-squares-and-products matrix, to allow ridge methods to be used in regression and generalized linear models.
DUMMY, EXPRESSION, FACTOR, FORMULA, MATRIX, POINTER, SCALAR, SYMMETRICMATRIX, TABLE, TEXT and VARIATE
all have an extra option IPRINT to specify how these data structures will be identified in output. If IPRINT is not set, they will be identified in whatever way is usual for the section of output concerned. For example, the PRINT directive generally uses their identifiers (although this can be changed using the IPRINT option of PRINT itself), while the ANOVA directive will print the identifier and the extra text for each y-variate. So, for example, if you set IPRINT=extra, the “extra text” (defined by the EXTRA parameter of the directive) for the data structure will be used instead of its identifier, thus allowing complete freedom in the way that it is labelled.
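The ridge idea behind the new RIDGE option of TERMS, adding a constant to the diagonal of the sums-of-squares-and-products (SSP) matrix before solving for the coefficients, can be sketched in Python as follows. This is a minimal illustration for ordinary linear regression and not GenStat's implementation: centring the data to handle the constant term, and both function names, are assumptions of this example.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def ridge_coefficients(x_rows, y, ridge=0.0):
    """Slopes from (SSP + ridge * I) b = SP(x, y), using centred data.

    Hypothetical helper for illustration; ridge=0 gives least squares.
    """
    n, p = len(x_rows), len(x_rows[0])
    x_means = [sum(row[j] for row in x_rows) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[row[j] - x_means[j] for j in range(p)] for row in x_rows]
    yc = [v - y_mean for v in y]
    ssp = [[sum(r[j] * r[k] for r in xc) for k in range(p)] for j in range(p)]
    sp_xy = [sum(r[j] * v for r, v in zip(xc, yc)) for j in range(p)]
    for j in range(p):
        ssp[j][j] += ridge  # the ridge constant added to the diagonal
    return solve_linear_system(ssp, sp_xy)
```

Increasing the ridge constant shrinks the estimated slopes towards zero, and also makes the system solvable when the explanatory variables are collinear.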
3.2 Procedures
AGBIB
can now construct balanced-incomplete-block designs for any number of treatments in blocks of size 2.
†APLOT
can plot residuals from any stratum (error term), not just from the final residual term, and the user can supply an overall title (to use instead of the identifier of the y-variate).
†BOOTSTRAP
can use a block formula to perform the randomizations used in a permutation test.
DECIMALS
has been extended to allow different numbers of decimals to be determined for each unit of a data structure.
FIELLER
now uses a t distribution when the dispersion parameter is not fixed.
†HGANALYSE
can now analyse double hierarchical generalized linear models.
GENPROCRUSTES
can now scale each configuration prior to the analysis so that the trace of each matrix is one. It can also produce plots of the variable projections, and of the consensus with and without the individuals, and you can now control how many roots you wish to display or save.
MANNWHITNEY
can print the median difference between the samples with confidence limits.
RMGLM
can ignore the constant when assessing marginality constraints (see FIT
), it can include units with missing values in the explanatory factors and variates, and it can save a regression save structure.
TTEST
can provide probabilities using permutation tests or (when feasible) exact tests.
†VPLOT
allows the user to supply an overall title (to use instead of the identifier of the y-variate).
In addition, the nonparametric procedures now ignore missing values (instead of failing, they now print a warning message). The Library procedures have also been modified where possible to use the new ability to use different numbers of decimal places for different units of a variate, matrix or table, and to suppress irrelevant captions.
3.3 Functions
| Function | Change |
|---|---|
| CHARACTERS | now has a second argument to specify whether to return the raw length of the string (without checking for any typesetting commands) or the formatted length (taking account of typesetting commands); see PRINT for more information. |
3.4 Incompatibilities
| Command | Change |
|---|---|
| APLOT procedure | option STRATUM inserted before GRAPHICS, and option TITLE inserted before SAVE. |
| GENPROCRUSTES procedure | options NROOTS, PLOT and NDROOTS inserted before TOLERANCE. |
| HGANALYSE procedure | many new options; also DECIMALS option deleted, LMETHOD option replaced by DMETHOD option (LMETHOD now used to choose between exact likelihood or extended quasi likelihood), LAPLACEORDER option renamed DLAPLACEORDER, NCYCLE option replaced by MAXCYCLE. |
| HGDISPLAY procedure | DECIMALS option deleted, LMETHOD option replaced by DMETHOD option. |
| HGKEEP procedure | DHGRANDOMTERM parameter inserted before RESIDUALS. |
| HGPLOT procedure | RMETHOD option inserted before INDEX. |
| VPLOT procedure | option TITLE inserted before SAVE. |