Release 9: new features

1. Highlights

● produced in 2006

● 2 new directives, 64 new procedures and 2 new functions

● exact and permutation tests for regression and generalized linear models, analysis of variance and t-tests (see APERMTEST, RPERMTEST, AONEWAY and TTEST)

● two-straight-line, broken-stick or split-line models (see R2LINES)

● linear functional relationship models (see RLFUNCTIONAL)

● complex surveys (see SVCALIBRATE, SVREWEIGHT, SVTABULATE and SVWEIGHT)

● tabulation of standard deviations (see TABULATE)

● tables of modes (see TABMODE)

● analysis of multitiered (multi-phased) experiments (see AMTIER and AMTDISPLAY)

● exploration, analysis and visualization of microarray data from either two-colour or Affymetrix slides (see AFFYMETRIX, MA2CLUSTER, MABGCORRECT, MACALCULATE, MAEBAYES, MAESTIMATE, MAHISTOGRAM, MAPCLUSTER, MAPLOT, MARMA, MAROBUSTMEANS, MASCLUSTER, MASHADE, MAVDIFFERENCE, MAVOLCANO and MNORMALIZE)

● designs for two-colour microarray experiments: loop and reference-level designs and balanced-incomplete-block designs for any number of treatments in blocks of size 2 (see AGLOOP, AGREFERENCE, AGBIB and MADESIGN)

● screening tests for unbalanced designs with several error terms (see ASCREEN)

● Kendall’s rank correlation coefficient τ (see KTAU and PRKTAU)

● Cochran’s Q test for differences between related samples (see QCOCHRAN)

● diversity statistics (see ECDIVERSITY)

● rank/abundance, ABC and k-dominance plots (see ECABUNDANCEPLOT)

● 10 additional similarity measures, including Dice, Canberra, Bray-Curtis and Minkowski (see FSIMILARITY)

● analysis of similarities, i.e. ANOSIM (see ECANOSIM)

● modelling of species abundance data (see ECFIT, ECNICHE and ECRAREFACTION)

● Lorenz curves, Gini and asymmetry coefficients to assess the evenness of distributions (see LORENZ)

● analysis of the clustering of events in space and time (see DKSTPLOT, KSTHAT, KSTMCTEST, KSTSE and PTK3D)

● new, more efficient implementation of hierarchical generalized linear models, extending the facilities to allow random effects in the dispersion models (i.e. double hierarchical generalized linear models), predictions and correlation structures for random terms (see HGANALYSE, HGDRANDOMMODEL, HGPREDICT and HGRANDOMMODEL)

● Bayesian computing using the Differential Evolution Markov Chain algorithm (see DEMC)

● median polishing of two-way data (see MPOLISH)

● quantile normalization (see QNORMALIZE)

● Tukey biweight algorithm (see TUKEYBIWEIGHT)

● basis functions for natural cubic splines and thin-plate splines (see NCSPLINE and THINPLATE)

● formation of all partitions of a set of objects (see SETALLOCATIONS)

● ability to include “typesetting” commands in textual strings to define Greek letters and mathematical symbols to appear in the output (see PRINT)

● ability to print variates, matrices and tables with different numbers of decimals in different cells (see PRINT)

● ability to use alternative labels for data structures instead of their identifiers in output (see DUMMY, EXPRESSION, FACTOR, FORMULA, MATRIX, POINTER, SCALAR, SYMMETRICMATRIX, TABLE, TEXT and VARIATE)
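The exact tests listed among the highlights can be illustrated outside GenStat. The Python sketch below is not the TTEST or APERMTEST implementation; it is a minimal example, under assumed choices, of an exact two-sample permutation test for a difference in means. The function name `exact_permutation_pvalue` and the test statistic (absolute difference in means) are illustrative assumptions.

```python
from itertools import combinations


def exact_permutation_pvalue(x, y):
    """Two-sided exact permutation p-value for a difference in means.

    Enumerates every way of splitting the pooled values into groups of
    the original sizes and counts the splits whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(x) + list(y)
    n_x = len(x)
    n_y = len(pooled) - n_x
    total = sum(pooled)
    observed = abs(sum(x) / n_x - sum(y) / n_y)

    as_extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_x):
        sum_x = sum(pooled[i] for i in idx)
        diff = abs(sum_x / n_x - (total - sum_x) / n_y)
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            as_extreme += 1
        n_splits += 1
    return as_extreme / n_splits


p_value = exact_permutation_pvalue([1, 2, 3], [4, 5, 6])
print(p_value)
```

For the groups 1, 2, 3 and 4, 5, 6, only 2 of the 20 possible splits are as extreme as the observed one, so the p-value is 0.1. Full enumeration grows combinatorially with sample size, so larger samples are normally handled by drawing a random subset of permutations instead.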

2. What’s new

2.1 Directives

FAULT checks whether to issue a diagnostic, i.e. a fault, warning or message.

SETALLOCATIONS runs through all ways of allocating a set of objects to subsets.

2.2 Procedures

AFFYMETRIX estimates expression values for Affymetrix slides.

AGLOOP generates loop designs e.g. for time-course microarray experiments.

AGNATURALBLOCK forms 1- and 2-dimensional designs with blocks of natural size.

AGREFERENCE generates reference-level designs e.g. for microarray experiments.

AMTDISPLAY displays further output for multitiered designs analysed by AMTIER.

AMTIER analyses a multitiered design by analysis of variance specified by up to 3 model formulae.

APERMTEST does random permutation tests for analysis-of-variance tables.

ASCREEN performs screening tests for designs with orthogonal block structure.

A2DISPLAY provides further output following an analysis of variance by A2WAY.

A2KEEP copies information from an A2WAY analysis into GenStat data structures.

A2WAY performs analysis of variance of a balanced or unbalanced design with up to two treatment factors.

DEMC performs Bayesian computing using the Differential Evolution Markov Chain algorithm.

DKSTPLOT produces diagnostic plots for space-time clustering.

DMADENSITY plots the empirical CDF or PDF (kernel smoothed) by groups.

ECABUNDANCEPLOT produces rank/abundance, ABC and k-dominance plots.

ECANOSIM performs an analysis of similarities (ANOSIM).

ECDIVERSITY calculates measures of diversity with jackknife or bootstrap estimates.

ECFIT fits models to species abundance data.

ECNICHE generates relative abundance of species for niche-based models.

ECRAREFACTION calculates individual or sample-based rarefaction.

F2DRESIDUALVARIOGRAM calculates and plots a 2-dimensional variogram from a 2-dimensional array of residuals.

HGDRANDOMMODEL defines the random model in a hierarchical generalized linear model for the dispersion model of a double hierarchical generalized linear model.

HGPREDICT forms predictions from hierarchical or double hierarchical generalized linear model analysis.

KCROSSVALIDATION computes cross validation statistics for punctual kriging.

KSTHAT calculates an estimate of the K function in space, time and space-time.

KSTMCTEST performs a Monte-Carlo test for space-time interaction.

KSTSE calculates the standard error for the space-time K function.

KTAU calculates Kendall’s rank correlation coefficient τ.

LORENZ plots the Lorenz curve and calculates the Gini and asymmetry coefficients.

MAANOVA does analysis of variance for a single-channel microarray design.

MABGCORRECT performs background correction of Affymetrix slides.

MACALCULATE corrects and transforms two-colour microarray differential expressions.

MADESIGN assesses the efficiency of a two-colour microarray design.

MAEBAYES modifies t-values by an empirical Bayes method.

MAESTIMATE estimates treatment effects from a two-colour microarray design.

MAHISTOGRAM plots histograms of microarray data.

MAPCLUSTER clusters probes or genes with microarray data.

MAPLOT produces two-dimensional plots of microarray data.

MARMA calculates Affymetrix expression values.

MAROBUSTMEANS does a robust means analysis for Affymetrix slides.

MASCLUSTER clusters microarray slides.

MASHADE produces shade plots to display spatial variation of microarray data.

MAVDIFFERENCE applies the average difference algorithm to Affymetrix data.

MAVOLCANO produces volcano plots of microarray data.

MA2CLUSTER performs a two-way clustering of microarray data by probes (or genes) and slides.

MNORMALIZE normalizes two-colour microarray data.

MPOLISH performs a median polish of two-way data.

NCSPLINE calculates natural cubic spline basis functions (for use e.g. in REML).

PRKTAU calculates probabilities for Kendall’s rank correlation coefficient τ.

PTK3D performs kernel smoothing of space-time data.

QCOCHRAN performs Cochran’s Q test for differences between related samples.

QFACTOR allows the user to decide to convert texts or variates to factors.

QNORMALIZE performs quantile normalization.

RLFUNCTIONAL fits a linear functional relationship model.

RPERMTEST does random permutation tests for regression or generalized-linear-model analyses.

RXGENSTAT submits a set of commands externally to R and reads the output.

R2LINES fits two-straight-line (broken-stick) models to data.

SVCALIBRATE performs generalized calibration of survey data.

SVREWEIGHT modifies survey weights, adjusting other weights to ensure that their overall sum remains unchanged.

SVTABULATE tabulates data from random surveys, including multistage surveys and surveys with unequal probabilities of selection.

SVWEIGHT forms survey weights.

TABMODE forms summary tables of modes of values.

THINPLATE calculates the basis functions for thin-plate splines.

TUKEYBIWEIGHT estimates means using the Tukey biweight algorithm.
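Several of the procedures above implement standard algorithms that are easy to sketch outside GenStat. As one example, the Python code below shows the usual idea behind quantile normalization: each column is replaced by a common reference distribution formed from the means of the sorted columns. It is not the QNORMALIZE implementation; the function name `quantile_normalize`, the list-of-columns layout and the absence of any special handling for ties or missing values are assumptions made for illustration.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (lists of numbers).

    Assumes no ties and no missing values within a column.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")

    # Reference distribution: mean of the i-th smallest value across columns.
    sorted_cols = [sorted(col) for col in columns]
    reference = [
        sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)
    ]

    normalized = []
    for col in columns:
        # order[rank] is the position of the value with that rank.
        order = sorted(range(n), key=lambda i: col[i])
        new_col = [0.0] * n
        for rank, position in enumerate(order):
            new_col[position] = reference[rank]
        normalized.append(new_col)
    return normalized


slides = [[5, 2, 3], [4, 1, 6]]
print(quantile_normalize(slides))
```

After normalization every column contains the same set of values (here 1.5, 3.5 and 5.5), arranged according to the original ranks within that column.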

The following spatial-analysis procedures, previously available only in a supplementary library, have also now been incorporated into the main library.

FHAT calculates an estimate of the F nearest-neighbour distribution function.

FZERO gives the F function expectation under complete spatial randomness.

GHAT calculates an estimate of the G nearest-neighbour distribution function.

GRCSR generates completely spatially random points in a polygon.

KCSRENVELOPES simulates K function bounds under complete spatial randomness.

KHAT calculates an estimate of the K function.

KLABENVELOPES gives bounds for K function differences under random labelling.

KSED calculates the standard error for K function differences under random labelling.

KTORENVELOPES gives bounds for the bivariate K function under independence.

K12HAT calculates an estimate of the bivariate K function.

MSEKERNEL2D estimates the mean square error for a kernel smoothing.

PTAREAPOLYGON calculates the area of a polygon.

PTGRID generates a grid of points in a polygon.

PTINTENSITY calculates the overall density for a spatial point pattern.

PTKERNEL2D performs kernel smoothing of a spatial point pattern.

PTSINPOLYGON returns points inside or outside a polygon.
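Some of the spatial-analysis tasks listed above rest on elementary geometry. As an illustration only, the Python sketch below computes the area of a simple polygon with the shoelace formula, which is one common way to obtain such an area. Whether PTAREAPOLYGON uses this formula is not stated here, and the function name `polygon_area` and the vertex-list input are assumptions.

```python
def polygon_area(vertices):
    """Area of a simple (non-self-intersecting) polygon.

    `vertices` is a list of (x, y) pairs in order around the boundary,
    either clockwise or anticlockwise.
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")

    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(polygon_area(unit_square))
```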

2.3 Functions

`GRSAMPLE`	random sampling with replacement
`GRSELECT`	random sampling without replacement
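
The distinction between the two functions is the standard one between sampling with and without replacement. The Python sketch below illustrates that distinction using the standard `random` module; it does not reproduce the GenStat function syntax, and the seed and population are arbitrary assumptions.

```python
import random

rng = random.Random(2006)  # fixed seed so the example is repeatable
population = list(range(1, 11))

# With replacement: the same unit may be drawn more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: every drawn unit is distinct.
without_replacement = rng.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```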

3. What’s changed

Most of the changes are completely compatible with Release 8, the previous release. There are a few commands, however, where new options or parameters have been inserted into the existing lists. These may cause problems in statements where option or parameter names have been omitted or abbreviated (see Section 1.7.1 of Part 1 of the Guide to the GenStat Command Language for details). To avoid any difficulty, the name of the option/parameter after the new option/parameter should be given explicitly, and not abbreviated to fewer than four characters.
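
The risk described above is the familiar one of positional arguments shifting when a new argument is inserted ahead of them. The Python sketch below is only an analogy, not GenStat syntax: the functions `plot_r8` and `plot_r9` and their arguments are hypothetical, loosely echoing the insertion of STRATUM before GRAPHICS in APLOT. GenStat's own rules for omitted and abbreviated names are those given in the Guide.

```python
def plot_r8(data, graphics="high", save=None):
    """Hypothetical 'Release 8' signature."""
    return {"graphics": graphics, "save": save}


def plot_r9(data, stratum=None, graphics="high", save=None):
    """Hypothetical 'Release 9' signature with a new argument inserted."""
    return {"stratum": stratum, "graphics": graphics, "save": save}


residuals = [0.3, -0.1, 0.2]

# Name omitted: under the old signature "line" sets graphics ...
old_call = plot_r8(residuals, "line")

# ... but under the new signature the same call silently sets stratum.
shifted_call = plot_r9(residuals, "line")

# Giving the name explicitly is unaffected by the insertion.
safe_call = plot_r9(residuals, graphics="line")

print(old_call, shifted_call, safe_call, sep="\n")
```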

Any command, where changes in Release 9 may cause incompatibilities in existing programs, is marked in Sections 3.1 and 3.2 by the symbol ^†. The full details are given in Section 3.4.

3.1 Directives

CAPTION can now print “notes”.

CLUSTER can save the criterion values, the group means and the group predictors (from maximal predictive classification).

FIT and the other regression model-fitting directives (ADD, DROP and SWITCH) have an additional setting, ignore, of the CONSTANT option to omit the constant but ignore it when assessing marginality constraints.

GET can now save the current settings for the printing of captions and typesetting in output or graphics.

PROCEDURE has an additional setting of the RESTORE option to restore the setting for the printing of captions.

SET can now control which captions are printed and whether typesetting takes place in output or graphics.

TERMS has a new option RIDGE to supply a constant to add to the diagonal of the sums-of-squares-and-products matrix, to allow ridge methods to be used in regression and generalized linear models.

DUMMY, EXPRESSION, FACTOR, FORMULA, MATRIX, POINTER, SCALAR, SYMMETRICMATRIX, TABLE, TEXT and VARIATE all have an extra option IPRINT to specify how these data structures will be identified in output. If IPRINT is not set, they will be identified in whatever way is usual for the section of output concerned. For example, the PRINT directive generally uses their identifiers (although this can be changed using the IPRINT option of PRINT itself), while the ANOVA directive will print the identifier and the extra text for each y-variate. So, for example, if you set IPRINT=extra, the “extra text” (defined by the EXTRA parameter of the directive) for the data structure will be used instead of its identifier, thus allowing complete freedom in the way that it is labelled.

3.2 Procedures

AGBIB can now construct balanced-incomplete-block designs for any number of treatments in blocks of size 2.

^†APLOT can plot residuals from any stratum (error term), not just from the final residual term, and the user can supply an overall title (to use instead of the identifier of the y-variate).

^†BOOTSTRAP can use a block formula to perform the randomizations used in a permutation test.

DECIMALS has been extended to allow different numbers of decimals to be determined for each unit of a data structure.

FIELLER now uses a t distribution when the dispersion parameter is not fixed.

^†HGANALYSE can now analyse double hierarchical generalized linear models.

GENPROCRUSTES can now scale each configuration prior to the analysis so that the trace of each matrix is one. It can also produce plots of the variable projections, and of the consensus with and without the individuals, and you can now control how many roots you wish to display or save.

MANNWHITNEY can print the median difference between the samples with confidence limits.

RMGLM can ignore the constant when assessing marginality constraints (see FIT), can include units with missing values in the explanatory factors and variates, and can save a regression save structure.

TTEST can provide probabilities using permutation tests or (when feasible) exact tests.

^†VPLOT allows the user to supply an overall title (to use instead of the identifier of the y-variate).

In addition, the nonparametric procedures now ignore missing values (instead of failing, they now print a warning message). The Library procedures have also been modified where possible to use the new ability to use different numbers of decimal places for different units of a variate, matrix or table, and to suppress irrelevant captions.
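
The missing-value behaviour described above (warn and carry on, rather than fail) can be sketched as follows. This Python example is not the library code: it is a minimal illustration using Kendall's rank correlation in its simplest form (tau-a, with tied pairs counted as neither concordant nor discordant), and the function names and the use of `None` or NaN to mark missing values are assumptions.

```python
import math
import warnings


def _is_missing(value):
    """Treat None and NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def kendall_tau_a(x, y):
    """Kendall's tau-a, ignoring units where either value is missing."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    pairs = [(a, b) for a, b in zip(x, y)
             if not _is_missing(a) and not _is_missing(b)]
    dropped = len(x) - len(pairs)
    if dropped:
        # Warn instead of failing, then continue with the complete units.
        warnings.warn(f"{dropped} unit(s) with missing values ignored")

    n = len(pairs)
    if n < 2:
        raise ValueError("at least two complete units are required")

    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


tau = kendall_tau_a([1, 2, 3, None, 5], [1, 3, 2, 4, 5])
print(round(tau, 4))
```

In this example one unit is dropped with a warning, leaving four complete units with five concordant pairs and one discordant pair.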

3.3 Functions

`CHARACTERS`	now has a second argument to specify whether to return the raw length of the string (without checking for any typesetting commands) or the formatted length (taking account of typesetting commands); see `PRINT` for more information.

3.4 Incompatibilities

`APLOT` procedure	option `STRATUM` inserted before `GRAPHICS`, and option `TITLE` inserted before `SAVE`.
`GENPROCRUSTES` procedure	options `NROOTS`, `PLOT` and `NDROOTS` inserted before `TOLERANCE`.
`HGANALYSE` procedure	many new options; also `DECIMALS` option deleted, `LMETHOD` option replaced by `DMETHOD` option (`LMETHOD` now used to choose between exact likelihood or extended quasi likelihood), `LAPLACEORDER` option renamed `DLAPLACEORDER`, `NCYCLE` option replaced by `MAXCYCLE`.
`HGDISPLAY` procedure	`DECIMALS` option deleted, `LMETHOD` option replaced by `DMETHOD` option.
`HGKEEP` procedure	`DHGRANDOMTERM` parameter inserted before `RESIDUALS`.
`HGPLOT` procedure	`RMETHOD` option inserted before `INDEX`.
`VPLOT` procedure	option `TITLE` inserted before `SAVE`.

Updated on June 19, 2019
