1. Highlights
● produced in 2012
● 5 new directives, 31 new procedures and 3 new functions
● spreadsheet output of results from analysis of variance and REML (ASPREADSHEET, VSPREADSHEET)
● automatic checking of assumptions for ANOVA (ACHECK)
● D-optimal designs for nonlinear and generalized linear models (AFNONLINEAR)
● Lasso (RLASSO)
● selection of representative genotypes and markers (QGSELECT, QMKSELECT)
● quadratic discrimination (QDISCRIMINATE)
● L-splines, P-splines, penalized radial and tensor splines (LSPLINE, PSPLINE, PENSPLINE, RADIALSPLINE, TENSORSPLINE)
● alignment, baseline adjustment and finding of peaks in observed curves (ALIGNCURVE, BASELINE, PEAKFINDER)
● data mining – association rules (ASRULES) and k nearest neighbour prediction (KNEARESTNEIGHBOURS)
2. What’s new
2.1 Directives
ASRULES
derives association rules from transaction data.
FCOPY
makes copies of files.
FDELETE
deletes files.
FRENAME
renames files.
GETLOCATIONS
finds locations of an identifier within a pointer, or a string within a factor or text, or a number within any numerical data structure.
2.2 Procedures
ACHECK
checks assumptions for an ANOVA analysis.
ADPOLYNOMIAL
plots single-factor polynomial contrasts fitted by ANOVA.
AFNONLINEAR
forms D-optimal designs to estimate the parameters of a nonlinear or generalized linear model.
ALIGNCURVE
forms an optimal warping to align an observed series of observations with a standard series.
AN1ADVICE
aims to give useful advice if a design that is thought to be balanced fails to be analysed by ANOVA.
ASPREADSHEET
saves results from an analysis of variance in a spreadsheet.
BASELINE
estimates a baseline for a series of numbers whose minimum value is drifting.
CDNBLOCKDESIGN
constructs a block design using CycDesigN.
CDNROWCOLUMNDESIGN
constructs a row-column design using CycDesigN.
DMSCATTER
produces a scatter-plot matrix for one or two sets of variables.
FDISTINCTFACTORS
checks sets of factors to remove any that define duplicate classifications.
FNCORRELATION
calculates correlations from variances and covariances, together with their variances and covariances.
FNLINEAR
estimates linear functions of random variables, and calculates their variances and covariances.
FNPOWER
estimates products of powers of two random variables, and calculates their variances and covariances.
G2AEXPORT
forms a dbase file to transfer ANOVA output to Agronomix Generation II.
G2AFACTORS
redefines block and treatment variables as factors.
G2VEXPORT
forms a dbase file to transfer REML output to Agronomix Generation II.
KNEARESTNEIGHBOURS
classifies items or predicts their responses by examining their k nearest neighbours.
LSPLINE
calculates design matrices to fit a natural polynomial or trigonometric L-spline as a linear mixed model.
PEAKFINDER
finds the locations of peaks in an observed series.
PENSPLINE
calculates design matrices to fit a penalized spline as a linear mixed model.
PSPLINE
calculates design matrices to fit a P-spline as a linear mixed model.
QDISCRIMINATE
performs quadratic discrimination between groups, i.e. allowing for different variance-covariance matrices.
QGSELECT
obtains a representative selection of genotypes by means of genetic distance sampling or genetic distance optimization.
QMKRECODE
recodes marker scores into separate alleles.
QMKSELECT
obtains a representative selection of markers by means of genetic distance sampling or genetic distance optimization.
RADIALSPLINE
calculates design matrices to fit a radial-spline surface as a linear mixed model.
RLASSO
performs lasso using iteratively reweighted least-squares.
TENSORSPLINE
calculates design matrices to fit a tensor-spline surface as a linear mixed model.
VFPEDIGREE
checks and prepares pedigree information from several factors, for use by VPEDIGREE and REML.
VSPREADSHEET
saves results from a REML analysis in a spreadsheet.
2.3 Functions
IPROBIT
calculates the inverse probit transformation (result in percentages).
PROBIT
calculates the probit transformation for a percentage p.
REPLACE
replaces values in any numerical data structure.
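As a rough illustration of what a probit transformation on the percentage scale involves, here is a minimal Python sketch. It assumes that the probit of a percentage p is the standard normal quantile of p/100, with no constant offset added, and that the inverse returns a percentage. These notes do not state GenStat's exact definition, so treat the function bodies as an assumption rather than a description of PROBIT and IPROBIT themselves.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STANDARD_NORMAL = NormalDist()


def probit(percentage: float) -> float:
    """Assumed definition: standard normal quantile of a percentage in (0, 100)."""
    return _STANDARD_NORMAL.inv_cdf(percentage / 100.0)


def iprobit(value: float) -> float:
    """Assumed inverse: standard normal CDF, rescaled to a percentage."""
    return 100.0 * _STANDARD_NORMAL.cdf(value)


if __name__ == "__main__":
    for p in (2.5, 50.0, 97.5):
        z = probit(p)
        print(f"p = {p:5.1f}%  probit = {z: .4f}  back-transformed = {iprobit(z):.4f}%")
```

Under this assumed definition the two functions are inverses of each other, so applying one after the other returns the original percentage up to rounding error.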
3. What’s changed
Most of the changes are compatible with Release 14, the previous release. There are a few commands, however, where new options or parameters have been inserted into the existing lists. These may cause problems in statements where option or parameter names have been omitted or abbreviated (see Section 1.7.1 of Part 1 of the Guide to the GenStat Command Language for details). To avoid any difficulty, the name of the option/parameter after the new option/parameter should be given explicitly, and not abbreviated to fewer than four characters.
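The following Python sketch illustrates why an inserted option can change the meaning of an existing statement. It uses a deliberately simplified model: a named setting is assumed to match the first option in the list whose name begins with the characters given, and an unnamed setting is assumed to take the option that follows the previous one. The option names PRINT, SCALE and SAVE are hypothetical and do not describe any particular GenStat command; the exact rules are those in the Guide section cited above.

```python
from typing import Dict, List, Optional, Tuple

Setting = Tuple[Optional[str], str]  # (name or abbreviation, or None if omitted; value)


def resolve(settings: List[Setting], names: List[str]) -> Dict[str, str]:
    """Map each setting onto a full option name (simplified, assumed model)."""
    resolved: Dict[str, str] = {}
    position = -1
    for given, value in settings:
        if given is None:
            # Name omitted: take the option after the previous one in the list.
            position += 1
        else:
            # Name given or abbreviated: take the first option that starts with it.
            position = next(
                i for i, name in enumerate(names) if name.startswith(given.upper())
            )
        resolved[names[position]] = value
    return resolved


# Hypothetical option lists before and after a release inserts SCALE before SAVE.
old_options = ["PRINT", "SAVE"]
new_options = ["PRINT", "SCALE", "SAVE"]

abbreviated = [("S", "results")]                        # one-letter abbreviation
positional = [("PRINT", "summary"), (None, "results")]  # second name omitted
explicit = [("PRINT", "summary"), ("SAVE", "results")]  # name given with four characters

for label, settings in [
    ("abbreviated", abbreviated),
    ("positional", positional),
    ("explicit", explicit),
]:
    print(label, resolve(settings, old_options), resolve(settings, new_options))
```

In this model the abbreviated and positional forms silently switch from SAVE to SCALE once the new option is inserted, while the explicit form resolves to SAVE under both lists.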
Any command where changes in Release 15 may cause incompatibilities in existing programs is marked in Sections 3.1 and 3.2 by the symbol †. Full details are given in Section 3.4.
3.1 Directives
†AKEEP
is now able to save variance-covariance matrices for covariate regression coefficients, and RTERM can save appropriate strata for assessing block terms.
DEVICE
can set the resolution of hard-copy devices.
†DKEEP
can now save the lower and upper bounds for the z-axis.
FOR
now provides more convenient and efficient ways of specifying an index that changes in equal increments.
FSIMILARITY
now allows rectangular (between-group) similarity matrices to be printed.
GET
can obtain the name of the working directory. It can also get an integer that will be unique within the current job to use, for example, to define suitable names for temporary files.
OPEN
can specify custom content for the header of an HTML document.
RCYCLE
can set step lengths for FITCURVE.
†RKEEP
can now save the fitted model, an indicator showing its type (regression, standard curve etc.) and an indicator to show whether or not a constant term was included.
SET
now provides more flexible ways of setting default seeds to be used to generate random numbers in the various areas of GenStat. It can also set the working directory, and increase the amount of internal data space.
SETCALCULATE
now enables you to control whether to substitute dummies within pointers in the expression; it also allows you to suppress the warning messages that are given when data structures in the expression have no values.
SETRELATE
now enables you to control whether to substitute dummies within LEFT and RIGHT pointers.
TERMS
now allows a variate of ridge values to be supplied, one for each diagonal element of the sums-of-squares-and-products matrix; it can also save the row labels of the sum-of-squares-and-products matrix.
3.2 Procedures
†ADSPREADSHEET
can now colour the cells of the spreadsheet according to levels of the design factors, and can save the spreadsheet as an Excel file.
†APERMTEST
can now plot the statistics obtained from the permutations, and can save the probabilities and critical values obtained from the permutations.
†AYPARALLEL
now allows covariates to be included in the analysis.
DIALLEL
can now produce the Griffing analysis of variance.
†HGKEEP
has a new option IGNOREFAILURE that allows you to save information even if the fitting of the HGLM failed to converge.
HGPREDICT
now provides a clearer description of the predictions.
†MAANOVA
now allows covariates to be included in the analysis.
†QIBDPROBABILITIES
can now calculate probabilities for backcross inbred lines.
†QLDDECAY
now uses regression to speed up the calculations, and displays quantile regression lines to help interpretation.
†QMATCH
now allows you to specify an explicit set of genotypes or markers to remove.
†QMESTIMATE
has improved output.
†QMKDIAGNOSTICS
can now save details of the genotypes and markers that have problems.
†QSASSOCIATION
provides a new fast method, as an alternative to the exact method.
†QSESTIMATE
has improved output.
†QSIMULATE
can now simulate backcross inbred lines.
†QUANTILE
can now form population quantiles instead of sample quantiles.
†RLFUNCTIONAL
has been extended to provide plots, and many additional methods.
RQUADRATIC
can now save predictions, and plot the fitted quadratic surface.
RSPREADSHEET
can now save the spreadsheet as an Excel file.
RYPARALLEL
now allows a symmetric matrix of weights to be specified, for generalized least squares.
TABSORT
now allows you to keep the levels of some of the classifying factors of the tables in their original order.
T%CONTROL
allows percentages to be calculated of the means of several control levels.
UNSTACK
now has a new option MVINCLUDE, to control whether to include null levels or data sets.
†VPLOT
has a new option RMETHOD (as in VKEEP) to specify which random terms to use when calculating the residuals.
†VTCOMPARISONS
can now make comparisons for every level of a groups factor.
3.3 Functions
No changes.
3.4 Incompatibilities
Command | Change |
---|---|
ADSPREADSHEET procedure | options FOREGROUND, BACKGROUND, CFACTORS, GAPFOREGROUND, GAPBACKGROUND, YFOREGROUND, YBACKGROUND and XFOREGROUND inserted before SPREADSHEET. |
AKEEP directive | option CBCVCOVARIANCE inserted before TREATMENTSTRUCTURE; parameter CVCOVARIANCE inserted before CSSP. |
APERMTEST procedure | option PLOT inserted before NTIMES; SAVE parameter has now become an option. |
AYPARALLEL procedure | option COVARIATE inserted before FACTORIAL. |
DKEEP directive | options YLOWER and YUPPER moved to come after XUPPER, and new options ZLOWER and ZUPPER inserted between YUPPER and FILE. (This is to make the ordering of the X, Y and Z options match that elsewhere; see e.g. D3GRAPH.) |
HGKEEP procedure | option IGNOREFAILURE inserted before SAVE. |
MAANOVA procedure | option COVARIATE inserted before FACTORIAL. |
QIBDPROBABILITIES procedure | options NBACKCROSSES and NSELFINGS inserted before MAPPINGFUNCTION. |
QLDDECAY procedure | options reordered; SCORES and MAX%MISSING options and R2 parameter added; DEVIANCERATIO and MINLOG10P parameters removed; decay setting of the PLOT option renamed lddecay. |
QMATCH procedure | options GENSELECTION and MKSELECTION inserted before POPULATIONTYPE. |
QMESTIMATE procedure | option IDPARENTS inserted before QTLSELECTED. |
QMKDIAGNOSTICS procedure | PLOIDY option deleted; parameters GENCHECK and MKCHECK inserted before SUMMARY. |
QMVREPLACE procedure | PLOIDY option deleted. |
QSASSOCIATION procedure | options METHOD and SCORES inserted before THRESHOLD. |
QSESTIMATE procedure | option IDPARENTS inserted before QTLSELECTED. |
QSIMULATE procedure | options NBACKCROSSES and NSELFINGS inserted before GENOMELENGTH. |
QUANTILE procedure | option METHOD inserted before PROPORTION. |
RKEEP directive | options FITMODEL, FITCONSTANT and FITTYPE inserted before SAVE. |
RLFUNCTIONAL procedure | extensive redesign, with many new options and parameters; in particular the METHOD parameter is now an option (allowing several methods to be studied at once). |
RQLINEAR procedure | option CIPROBABILITY moved to come after SEED (as in SVGLM, SVSTRATIFIED, SVTABULATE etc.). |
RQNONLINEAR procedure | options CIPROBABILITY and MAXCYCLE moved to come after SEED (cf. ANOVA etc.). |
RQSMOOTH procedure | option CIPROBABILITY moved to come after SEED. |
VPLOT procedure | option RMETHOD inserted before INDEX. |
VTCOMPARISONS procedure | option GROUPS inserted before SAVE; option VCOVARIANCE inserted before STATISTIC. |