Analyses a simple REML variance components model for outliers using a variance shift outlier model (S.J. Welham, F.N. Gumedze & D.B. Baird).
Options
| PRINT = string tokens | Specifies the output to be produced (fdr, outliers); default fdr, outl |
|---|---|
| VPRINT = string tokens | Controls the output from the REML analysis of the baseline model (model, components, effects, means, stratumvariances, monitoring, vcovariance, deviance, Waldtests, missingvalues, covariancemodels); default mode, comp, Wald, cova |
| PLOT = string tokens | Controls which plots are produced (indexplots, residual); default inde, resi |
| INDEXPLOT = string tokens | Selects the index plots to produce (omega, sigma2, tsquared, lrt, method, all); default meth |
| TERM = formula | Random term to scan for outliers; default is the residual term |
| METHOD = string token | Method for calculating the statistics used to indicate an outlier (full, partial, t); default t |
| THRMETHOD = string token | Method for obtaining the threshold statistics (approximate, bootstrap); default appr for METHOD=full and boot otherwise |
| NBOOT = scalar | Number of bootstrap samples to take to form the threshold statistics; default 99 for METHOD=full and 499 otherwise |
| FIXED = formula | Fixed model terms |
| RANDOM = formula | Random model terms |
| CONSTANT = string token | How to treat the constant term (estimate, omit); default esti |
| FACTORIAL = scalar | Limit on the number of factors or covariates in each fixed term; default 3 |
| VCONSTRAINTS = string token | How to constrain the variance components and the residual variance (none, positive, fixrelative, fixabsolute); default posi |
| INITIAL = variate | Initial values for the variance components; default 1 |
| SEED = scalar | Seed for random number generation; default 0 continues an existing sequence or, if none, selects a seed automatically |
| SAVEITEMS = string tokens | Selects the items to save (residuals, omega, sigma2, gamma, tsquared, lrt, fdr, approxthresholds, thresholdstats, outliers, method, all); default resi, omeg, sigm, meth, fdr, outl |
Parameters
| Y = variates | Response variates |
|---|---|
| TITLE = texts | Specifies the title or titles to use for the plots |
| SAVE = pointers | Saves information from the analysis of each y-variate |
Description
VSOM uses a mixed-model analysis with a variance shift outlier model (VSOM) to search for potential outliers. By default, the VSOM is used to assess the residuals. However, you can set the TERM option to a random term in the analysis to assess its effects, i.e. to see whether any of the groups of observations defined by the random term seem to be aberrant. The model defines an extra component of variation for each unit (an individual or a group) in turn, and estimates the extra variance associated with it. The METHOD option specifies how the extra variance is estimated, with the following settings.
| full | refits the full model with the added variance term for each unit; this can be very time-consuming. |
|---|---|
| partial | approximates the change in likelihood by a partial likelihood, where the baseline model parameters are held fixed, and only the extra variance component for each unit is estimated; this is much faster than re-estimating the full model. |
| t | uses the squared t-statistics (i.e. squared standardized residuals) to approximate the change in likelihood (default); this is the fastest approach. |
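As a rough illustration of the three METHOD settings, the sketch below runs the same model three times. It assumes that the variate Nicotine and the factors Sample and Laboratory have already been loaded, as in the Example section; only options documented on this page are used.

```genstat
"Assumption: Nicotine, Sample and Laboratory are loaded as in the Example section"
"Fastest screen: squared t-statistics (the default METHOD)"
VSOM [METHOD=t; FIXED=Sample; RANDOM=Laboratory] Nicotine
"Partial likelihood: baseline parameters are held fixed"
VSOM [METHOD=partial; FIXED=Sample; RANDOM=Laboratory] Nicotine
"Full refit for each unit: the slowest setting"
VSOM [METHOD=full; FIXED=Sample; RANDOM=Laboratory] Nicotine
```

A reasonable workflow might be to screen with METHOD=t first, and keep METHOD=full for the units that look suspicious or for small data sets.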
To assess whether a unit is outside its expected distribution, thresholds are calculated at various levels of significance. The THRMETHOD option specifies the method to use:
| approximate | uses the asymptotic distribution to calculate the thresholds; and |
|---|---|
| bootstrap | uses parametric bootstrap samples, with the variance components in the baseline model, to calculate the thresholds from the percentiles of the order statistics. |
Each bootstrap sample is formed by taking the sum of the fitted fixed effects from the baseline model, together with simulated effects for the random terms in the model. Each random effect is simulated using Normal random numbers, with a mean of zero and the variance that was estimated for that term in the baseline model. The NBOOT option defines how many bootstrap samples to take; the default is 99 for METHOD=full, and 499 otherwise. The SEED option specifies the seed for the random number generator, used by the GRNORMAL function to make the bootstrap samples. The default of zero continues the sequence of random numbers from a previous generation or, if this is the first use of the generator in this run of Genstat, it initializes the seed automatically from the computer clock. If you repeat the analysis with the same (non-zero) seed, you will get the same random numbers, and hence the same results.
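The following sketch shows one way to request bootstrap thresholds that can be reproduced. It again assumes the Cambridge Filter variables from the Example section; the values 199 and 7643 are arbitrary illustrative choices, not recommendations.

```genstat
"Assumption: Nicotine, Sample and Laboratory are loaded as in the Example section"
"Bootstrap thresholds from 199 samples, with a fixed non-zero seed"
VSOM [METHOD=partial; THRMETHOD=bootstrap; NBOOT=199; SEED=7643;\
      FIXED=Sample; RANDOM=Laboratory] Nicotine
"Repeating the call with the same seed should reproduce the thresholds"
VSOM [METHOD=partial; THRMETHOD=bootstrap; NBOOT=199; SEED=7643;\
      FIXED=Sample; RANDOM=Laboratory] Nicotine
```

With SEED left at its default of zero, the second call would instead continue the random number sequence, so the bootstrap thresholds could differ between the two runs.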
The FIXED and RANDOM options specify the fixed and random terms to be fitted in the analysis, and the FACTORIAL option sets a limit on the number of factors and variates allowed in each fixed term. If neither FIXED nor RANDOM is specified, their settings are taken from the most recent VCOMPONENTS command. Its FACTORIAL setting is also taken if VCOMPONENTS is providing the fixed model. A fault is given if neither a fixed nor a random model is supplied. Note that the analysis cannot handle covariance models (which would be specified by the VSTRUCTURE directive). The VCONSTRAINTS option specifies constraints on the variance components, using the same settings as the CONSTRAINTS parameter of VCOMPONENTS. The CONSTANT option allows you to omit the constant.
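A minimal sketch of the two ways of supplying the model, assuming the Cambridge Filter variables from the Example section: either through an earlier VCOMPONENTS command, or directly through the FIXED and RANDOM options of VSOM.

```genstat
"Assumption: Nicotine, Sample and Laboratory are loaded as in the Example section"
"Model defined by VCOMPONENTS; VSOM takes its settings from this command"
VCOMPONENTS [FIXED=Sample] RANDOM=Laboratory
VSOM Nicotine
"Same model given directly, with unconstrained variance components"
VSOM [FIXED=Sample; RANDOM=Laboratory; VCONSTRAINTS=none] Nicotine
```

Supplying FIXED and RANDOM in the VSOM call itself keeps the analysis independent of whatever VCOMPONENTS command happened to be run last.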
Printed output is controlled by the PRINT option, with the following settings:
| outliers | prints a summary of the potential outliers, as measured against the threshold statistics, at various levels of significance; and |
|---|---|
| fdr | prints the estimated false discovery rates for the potential outliers. |
The false discovery rates (FDR) are estimated from the distribution of p-values calculated with the t-statistics from the asymptotic model. This uses the FDRMIXTURE procedure, or else the FDRBONFERRONI procedure if that fails. The FDR estimates the probability that the outlier is generated by noise. If this is small, it is likely that the outlier is genuine. However, if it is larger than 0.5, it is more likely than not that the outlier was generated by noise. The FDR probabilities do not allow for correlations between the estimates. So, if there are only 2-3 replicates of the fixed terms, the FDR probabilities may be too small, and should be interpreted with caution.
The VPRINT option controls the output from the REML analysis of the baseline model (as specified by the FIXED and RANDOM options). This has the same settings and default as the PRINT option of REML.
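As a small sketch of how PRINT and VPRINT combine, the call below asks only for the outlier summary and suppresses the baseline REML output with VPRINT=*, the setting used in the Example section. The variable names are again those of the Cambridge Filter data and are assumed to be loaded already.

```genstat
"Assumption: Nicotine, Sample and Laboratory are loaded as in the Example section"
"Outlier summary only; no output from the baseline REML analysis"
VSOM [PRINT=outliers; VPRINT=*; FIXED=Sample; RANDOM=Laboratory] Nicotine
"Add the false discovery rates and the baseline variance components"
VSOM [PRINT=fdr,outliers; VPRINT=components; FIXED=Sample;\
      RANDOM=Laboratory] Nicotine
```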
Graphical output is controlled by the PLOT option, with the following settings.
| residual | when TERM is set, the DRESIDUALS procedure is used to plot histograms and Normal plots of the specified random effects; when TERM is not set, DRESIDUALS is used to plot histograms and Normal plots of the residuals together with a plot of the residuals against the fitted values. |
|---|---|
| indexplots | plots the statistics, selected by the INDEXPLOT option, against their index (i.e. their position in the y-variate). |
For residual and indexplots, points are plotted in red if they are greater than their 5% bootstrap threshold, and in purple or green if greater than the 1% or 5% asymptotic thresholds respectively. The index plot also displays reference lines for the order statistics (OS 1, OS 2…) when THRMETHOD=bootstrap, or the 5%, 1%, 0.1% and 0.01% asymptotic thresholds when THRMETHOD=approximate.
The plots that are produced as components of the index plot can be controlled by the INDEXPLOT option, with the following settings:
| omega | variance shift as a ratio to the residual variance, |
|---|---|
| sigma2 | estimated residual variance under VSOM, |
| tsquared | squared t-statistic, |
| lrt | likelihood ratio test, |
| method | the statistic associated with the setting of the METHOD option, i.e. lrt for full or partial, and tsquared for t (default), and |
| all | all the statistics. |
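For instance, the sketch below requests index plots of just the variance shift and the squared t-statistic, without the residual plots. It assumes the Slate Hall variables (yield, variety, fieldrow, fieldcolumn) from the Example section are already loaded.

```genstat
"Assumption: yield, variety, fieldrow and fieldcolumn are loaded as in the Example section"
"Index plots of omega and the squared t-statistics only"
VSOM [PLOT=indexplots; INDEXPLOT=omega,tsquared; FIXED=variety;\
      RANDOM=fieldrow*fieldcolumn] yield; TITLE='Slate Hall'
```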
The Y parameter specifies the response variate. The TITLE parameter can supply a text, with either one or three values, to label the graphs. If the text has a single value, this is used to prefix the standard descriptions for the three graphs. If it has three values, these give (in full) the titles for the comparison, indexplots and residual plots, respectively.
The SAVE parameter can save a pointer containing variates, storing the statistics calculated for each group or individual. The labels of the pointer, and the corresponding statistics, are as follows:
| 'residuals' | the standardized residuals, |
|---|---|
| 'omega' | the variance shift as a ratio to the residual variance, |
| 'sigma2' | the estimated residual variance under VSOM, |
| 'gamma' | the estimated variance component for TERM under VSOM, |
| 'tsquared' | the squared t-statistic, |
| 'LRT' | the partial likelihood ratio test if METHOD=partial or the full likelihood ratio test otherwise, |
| 'method' | the statistic associated with the setting of the METHOD option (lrt for full or partial, and tsquared for t), |
| 'FDR' | the false discovery rate based on the t-statistics, |
| 'approxthresholds' | the approximate thresholds used to indicate significant departures, |
| 'thresholdstats' | the 95th percentiles of the order statistics from the bootstrap samples, in decreasing order, and |
| 'outliers' | the unit numbers of outliers above the thresholds. |
The SAVEITEMS option controls which of the above items are saved.
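A hedged sketch of saving and inspecting results follows. The pointer name results is arbitrary, the Cambridge Filter variables are assumed to be loaded as in the Example section, and the sketch assumes that the saved variates can be referenced by the labels listed above using Genstat's usual pointer suffix notation.

```genstat
"Assumption: Nicotine, Sample and Laboratory are loaded as in the Example section"
"Save a subset of the statistics in the pointer results (arbitrary name)"
VSOM [VPRINT=*; FIXED=Sample; RANDOM=Laboratory;\
      SAVEITEMS=residuals,omega,outliers] Nicotine; SAVE=results
"Assumption: pointer elements are addressed by their labels"
PRINT results['residuals'],results['omega']
PRINT results['outliers']
```

Restricting SAVEITEMS to the items actually needed keeps the pointer small; SAVEITEMS=all would store every statistic in the list.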
Options: PRINT, VPRINT, PLOT, INDEXPLOT, TERM, METHOD, THRMETHOD, NBOOT, FIXED, RANDOM, CONSTANT, FACTORIAL, VCONSTRAINTS, INITIAL, SEED, SAVEITEMS.
Parameters: Y, TITLE, SAVE.
Method
VSOM uses the method of Gumedze et al. (2010).
Action with RESTRICT
The Y parameter can be restricted. All output estimates will then be based only on the unrestricted units.
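As an illustrative sketch only, the lines below restrict the y-variate before calling VSOM and then remove the restriction. They assume the Cambridge Filter variables from the Example section, and that Laboratory has a level numbered 3; the choice of laboratory is purely hypothetical.

```genstat
"Assumption: Nicotine, Sample and Laboratory are loaded as in the Example section"
"Hypothetical choice: leave out the units from laboratory 3"
RESTRICT Nicotine; CONDITION=Laboratory.NE.3
VSOM [FIXED=Sample; RANDOM=Laboratory] Nicotine
"Remove the restriction afterwards"
RESTRICT Nicotine
```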
Reference
Gumedze, F.N., Welham, S.J., Gogel, B.J. & Thompson, R. (2010). A variance shift model for detection of outliers in the linear mixed model. Computational Statistics and Data Analysis, 54, 2128-2144.
See also
Directives: REML, VCOMPONENTS, VSTRUCTURE.
Procedures: VCHECK, VRCHECK, VPLOT, VDFIELDRESIDUALS, VFRESIDUALS, DRESIDUALS, FDRBONFERRONI, FDRMIXTURE.
Commands for: REML analysis of linear mixed models.
Example
CAPTION 'VSOM examples',\
        !T('Cambridge Filter data (Wagner & Thaggard 1979):',\
        'Nicotine extracted from pads at 14 laboratories'); STYLE=meta,plain
SPLOAD  [PRINT=*] '%EXAMPLES%/CambridgeFilterData.gsh'
"Check residual term - individual samples for outliers"
VSOM    [METHOD=t; FIXED=Sample; RANDOM=Laboratory; SEED=7643] Nicotine;\
        TITLE='Cambridge Filter data'
"Check laboratory term for outliers"
VSOM    [METHOD=full; FIXED=Sample; RANDOM=Laboratory; TERM=Laboratory]\
        Nicotine; TITLE='Cambridge filter data by laboratory'
CAPTION 'Slate Hall spring wheat trial (Kempton & Fox 1997)'; STYLE=plain
SPLOAD  [PRINT=*] '%DATA%/SlateHall.gsh'
"Check residual term - individual plots for outliers"
VSOM    [PRINT=; VPRINT=*; PLOT=#; INDEXPLOT=all; FIXED=variety;\
        RANDOM=fieldrow*fieldcolumn; METHOD=Partial; NBOOT=199;\
        SAVEITEMS=residuals,omega,fdr] yield; TITLE='Slate Hall'; SAVE=results
"Test fieldcolumn effects for outliers"
VSOM    [FIXED=variety; RANDOM=fieldrow*fieldcolumn; TERM=fieldcolumn]\
        yield; TITLE='Slate Hall by field column'