Normalizes two-colour microarray data (D.B. Baird).
Options
PRINT = string tokens | What to print (summary, slidesummary, monitoring); default summ, slid, moni |
---|---|
PLOT = string tokens | What plots to produce (pineffects, roweffects, columneffects, intensityeffects, rowxcoleffects, ma, standardizedma, spatialresiduals); default * i.e. none |
METHOD = string token | What type of model components to fit (spline, loess); default spli |
MODELTERMS = string tokens | What model components to fit (pins, rows, columns, intensity, pinxintensity, ar1, rowxcolumn, pinxrow, pinxcolumn); default pins, rows, colu, inte |
DFINTENSITY = scalar | Degrees of freedom for intensity cubic spline; default 24 |
DFROWXCOLUMN = scalar | Degrees of freedom for row × column thin-plate spline; default 49 |
POORFLAGS = text or variate | Levels of FLAGS that are poor quality spots |
BADFLAGS = text or variate | Levels of FLAGS that are bad spots |
ARRANGEMENT = string token | Whether to use trellis or single plots (single, trellis); default trel |
WINDOW = scalar | Window number for the graphs; default 3 |
DEVICE = scalar | Device number on which to plot the graphs |
GRAPHICSFILE = text | What graphics filename template to use to save the graphs; default * |
Parameters
LOGRATIOS = variates or pointers | Log-ratios |
---|---|
INTENSITIES = variates or pointers | Spot intensities |
SLIDES = factors or texts | Slides |
PINS = factors | Pins |
SROWS = factors | Rows across whole slide |
SCOLUMNS = factors | Columns across whole slide |
PROWS = factors | Rows within pins |
PCOLUMNS = factors | Columns within pins |
FLAGS = factors or pointers | Quality flags |
CLOGRATIOS = variates or pointers | Save corrected log-ratios |
SLOGRATIOS = variates or pointers | Save standardized log-ratios |
SDSMOOTH = variates or pointers | Save smoothed deviations |
PINEFFECTS = tables | Save estimated pin effects |
ROWEFFECTS = tables | Save estimated row effects |
COLEFFECTS = tables | Save estimated column effects |
INTEFFECTS = variates or pointers | Save estimated intensity effects |
CLRED = variates or pointers | Save corrected log2 red values |
CLGREEN = variates or pointers | Save corrected log2 green values |
VAREXPLAINED = variates | Save the variance explained by slide |
Description
With large microarrays it is essential to identify systematic sources of variation and correct for them, so that the technology can be used reliably. Normalization procedures identify and remove this variation, giving data that are suitable for further analysis. The analysis of microarrays is thus a two-step process: a within-slide analysis aimed at normalization and, if required, standardization; then a between-slide analysis to estimate the differences between targets (or treatments) and evaluate their consistency.
Various techniques have been suggested for normalization, including linear regression, ratio statistics, local smoothing and analysis of variance. The approach in MNORMALIZE is to model the variation associated with spatial and structural components and remove this as noise. Spatial components include the grid layout on the slide (rows × columns); structural components include the pins, the print order and differential dye responses to binding and scanning. The model can be specified to fit the type of variation found in the particular series of slides. The usual statistical modelling approach is taken: all possible sources of noise are fitted jointly in one model, and the need for each term is assessed using the statistical significance of the reduction in the remaining unexplained variation. Model terms can be added or removed as required. The fitted model then indicates where useful modification of protocols and equipment would help minimize variation in future experiments.
The type of model to use is selected using the METHOD option, with settings:
spline | a mixed model including cubic smoothing splines, fitted with the REML directive; or |
---|---|
loess | regression with the LOESS smoothing function, fitted with the FIT directive. |
The terms to include in the models are selected by the MODELTERMS option, with settings:
pins | an effect for each pin on the slide; |
---|---|
rows | an effect for each row on the slide; |
columns | an effect for each column on the slide; |
intensity | a cubic smoothing spline or Loess curve for spot intensity, with degrees of freedom defined by the DFINTENSITY option (default 24); |
pinxintensity | a different linear effect of intensity for each pin; |
ar1 | autoregressive model with order 1, separately in row and column directions (REML only); |
rowxcolumn | a thin-plate spline (REML only) which fits a smooth surface with row and column interaction, with degrees of freedom defined by the DFROWXCOLUMN option (default 49); |
pinxrow | pin-by-row interaction; and |
pinxcolumn | pin-by-column interaction. |
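As an illustration of how METHOD and MODELTERMS combine, the following is a minimal sketch of a loess-based fit. It avoids the ar1 and rowxcolumn terms, which are described above as REML only. The data identifiers (logRatio, Intensity, Slide, Block and so on) are borrowed from the Example section and are assumptions about the data in use; DFINTENSITY=12 is an arbitrary illustrative value, and the quality flags are left out for brevity.

```genstat
" Sketch only: loess smoothing with pin, row, column and intensity terms.
  Identifier names are assumed to match the example data set;
  DFINTENSITY=12 is an arbitrary choice for illustration. "
MNORMALIZE [METHOD=loess; MODELTERMS=pins,rows,columns,intensity;\
  DFINTENSITY=12] LOGRATIOS=logRatio; INTENSITIES=Intensity;\
  SLIDES=Slide; PINS=Block; SROWS=Slide_Row; SCOLUMNS=Slide_Column;\
  PROWS=Row; PCOLUMNS=Column
```

Switching to METHOD=spline would allow rowxcolumn or ar1 to be added to MODELTERMS if the residual plots suggest remaining spatial pattern.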
The log-ratios and spot intensities are supplied by the LOGRATIOS and INTENSITIES parameters. If these are single variates, the SLIDES parameter should supply a factor to index the slides. Alternatively, you can supply pointers containing a variate for each slide. In that case the SLIDES parameter may be omitted, or it can supply a text giving a label for each slide.
The slide layout is specified by the parameters PINS, SROWS, SCOLUMNS, PROWS and PCOLUMNS. PINS provides a factor to index the pins. SROWS and SCOLUMNS provide factors to index the rows and columns within the whole slide. PROWS and PCOLUMNS provide factors to index the rows and columns within the pins. If LOGRATIOS is a pointer, the slide layout factors refer to a single slide, and all slides must have a common layout.
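The pointer form of input might look like the sketch below. M1, M2, A1 and A2 are hypothetical variates holding the log-ratios and intensities of two slides printed with the same layout, and LogRatios, Intensities and SlideLabels are hypothetical names introduced here. The layout factors are assumed to have the length of a single slide, as described above.

```genstat
" Sketch only: one variate per slide, collected into pointers.
  M1, M2, A1, A2 and the structure names below are hypothetical. "
POINTER [VALUES=M1,M2] LogRatios
POINTER [VALUES=A1,A2] Intensities
TEXT [VALUES='Slide 1','Slide 2'] SlideLabels
" The layout factors describe one slide and are shared by all slides. "
MNORMALIZE LOGRATIOS=LogRatios; INTENSITIES=Intensities;\
  SLIDES=SlideLabels; PINS=Block; SROWS=Slide_Row;\
  SCOLUMNS=Slide_Column; PROWS=Row; PCOLUMNS=Column
```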
The FLAGS parameter supplies a factor giving a quality flag for each spot, which must match the type and length of the LOGRATIOS parameter. The POORFLAGS and BADFLAGS options can then each supply a text or variate, defining levels of FLAGS that indicate poor or bad quality spots. The poor spots are still used for model fitting, but are excluded from the output variates. The bad quality spots are excluded from any analysis.
The CLOGRATIOS parameter can supply a variate or pointer, to save the corrected log-ratios. Similarly, the SLOGRATIOS parameter can save the standardized log-ratios, and SDSMOOTH can save the smoothed deviations. The PINEFFECTS, ROWEFFECTS and COLEFFECTS parameters can save tables containing estimated pin, row and column effects, respectively. The INTEFFECTS parameter can save the estimated intensity effects. The CLRED and CLGREEN parameters can save the corrected log2 red and green values, respectively. If they have already been defined, the output structures specified by CLOGRATIOS, SLOGRATIOS, SDSMOOTH, INTEFFECTS, CLRED and CLGREEN must have the same type as the LOGRATIOS parameter (i.e. variates if LOGRATIOS is a variate, and pointers if LOGRATIOS is a pointer). Finally, the VAREXPLAINED parameter can save a variate with the variance explained by the fitted model on each slide.
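A minimal sketch of saving results for the between-slide analysis is given below. The input identifiers are again taken from the Example section; CorrectedLR, StandardLR, PinEff and VarExp are hypothetical names for the output structures, assumed not to have been declared beforehand.

```genstat
" Sketch only: save corrected and standardized log-ratios, the pin
  effects and the variance explained on each slide.
  Output identifiers are hypothetical. "
MNORMALIZE [PRINT=summary] LOGRATIOS=logRatio; INTENSITIES=Intensity;\
  SLIDES=Slide; PINS=Block; SROWS=Slide_Row; SCOLUMNS=Slide_Column;\
  PROWS=Row; PCOLUMNS=Column; CLOGRATIOS=CorrectedLR;\
  SLOGRATIOS=StandardLR; PINEFFECTS=PinEff; VAREXPLAINED=VarExp
" Inspect the saved structures. "
PRINT VarExp
PRINT PinEff
```

Because LOGRATIOS is a single variate here, CorrectedLR and StandardLR would be saved as variates; with pointer input they would be pointers.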
The PRINT option controls printed output, and the PLOT option controls what graphs are produced. By default the plots for the slides are displayed in a trellis arrangement, but you can set option ARRANGEMENT=single to display them separately, in single plots. The WINDOW option specifies the window to use for the graphs (by default 3). You can use the DEVICE option to plot to a device other than the screen. The GRAPHICSFILE option then supplies a template for the file names.
Options: PRINT, PLOT, METHOD, MODELTERMS, DFINTENSITY, DFROWXCOLUMN, POORFLAGS, BADFLAGS, ARRANGEMENT, WINDOW, DEVICE, GRAPHICSFILE.
Parameters: LOGRATIOS, INTENSITIES, SLIDES, PINS, SROWS, SCOLUMNS, PROWS, PCOLUMNS, FLAGS, CLOGRATIOS, SLOGRATIOS, SDSMOOTH, PINEFFECTS, ROWEFFECTS, COLEFFECTS, INTEFFECTS, CLRED, CLGREEN, VAREXPLAINED.
Action with RESTRICT
Any restrictions on LOGRATIOS, INTENSITIES, SLIDES, PINS, SROWS, SCOLUMNS, PROWS, PCOLUMNS or FLAGS are removed (and a warning is given).
See also
Procedures: DMADENSITY, FDRBONFERRONI, FDRMIXTURE, MACALCULATE, MAESTIMATE, MAHISTOGRAM, MAPCLUSTER, MAPLOT, MASCLUSTER, MASHADE, MAVOLCANO, MA2CLUSTER.
Commands for: Microarray data.
Example
CAPTION 'MNORMALIZE example'; STYLE=meta
ENQUIRE CHANNEL=-1; EXIST=check; NAME=\
  '%GENDIR%/Data/Microarrays/Data13-6-9.gwb'
IF check
  SPLOAD '%GENDIR%/Data/Microarrays/Data13-6-9.gwb'
  " Normalize Microarray Data "
  MNORMALIZE [METHOD=spline; PRINT=summary,slidesummary,monitoring;\
    MODELTERMS=pins,rows,columns,intensity,rowxcolumn;\
    PLOT=pineffects,roweffects,columneffects,intensityeffects,\
    rowxcoleffects; ARRANGEMENT=trellis; POORFLAGS=!(-25,-50);\
    BADFLAGS=!(-75,-100); DFINTENSITY=24] LOGRATIOS=logRatio;\
    INTENSITIES=Intensity; SLIDES=Slide; PINS=Block;\
    SROWS=Slide_Row; SCOLUMNS=Slide_Column; PROWS=Row;\
    PCOLUMNS=Column; FLAGS=Flags
ELSE
  CAPTION 'Microarray example datasets have not been installed.'
ENDIF