MNORMALIZE procedure

Normalizes two-colour microarray data (D.B. Baird).

Options

PRINT        = string tokens    What to print (summary, slidesummary, monitoring); default summ, slid, moni
PLOT         = string tokens    What plots to produce (pineffects, roweffects, columneffects, intensityeffects, rowxcoleffects, ma, standardizedma, spatialresiduals); default * i.e. none
METHOD       = string token     What type of model components to fit (spline, loess); default spli
MODELTERMS   = string tokens    What model components to fit (pins, rows, columns, intensity, pinxintensity, ar1, rowxcolumn, pinxrow, pinxcolumn); default pins, rows, colu, inte
DFINTENSITY  = scalar           Degrees of freedom for intensity cubic spline; default 24
DFROWXCOLUMN = scalar           Degrees of freedom for row × column thin-plate spline; default 49
POORFLAGS    = text or variate  Levels of FLAGS that are poor quality spots
BADFLAGS     = text or variate  Levels of FLAGS that are bad spots
ARRANGEMENT  = string token     Whether to use trellis or single plots (single, trellis); default trel
WINDOW       = scalar           Window number for the graphs; default 3
DEVICE       = scalar           Device number on which to plot the graphs
GRAPHICSFILE = text             What graphics filename template to use to save the graphs; default *

Parameters

LOGRATIOS    = variates or pointers  Log-ratios
INTENSITIES  = variates or pointers  Spot intensities
SLIDES       = factors or texts      Slides
PINS         = factors               Pins
SROWS        = factors               Rows across whole slide
SCOLUMNS     = factors               Columns across whole slide
PROWS        = factors               Rows within pins
PCOLUMNS     = factors               Columns within pins
FLAGS        = factors or pointers   Quality flags
CLOGRATIOS   = variates or pointers  Save corrected log-ratios
SLOGRATIOS   = variates or pointers  Save standardized log-ratios
SDSMOOTH     = variates or pointers  Save smoothed deviations
PINEFFECTS   = tables                Save estimated pin effects
ROWEFFECTS   = tables                Save estimated row effects
COLEFFECTS   = tables                Save estimated column effects
INTEFFECTS   = variates or pointers  Save estimated intensity effects
CLRED        = variates or pointers  Save corrected log2 red values
CLGREEN      = variates or pointers  Save corrected log2 green values
VAREXPLAINED = variates              Save the variance explained by slide

Description

With large microarrays it is essential to identify sources of variation and correct for them, to allow for robust use of this technology. Through normalization procedures, such variations can be identified and removed to obtain data for follow-on research. The analysis of the microarrays is thus a two-step process: a within-slide analysis aimed at normalization and, if required, standardization; then a between-slide analysis to estimate the differences between targets (or treatments) and evaluate their consistency.

Various techniques have been suggested for normalization, including linear regression, ratio statistics, local smoothing and analysis of variance. The approach in MNORMALIZE is to model the variation associated with spatial and structural components and remove this as noise. Examples of spatial components are the grid layout on the slide (rows × columns); examples of structural components are the pins, print order and differential dye responses to binding and scanning. The model can be specified to fit the type of variation found in the particular series of slides. The usual statistical modelling approach is taken: all possible sources of noise are fitted jointly in one model, and the need for each term is assessed using the statistical significance of the reduction in the remaining unexplained variation. Model terms can be added or removed as required. The fitted model then indicates where useful modification of protocols and equipment would help minimize variation in future experiments.

The type of model to use is selected using the METHOD option, with settings:

    spline    a mixed model including cubic smoothing splines, fitted with the REML directive; or
    loess     regression with the LOESS smoothing function, fitted with the FIT directive.

The terms to include in the models are selected by the MODELTERMS option, with settings:

    pins           an effect for each pin on the slide;
    rows           an effect for each row on the slide;
    columns        an effect for each column on the slide;
    intensity      a cubic smoothing spline or Loess curve for spot intensity, with degrees of freedom defined by the DFINTENSITY option (default 24);
    pinxintensity  a different linear effect of intensity for each pin;
    ar1            autoregressive model with order 1, separately in row and column directions (REML only);
    rowxcolumn     a thin-plate spline (REML only) which fits a smooth surface with row and column interaction, with degrees of freedom defined by the DFROWXCOLUMN option (default 49);
    pinxrow        pin-by-row interaction; and
    pinxcolumn     pin-by-column interaction.
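
As a minimal sketch of how METHOD and MODELTERMS combine, the call below requests the Loess fit and lists only terms that are not marked as REML only. It assumes the data structures loaded in the example at the end of this page (logRatio, Intensity, Slide, Block and so on); the choice of terms is illustrative, not a recommendation.

```genstat
" Sketch only: Loess fit, avoiding the REML-only settings ar1 and rowxcolumn.
  Data structure names are those used in the example at the end of this page. "
MNORMALIZE [METHOD=loess; MODELTERMS=pins,rows,columns,intensity;\
           PRINT=summary] LOGRATIOS=logRatio; INTENSITIES=Intensity;\
           SLIDES=Slide; PINS=Block; SROWS=Slide_Row;\
           SCOLUMNS=Slide_Column; PROWS=Row; PCOLUMNS=Column
```

To fit the thin-plate surface or the autoregressive terms instead, METHOD would need to be left at its default, spline.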

The log-ratios and spot intensities are supplied by the LOGRATIOS and INTENSITIES parameters. If these are single variates, the SLIDES parameter should supply a factor to index the slides. Alternatively, LOGRATIOS and INTENSITIES can each supply a pointer containing a variate for each slide. SLIDES may then be omitted, or it can supply a text giving a label for each slide.

The slide layout is specified by the parameters PINS, SROWS, SCOLUMNS, PROWS and PCOLUMNS. PINS provides a factor to index the pins. SROWS and SCOLUMNS provide factors to index the rows and columns within the whole slide. PROWS and PCOLUMNS provide factors to index the rows and columns within the pins. If LOGRATIOS is a pointer, the slide layout factors refer to a single slide, and all slides must have a common layout.
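
The pointer form of input might be set up as sketched below for two slides. All identifiers here are hypothetical: lr1, lr2, in1 and in2 stand for per-slide variates of log-ratios and intensities, and Pin, SRow, SCol, PRow and PCol stand for layout factors describing a single slide, which both slides are assumed to share.

```genstat
" Sketch only: one variate per slide, collected into pointers.
  lr1, lr2, in1, in2 and the layout factors are hypothetical names. "
POINTER    [VALUES=lr1,lr2] lratios
POINTER    [VALUES=in1,in2] intens
" Optional labels for the slides, supplied as a text "
TEXT       [VALUES='Slide 1','Slide 2'] slabels
MNORMALIZE LOGRATIOS=lratios; INTENSITIES=intens; SLIDES=slabels;\
           PINS=Pin; SROWS=SRow; SCOLUMNS=SCol; PROWS=PRow;\
           PCOLUMNS=PCol
```

With this form the layout factors have the length of one slide rather than of the stacked data.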

The FLAGS parameter supplies a factor giving a quality flag for each spot, which must match the type and length of the LOGRATIOS parameter. The POORFLAGS and BADFLAGS options can then each supply a text or variate, defining levels of FLAGS that indicate poor or bad quality spots. The poor spots are still used for model fitting, but are excluded from the output variates. The bad quality spots are excluded from any analysis.

The CLOGRATIOS parameter can supply a variate or pointer, to save the corrected log-ratios. Similarly, the SLOGRATIOS parameter can save the standardized log-ratios, and SDSMOOTH can save the smoothed deviations. The PINEFFECTS, ROWEFFECTS and COLEFFECTS parameters can save tables containing estimated pin, row and column effects, respectively. The INTEFFECTS parameter can save the estimated intensity effects. The CLRED and CLGREEN parameters can save the corrected log2 red and green values, respectively. If they have already been defined, the output structures specified by CLOGRATIOS, SLOGRATIOS, SDSMOOTH, INTEFFECTS, CLRED and CLGREEN must have the same type as the LOGRATIOS parameter (i.e. variates if LOGRATIOS is a variate, and pointers if LOGRATIOS is a pointer). Finally, the VAREXPLAINED parameter can save a variate with the variance explained by the fitted model on each slide.
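
A call that keeps some of these results for the between-slide analysis could look like the sketch below. It reuses the data structures and flag levels from the example at the end of this page; the output identifiers cLogRatio, sLogRatio and varExpl are hypothetical names chosen for illustration.

```genstat
" Sketch only: save corrected and standardized log-ratios, and the
  variance explained on each slide. Output names are hypothetical. "
MNORMALIZE [PRINT=summary; POORFLAGS=!(-25,-50); BADFLAGS=!(-75,-100)]\
           LOGRATIOS=logRatio; INTENSITIES=Intensity; SLIDES=Slide;\
           PINS=Block; SROWS=Slide_Row; SCOLUMNS=Slide_Column;\
           PROWS=Row; PCOLUMNS=Column; FLAGS=Flags;\
           CLOGRATIOS=cLogRatio; SLOGRATIOS=sLogRatio;\
           VAREXPLAINED=varExpl
PRINT      varExpl
```

Because LOGRATIOS is a variate here, cLogRatio and sLogRatio are saved as variates too.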

The PRINT option controls printed output, and the PLOT option controls what graphs are produced. By default the plots for the slides are displayed in a trellis arrangement, but you can set option ARRANGEMENT=single to display them separately, in single plots. The WINDOW option specifies the window to use for the graphs (by default 3). You can use the DEVICE option to plot to a device other than the screen. The GRAPHICSFILE option then supplies a template for the file names.

Options: PRINT, PLOT, METHOD, MODELTERMS, DFINTENSITY, DFROWXCOLUMN, POORFLAGS, BADFLAGS, ARRANGEMENT, WINDOW, DEVICE, GRAPHICSFILE.

Parameters: LOGRATIOS, INTENSITIES, SLIDES, PINS, SROWS, SCOLUMNS, PROWS, PCOLUMNS, FLAGS, CLOGRATIOS, SLOGRATIOS, SDSMOOTH, PINEFFECTS, ROWEFFECTS, COLEFFECTS, INTEFFECTS, CLRED, CLGREEN, VAREXPLAINED.

Action with RESTRICT

Any restrictions on LOGRATIOS, INTENSITIES, SLIDES, PINS, SROWS, SCOLUMNS, PROWS, PCOLUMNS or FLAGS are removed (and a warning is given).

See also

Procedures: DMADENSITY, FDRBONFERRONI, FDRMIXTURE, MACALCULATE, MAESTIMATE, MAHISTOGRAM, MAPCLUSTER, MAPLOT, MASCLUSTER, MASHADE, MAVOLCANO, MA2CLUSTER.

Commands for: Microarray data.

Example

CAPTION      'MNORMALIZE example'; STYLE=meta
" Check whether the example microarray dataset has been installed "
ENQUIRE      CHANNEL=-1; EXIST=check; NAME=\
             '%GENDIR%/Data/Microarrays/Data13-6-9.gwb'
IF check
  " Load the example data "
  SPLOAD     '%GENDIR%/Data/Microarrays/Data13-6-9.gwb'
  " Normalize Microarray Data "
  MNORMALIZE [METHOD=spline; PRINT=summary,slidesummary,monitoring;\
             MODELTERMS=pins,rows,columns,intensity,rowxcolumn;\
             PLOT=pineffects,roweffects,columneffects,intensityeffects,\
             rowxcoleffects; ARRANGEMENT=trellis; POORFLAGS=!(-25,-50);\
             BADFLAGS=!(-75,-100); DFINTENSITY=24] LOGRATIOS=logRatio;\
             INTENSITIES=Intensity; SLIDES=Slide; PINS=Block;\
             SROWS=Slide_Row; SCOLUMNS=Slide_Column; PROWS=Row;\
             PCOLUMNS=Column; FLAGS=Flags
ELSE
  CAPTION    'Microarray example datasets have not been installed.'
ENDIF
Updated on March 7, 2019
