MAREGRESSION procedure

Does regressions for single-channel microarray data (P. Brain, R.W. Payne & D.B. Baird).

Options

PRINT = string tokens Controls printed output (model, summary); default * i.e. none
TERMS = formula Defines the regression model over the slides
WEIGHTS = variate Weights for the regression; default 1
OFFSET = variate Offset; default * i.e. none
CONSTANT = string token How to treat the constant (estimate, omit); default esti
FACTORIAL = scalar Limit for expansion of model terms; default 3
FULL = string token Whether to assign all possible parameters to factors and interactions (yes, no); default no
POOL = string token Whether to pool the information on each term in the analysis of variance (yes, no); default no
RMETHOD = string token Type of residuals to form (deviance, Pearson, simple); default devi
SPREADSHEET = string tokens What results to save in a book of spreadsheets (aov, residuals, fittedvalues, estimates, se, testimates, prestimates); default * i.e. none

Parameters

Y = variates or pointers Y-values for each set of analyses
PROBES = factors or texts Defines the probe information for each analysis
SLIDES = factors or texts Defines the slide information for each analysis
CHECK = texts or variates Slide IDs that can be compared with the labels or levels of the SLIDES factor to ensure that the slide order is correct in each analysis
IDS = texts Saves the probe names that have been generated to label the rows of the output structures from each analysis
RESIDUALS = matrices Saves residuals from each set of analyses
FITTEDVALUES = matrices Saves fitted values from each set of analyses
ESTIMATES = matrices Saves estimates from each set of analyses
SE = matrices Saves s.e.’s of estimates
TESTIMATES = matrices Saves t-statistics of estimates
PRESTIMATES = matrices Saves t-probabilities of estimates
DF = pointers Saves degrees of freedom for the model terms or variates in each analysis of variance
SS = pointers or variates Saves sums of squares for the model terms in each analysis of variance
MS = pointers or variates Saves mean squares for the model terms in each analysis of variance
RDF = variates Saves degrees of freedom from the “residual” lines in each analysis of variance
RSS = variates Saves sums of squares from the “residual” lines
RMS = variates Saves mean squares from the “residual” lines
TDF = variates Saves degrees of freedom from the “total” lines in each analysis of variance
TSS = variates Saves sums of squares from the “total” lines
TMS = variates Saves mean squares from the “total” lines
VR = pointers or variates Saves variance ratios for the model terms in each analysis of variance
PRVR = pointers or variates Saves probabilities of the variance ratios

Description

Procedure MAREGRESSION does regression analyses for microarray experiments with single-channel data. The experiment is assumed to consist of several slides, each of which represents a unit of the design. The model for the regressions is specified by the TERMS, WEIGHTS, OFFSET, CONSTANT, FACTORIAL and FULL options, which operate exactly as in ordinary regression (see the MODEL, TERMS and FIT directives). The lengths of the factors and variates in the model should be the same as the number of slides (and MAREGRESSION will give a failure diagnostic if this is not so).

Each slide contains data on a (large) number of probes or genes. MAREGRESSION does a between-slide analysis of the data on each probe. So, it uses the mean value for any probe observations that are replicated within a slide, and prints a warning if the replication of any probe differs from slide to slide. The data from the slides are specified by the Y, PROBES and SLIDES parameters, and can be in either a stacked or an unstacked representation. With stacked data, the observations from all the slides are supplied by the Y parameter in a single variate, the SLIDES factor indicates the slide on which each observation was made, and the PROBES factor specifies the probe. With unstacked data, the Y parameter supplies a pointer with a variate for each slide. The PROBES factor or text specifies the probes (which must be in the same order on every slide). The SLIDES factor can be omitted, or it can supply a text defining a label for each slide. The CHECK parameter can supply a text or variate to be compared with the labels or levels of the SLIDES factor, to verify that the slides have been specified in the correct order.
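As an illustration of the stacked representation and of the within-slide averaging described above, here is a minimal sketch in Python (not Genstat, and not the procedure's own code). The data values and the names `probe_by_slide_means` and `probes_with_unequal_replication` are hypothetical, chosen only for this example.

```python
from collections import defaultdict

# Hypothetical stacked data: one entry per observation.
# y plays the role of the Y variate; probes and slides play the roles of
# the PROBES and SLIDES factors.
y      = [7.0, 7.4, 8.1, 6.8, 7.2, 8.3, 8.5]
probes = ["g1", "g1", "g2", "g1", "g1", "g2", "g2"]
slides = ["s1", "s1", "s1", "s2", "s2", "s2", "s2"]

def probe_by_slide_means(y, probes, slides):
    """Average the replicate observations of each probe within each slide."""
    cells = defaultdict(list)
    for value, probe, slide in zip(y, probes, slides):
        cells[(probe, slide)].append(value)
    means = {cell: sum(v) / len(v) for cell, v in cells.items()}
    counts = {cell: len(v) for cell, v in cells.items()}
    return means, counts

def probes_with_unequal_replication(counts):
    """Return the probes whose replication differs from slide to slide."""
    per_probe = defaultdict(set)
    for (probe, _slide), n in counts.items():
        per_probe[probe].add(n)
    return sorted(p for p, ns in per_probe.items() if len(ns) > 1)

means, counts = probe_by_slide_means(y, probes, slides)
unequal = probes_with_unequal_replication(counts)

for (probe, slide), m in sorted(means.items()):
    print(probe, slide, round(m, 3))
print("probes with unequal replication:", unequal)
```

In this made-up data set probe g2 has one observation on slide s1 and two on slide s2, which is the situation in which MAREGRESSION is described as printing a warning.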

The RESIDUALS and FITTEDVALUES parameters allow you to save the residuals and fitted values from the regressions. These are defined as matrices, with a row for each probe, and a column for each slide. The RMETHOD option indicates what sort of residual to form, as in the other Genstat regression commands. With the default setting, RMETHOD=deviance, standardized residuals are formed, but you can set RMETHOD=simple to form simple residuals instead.
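The difference between simple and standardized residuals can be sketched for a single probe as follows. This is a Python illustration rather than Genstat code. It assumes a model with a constant and one explanatory variate, and it assumes the usual leverage-based standardization, residual / sqrt(s² (1 − h)); whether this matches Genstat's definition in every detail is not confirmed by this page. The data and the name `residuals_for_probe` are hypothetical.

```python
import math

def residuals_for_probe(x, y):
    """Fit y = a + b*x by least squares; return fitted values,
    simple residuals and leverage-standardized residuals."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    fitted = [a + b * xi for xi in x]
    simple = [yi - fi for yi, fi in zip(y, fitted)]
    rms = sum(r * r for r in simple) / (n - 2)      # residual mean square
    leverage = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    standardized = [r / math.sqrt(rms * (1.0 - h))
                    for r, h in zip(simple, leverage)]
    return fitted, simple, standardized

# Hypothetical data: one explanatory value per slide, responses for one probe.
dose = [0.0, 1.0, 2.0, 3.0]
expr = [1.0, 3.1, 4.9, 7.0]

fitted, simple, standardized = residuals_for_probe(dose, expr)
print([round(f, 3) for f in fitted])
print([round(r, 3) for r in simple])
print([round(r, 3) for r in standardized])
```

Repeating the calculation for every probe and stacking the results gives one row per probe and one column per slide, which is the layout of the RESIDUALS and FITTEDVALUES matrices.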

The ESTIMATES, SE, TESTIMATES and PRESTIMATES parameters save the estimates, standard errors, t-statistics and t-probabilities for the parameters in the regression model. These are defined as matrices, with a row for each probe, and a column for each parameter.
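To make the layout of these matrices concrete, the following Python sketch (not Genstat, and not the procedure's own code) fits a straight line to each probe and collects the estimates, standard errors and t-statistics with one row per probe and one column per parameter (constant, slope). The data and the name `fit_line` are hypothetical. The t-probabilities are omitted because they need the Student's t distribution, which Python's standard library does not provide.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = a + b*x.
    Returns ([a, b], [se_a, se_b], [t_a, t_b])."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)                                  # residual mean square
    se_a = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    se_b = math.sqrt(s2 / sxx)
    est = [a, b]
    se = [se_a, se_b]
    t = [e / s for e, s in zip(est, se)]
    return est, se, t

# Hypothetical data: one explanatory value per slide,
# and one list of per-slide responses for each probe.
dose = [0.0, 1.0, 2.0, 3.0]
expression = {
    "g1": [1.0, 3.1, 4.9, 7.0],   # clear trend with dose
    "g2": [5.0, 5.2, 4.9, 5.1],   # essentially flat
}

probe_ids = sorted(expression)     # row labels, in the spirit of IDS
rows = [fit_line(dose, expression[p]) for p in probe_ids]
estimates = [r[0] for r in rows]   # rows = probes, columns = parameters
se = [r[1] for r in rows]
tstat = [r[2] for r in rows]

for p, e, s, t in zip(probe_ids, estimates, se, tstat):
    print(p, [round(v, 4) for v in e], [round(v, 4) for v in s],
          [round(v, 2) for v in t])
```

Probe g1 has a slope far larger than its standard error, while the slope for g2 is essentially zero, which is the kind of contrast that the summary of significance levels over the probes is meant to bring out.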

The DF, SS, MS, RDF, RSS, RMS, TDF, TSS, TMS, VR and PRVR parameters store information from the analysis of variance table. (DF, SS, MS, VR and PRVR are from the “regression” line, RDF, RSS and RMS are from the “residual” line, and TDF, TSS and TMS are from the “total” line.) With the default setting of the POOL option (POOL=no), each of the parameters for the model terms (DF, SS, MS, VR and PRVR) is a pointer containing a variate for each term in the TERMS formula. The variates each have a unit for every probe. Alternatively, if you set POOL=yes, each of these parameters saves a single variate, with the values pooled over the terms.
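The relationship between the three lines of the table can be sketched for one probe as follows. This is a Python illustration of the standard decomposition for a model with a constant, with the regression line pooled over terms; it is not the procedure's own code. The values and the name `anova_lines` are hypothetical. PRVR is omitted because it needs the F distribution, which Python's standard library does not provide.

```python
def anova_lines(y, fitted, n_params):
    """Regression, residual and total lines for one probe.
    n_params counts the fitted parameters, including the constant."""
    n = len(y)
    ybar = sum(y) / n
    tss = sum((yi - ybar) ** 2 for yi in y)                 # total SS
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual SS
    ss = tss - rss                                          # regression SS
    df, rdf, tdf = n_params - 1, n - n_params, n - 1
    ms, rms, tms = ss / df, rss / rdf, tss / tdf
    return {
        "DF": df, "SS": ss, "MS": ms,
        "RDF": rdf, "RSS": rss, "RMS": rms,
        "TDF": tdf, "TSS": tss, "TMS": tms,
        "VR": ms / rms,                                     # variance ratio
    }

# Hypothetical observed and fitted values for one probe on four slides,
# from a model with a constant and one explanatory variate.
y = [1.0, 3.1, 4.9, 7.0]
fitted = [1.03, 3.01, 4.99, 6.97]

table = anova_lines(y, fitted, n_params=2)
for name, value in table.items():
    print(name, round(value, 4))
```

Doing this for every probe gives one value per probe for each quantity, which corresponds to the variates with a unit for every probe described above.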

Printed output is controlled by the PRINT option, with settings:

    model for a description of the regression model, and
    summary for a summary of the significance levels found over the probes for each parameter in the model.

The SPREADSHEET option allows you to save the various output components in spreadsheets.

Options: PRINT, TERMS, WEIGHTS, OFFSET, CONSTANT, FACTORIAL, FULL, POOL, RMETHOD, SPREADSHEET.

Parameters: Y, PROBES, SLIDES, CHECK, IDS, RESIDUALS, FITTEDVALUES, ESTIMATES, SE, TESTIMATES, PRESTIMATES, DF, SS, MS, RDF, RSS, RMS, TDF, TSS, TMS, VR, PRVR.

Method

The analyses are performed by the FIT directive and by matrix calculations.

Action with RESTRICT

If any of the y-variates is restricted, the analysis will involve only the units not excluded by the restriction.

See also

Procedures: AFFYMETRIX, FDRBONFERRONI, FDRMIXTURE, MAANOVA, MABGCORRECT, MAEBAYES, MARMA, MAROBUSTMEANS, MAVDIFFERENCE, MAVOLCANO, QNORMALIZE, RYPARALLEL.

Commands for: Microarray data.

Example

CAPTION   'MAREGRESSION example','Analysis of 9 Arabidopsis slides';\
          STYLE=meta,plain
ENQUIRE   CHANNEL=4(-1); EXIST=check[1...2]; NAME=\
          '%GENDIR%/Data/Microarrays/Hyb-Expressions.gsh',\
          '%GENDIR%/Data/Microarrays/HybFiles.GSH'
IF VSUM(check).EQ.2
  SPLOAD  '%GENDIR%/Data/Microarrays/Hyb-Expressions.gsh'
  SPLOAD  '%GENDIR%/Data/Microarrays/HybFiles.GSH'
  " Regression of one-channel microarray data "
  MAREGRESS [PRINT=model,summary; FACTORIAL=3; TERMS=Target;\
          "SPREADSHEET=aov,residuals,fittedvalues,estimates,se,\
          testimates,prestimates"]\
          Y=Expression; SLIDES=Slides; PROBES=Probes; CHECK=FileName;\
          IDS=IDProbes; RESIDUALS=residuals; FITTEDVALUES=fitted;\
          ESTIMATES=estimates; TESTIMATES=tstatistic; PRESTIMATES=tprob;\
          SE=se; DF=df; SS=ss; MS=ms; RDF=rdf; RSS=rss; RMS=rms;\
          TDF=tdf; TSS=tss; TMS=tms; VR=vr; PRVR=prvr
ELSE
  CAPTION 'Microarray example datasets have not been installed.'
ENDIF

Updated on June 19, 2019
