Does regressions for single-channel microarray data (P. Brain, R.W. Payne & D.B. Baird).

### Options

| Option | Description |
|---|---|
| `PRINT` = string tokens | Controls printed output (`model`, `summary`); default `*` i.e. none |
| `TERMS` = formula | Defines the regression model over the slides |
| `WEIGHTS` = variate | Weights for the regression; default 1 |
| `OFFSET` = variate | Offset; default `*` i.e. none |
| `CONSTANT` = string token | How to treat the constant (`estimate`, `omit`); default `esti` |
| `FACTORIAL` = scalar | Limit for expansion of model terms; default 3 |
| `FULL` = string token | Whether to assign all possible parameters to factors and interactions (`yes`, `no`); default `no` |
| `POOL` = string token | Whether to pool the information on each term in the analysis of variance (`yes`, `no`); default `no` |
| `RMETHOD` = string token | Type of residuals to form (`deviance`, `Pearson`, `simple`); default `devi` |
| `SPREADSHEET` = string tokens | What results to save in a book of spreadsheets (`aov`, `residuals`, `fittedvalues`, `estimates`, `se`, `testimates`, `prestimates`); default `*` i.e. none |

### Parameters

| Parameter | Description |
|---|---|
| `Y` = variates or pointers | Y-values for each set of analyses |
| `PROBES` = factors or texts | Defines the probe information for each analysis |
| `SLIDES` = factors or texts | Defines the slide information for each analysis |
| `CHECK` = texts or variates | Slide IDs that can be compared with the labels or levels of the `SLIDES` factor to ensure that the slide order is correct in each analysis |
| `IDS` = texts | Saves the probe names that have been generated to label the rows of the output structures from each analysis |
| `RESIDUALS` = matrices | Saves residuals from each set of analyses |
| `FITTEDVALUES` = matrices | Saves fitted values from each set of analyses |
| `ESTIMATES` = matrices | Saves estimates from each set of analyses |
| `SE` = matrices | Saves s.e.’s of estimates |
| `TESTIMATES` = matrices | Saves t-statistics of estimates |
| `PRESTIMATES` = matrices | Saves t-probabilities of estimates |
| `DF` = pointers | Saves degrees of freedom for the model terms or variates in each analysis of variance |
| `SS` = pointers or variates | Saves sums of squares for the model terms in each analysis of variance |
| `MS` = pointers or variates | Saves mean squares for the model terms in each analysis of variance |
| `RDF` = variates | Saves degrees of freedom from the “residual” lines in each analysis of variance |
| `RSS` = variates | Saves sums of squares from the “residual” lines |
| `RMS` = variates | Saves mean squares from the “residual” lines |
| `TDF` = variates | Saves degrees of freedom from the “total” lines in each analysis of variance |
| `TSS` = variates | Saves sums of squares from the “total” lines |
| `TMS` = variates | Saves mean squares from the “total” lines |
| `VR` = pointers or variates | Saves variance ratios for the model terms in each analysis of variance |
| `PRVR` = pointers or variates | Saves probabilities of the variance ratios |

### Description

Procedure `MAREGRESSION` does regression analyses for microarray experiments with single-channel data. The experiment is assumed to consist of several slides, each of which represents a unit of the design. The model for the regressions is specified by the `TERMS`, `WEIGHTS`, `OFFSET`, `CONSTANT`, `FACTORIAL` and `FULL` options, which operate exactly as in ordinary regression (see the `MODEL`, `TERMS` and `FIT` directives). The lengths of the factors and variates in the model should be the same as the number of slides (and `MAREGRESSION` will give a failure diagnostic if this is not so).
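As a minimal sketch of how the model options combine, the call below fits only the main effects of two factors for every probe. The identifiers `Genotype`, `Treatment`, `Intensity`, `SlideId` and `ProbeId` are hypothetical and are not part of the procedure. The sketch assumes a design with 8 slides, so each factor has 8 units, and it assumes that the stacked data structures already exist.

```genstat
" Hypothetical design with 8 slides: each factor has one unit per slide "
FACTOR [LABELS=!t(wild,mutant); VALUES=4(1,2)] Genotype
FACTOR [LABELS=!t(control,treated); VALUES=(1,2)4] Treatment
" FACTORIAL=1 limits the expansion of Genotype*Treatment to the main
  effects, so the Genotype.Treatment interaction is not fitted.
  Intensity, SlideId and ProbeId are assumed stacked data structures. "
MAREGRESSION [PRINT=model; TERMS=Genotype*Treatment; FACTORIAL=1;\
   CONSTANT=estimate] Y=Intensity; SLIDES=SlideId; PROBES=ProbeId
```

Leaving `FACTORIAL` at its default of 3 would keep the interaction in the model.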

Each slide contains data on a (large) number of probes or genes. `MAREGRESSION` does a between-slide analysis of the data on each probe. So, it uses the mean value for any probe observations that are replicated within a slide, and prints a warning if the replication of any probe differs from slide to slide.

The data from the slides are specified by the `Y`, `PROBES` and `SLIDES` parameters, and can be in either a stacked or an unstacked representation. With stacked data, the observations from all the slides are supplied by the `Y` parameter in a single variate, the `SLIDES` factor indicates the slide on which each observation was made, and the `PROBES` factor specifies the probe. With unstacked data, the `Y` parameter supplies a pointer with a variate for each slide. The `PROBES` factor or text specifies the probes (which must be in the same order on every slide). The `SLIDES` factor can be omitted, or it can supply a text defining a label for each slide. The `CHECK` parameter can supply a text or variate to be compared with the labels or levels of the `SLIDES` factor, to verify that the slides have been specified in the correct order.
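The unstacked representation might be set up as in the sketch below. All identifiers here (`s1` to `s4`, `SlideData`, `SlideLabels`, `ProbeNames`, `Treatment`) are hypothetical: the four slide variates, the probe text and a `Treatment` factor with four units are assumed to exist already, with the probes in the same order in every variate.

```genstat
" Unstacked data: one variate per slide, collected in a pointer.
  s1...s4, ProbeNames and Treatment are assumed to exist already. "
POINTER [VALUES=s1,s2,s3,s4] SlideData
" Optional labels for the slides, in the same order as the pointer "
TEXT [VALUES=slide1,slide2,slide3,slide4] SlideLabels
MAREGRESSION [PRINT=summary; TERMS=Treatment]\
   Y=SlideData; PROBES=ProbeNames; SLIDES=SlideLabels
```

The `SLIDES` setting could be dropped here, since it is optional with unstacked data.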

The `RESIDUALS` and `FITTEDVALUES` parameters allow you to save the residuals and fitted values from the regressions. These are defined as matrices, with a row for each probe, and a column for each slide. The `RMETHOD` option indicates what sort of residual to form, as in the other Genstat regression commands. By default, standardized residuals are formed, but you can set `RMETHOD=simple` to form simple residuals instead.
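For instance, simple residuals and fitted values could be saved with a call along the following lines. This is only a sketch: it reuses the data files and identifiers from the example at the end of this page, assumes that the microarray example datasets are installed, and introduces the hypothetical names `simpleres` and `fitted` for the saved matrices.

```genstat
" Assumes the microarray example datasets are installed "
SPLOAD '%GENDIR%/Data/Microarrays/Hyb-Expressions.gsh'
SPLOAD '%GENDIR%/Data/Microarrays/HybFiles.GSH'
" Save simple residuals and fitted values: one row per probe,
  one column per slide; IDS saves the row labels "
MAREGRESSION [TERMS=Target; RMETHOD=simple]\
   Y=Expression; SLIDES=Slides; PROBES=Probes; CHECK=FileName;\
   IDS=IDProbes; RESIDUALS=simpleres; FITTEDVALUES=fitted
```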

The `ESTIMATES`, `SE`, `TESTIMATES` and `PRESTIMATES` parameters save the estimates, standard errors, t-statistics and t-probabilities for the parameters in the regression model. These are defined as matrices, with a row for each probe, and a column for each parameter.

The `DF`, `SS`, `MS`, `RDF`, `RSS`, `RMS`, `TDF`, `TSS`, `TMS`, `VR` and `PRVR` parameters store information from the analysis of variance table. (`DF`, `SS`, `MS`, `VR` and `PRVR` are from the “regression” line, `RDF`, `RSS` and `RMS` are from the “residual” line, and `TDF`, `TSS` and `TMS` are from the “total” line.) With the default setting `no` of the `POOL` option each of these is a pointer containing a variate for each term in the `TERMS` formula. The variates each have a unit for every probe. Alternatively, if you set `POOL=yes`, the parameters each have a single variate, with the values pooled over the terms.
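A pooled analysis of variance could be requested as sketched below. The call again relies on the data files and identifiers from the example at the end of this page, so it assumes that the microarray example datasets are installed. The names given to the saved structures (`ss`, `vr`, `prvr`, `rdf`, `rms`) are arbitrary.

```genstat
" Assumes the microarray example datasets are installed "
SPLOAD '%GENDIR%/Data/Microarrays/Hyb-Expressions.gsh'
SPLOAD '%GENDIR%/Data/Microarrays/HybFiles.GSH'
" POOL=yes: ss, vr and prvr are each saved as a single variate,
  pooled over the model terms, with one unit per probe "
MAREGRESSION [TERMS=Target; POOL=yes]\
   Y=Expression; SLIDES=Slides; PROBES=Probes;\
   SS=ss; VR=vr; PRVR=prvr; RDF=rdf; RMS=rms
```

With the default `POOL=no`, `ss`, `vr` and `prvr` would instead be pointers holding one variate per term in the `TERMS` formula.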

Printed output is controlled by the `PRINT` option, with settings:

| Setting | Description |
|---|---|
| `model` | for a description of the regression model, and |
| `summary` | for a summary of the significance levels found over the probes for each parameter in the model. |

The `SPREADSHEET` option allows you to save the various output components in spreadsheets.

### Method

The analyses are performed by the `FIT` directive and by matrix calculations.

### Action with `RESTRICT`

If any of the y-variates is restricted, the analysis will involve only the units not excluded by the restriction.

### See also

Procedures: `AFFYMETRIX`, `FDRBONFERRONI`, `FDRMIXTURE`, `MAANOVA`, `MABGCORRECT`, `MAEBAYES`, `MARMA`, `MAROBUSTMEANS`, `MAVDIFFERENCE`, `MAVOLCANO`, `QNORMALIZE`, `RYPARALLEL`.

Commands for: Microarray data.

### Example

    CAPTION 'MAREGRESSION example','Analysis of 9 Arabidopsis slides';\
            STYLE=meta,plain
    ENQUIRE CHANNEL=4(-1); EXIST=check[1...2]; NAME=\
            '%GENDIR%/Data/Microarrays/Hyb-Expressions.gsh',\
            '%GENDIR%/Data/Microarrays/HybFiles.GSH'
    IF VSUM(check).EQ.2
      SPLOAD '%GENDIR%/Data/Microarrays/Hyb-Expressions.gsh'
      SPLOAD '%GENDIR%/Data/Microarrays/HybFiles.GSH'
      " Regression of one-channel microarray data "
      MAREGRESS [PRINT=model,summary; FACTORIAL=3; TERMS=Target;\
        "SPREADSHEET=aov,residuals,fittedvalues,estimates,se,\
        testimates,prestimates"]\
        Y=Expression; SLIDES=Slides; PROBES=Probes; CHECK=FileName;\
        IDS=IDProbes; RESIDUALS=residuals; FITTEDVALUES=fitted;\
        ESTIMATES=estimates; TESTIMATES=tstatistic; PRESTIMATES=tprob;\
        SE=se; DF=df; SS=ss; MS=ms; RDF=rdf; RSS=rss; RMS=rms;\
        TDF=tdf; TSS=tss; TMS=tms; VR=vr; PRVR=prvr
    ELSE
      CAPTION 'Microarray example datasets have not been installed.'
    ENDIF