MASCLUSTER procedure

Clusters microarray slides (D.B. Baird).

Options

`PRINT` = string tokens	What to print (`cluster`, `pco`, `correlations`, `distances`); default `clus`, `pco`, `corr`, `dist`
`PLOT` = string tokens	What to plot (`dendrogram`, `mst`); default `dend`, `mst`
`DMETHOD` = string token	What distance method to use to form the similarity matrix (`correlation`, `euclidean`, `cityblock`); default `corr`
`PERCENT` = scalar	Percentage of the probes/genes to use to calculate correlations; default 100
`DTITLE` = text	Title for the dendrogram
`MTITLE` = text	Title for the minimum spanning tree
`WINDOW` = scalar	Window number for the graphs; default 3
`DEVICE` = scalar	Device number on which to plot the graphs
`GRAPHICSFILE` = text	What graphics filename template to use to save the graphs; default `*`

Parameters

`DATA` = variates or pointers	Data values (i.e. log-ratios)
`SLIDES` = factors, texts or variates	Identifies the slides
`PROBES` = factors, texts or variates	Identifies the probes or genes
`CORRELATION` = symmetric matrices	Saves the correlation matrix
`DISTANCE` = symmetric matrices	Saves the distance matrix

Description

MASCLUSTER clusters microarray slides (or targets) according to the similarity of their responses over a number of probes or genes. The slides are grouped so that the patterns of response over the probes/genes are similar within a group, with the groups as distinct as possible.

The DMETHOD option specifies the distance method to use to form the similarity matrix: either correlation (default), euclidean, or cityblock.
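To make the three settings concrete, the following Python sketch computes each measure between the response vectors of two slides. It is an illustration only and not MASCLUSTER's code: the function names `pearson` and `slide_distance` are hypothetical, and turning a correlation `r` into a distance as `1 - r` is an assumption, since this page does not state which transformation the procedure uses.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length response vectors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def slide_distance(x, y, method="correlation"):
    """Distance between two slides under one of the three DMETHOD settings."""
    if method == "correlation":
        # Assumption: 1 - r; the procedure's own transformation is not documented here.
        return 1.0 - pearson(x, y)
    if method == "euclidean":
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    if method == "cityblock":
        return sum(abs(a - b) for a, b in zip(x, y))
    raise ValueError("unknown method: " + method)

# Made-up log-ratios for two slides over four probes.
slide_a = [1.0, 2.0, 3.0, 4.0]
slide_b = [2.0, 4.0, 6.0, 8.0]

for method in ("correlation", "euclidean", "cityblock"):
    print(method, slide_distance(slide_a, slide_b, method))
```

In this made-up case the two slides have the same pattern of response but different magnitudes, so the correlation-based distance is zero while the euclidean and cityblock distances are not. That difference is the main thing to consider when choosing a DMETHOD setting.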

With large numbers of probes or genes, many may be non-informative, being subject only to random variation. The PERCENT option therefore controls the percentage of the probes to use: if PERCENT is less than the default of 100, MASCLUSTER uses only the top PERCENT of probes according to their mean absolute response.
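The selection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure's implementation: the function name `top_percent_probes` is hypothetical, the mean absolute response is assumed to be taken across slides, and rounding the number of retained probes up is an assumption because the page does not say how fractional counts are handled.

```python
import math

def top_percent_probes(data, percent):
    """Return the probes with the largest mean absolute response.

    data maps each probe name to its list of log-ratios, one per slide.
    """
    # Mean absolute response of each probe (assumed to be taken across slides).
    scores = {probe: sum(abs(v) for v in values) / len(values)
              for probe, values in data.items()}
    # Assumption: round the number of retained probes up, keeping at least one.
    n_keep = max(1, math.ceil(len(scores) * percent / 100))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep]

# Made-up log-ratios for five probes on three slides.
data = {
    "g1": [0.1, -0.2, 0.0],
    "g2": [2.0, -1.8, 2.2],
    "g3": [-0.3, 0.2, 0.1],
    "g4": [-1.5, -1.2, -1.6],
    "g5": [0.05, 0.0, -0.05],
}

print(top_percent_probes(data, 40))
```

Here only `g2` and `g4` show responses well away from zero, so they are the two probes retained at 40 percent. Note that `g2` is kept even though its sign changes between slides, because the ranking uses absolute values.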

The log-ratios are supplied by the DATA parameter. If these are in a single variate, the SLIDES parameter should supply a factor to index the slides, and the PROBES parameter should index the probes or genes. Alternatively, you can supply a pointer containing a variate for each slide. SLIDES is then not required; if it is given, it should have just one entry for each slide, in the order of the variates in the pointer. PROBES then identifies the probes on a single slide, and all slides must have a common layout.
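The relationship between the two layouts can be sketched in Python: a stacked set of values with slide and probe labels is split into one list per slide, and the common-layout requirement is checked along the way. This only illustrates the data shapes; it is not Genstat code, and the function name `unstack_by_slide` and the sample values are hypothetical.

```python
def unstack_by_slide(values, slides, probes):
    """Split stacked log-ratios into one list per slide.

    Returns (per_slide, layout): per_slide maps each slide label to its
    values, and layout is the probe order shared by every slide.
    """
    per_slide = {}
    per_slide_probes = {}
    for value, slide, probe in zip(values, slides, probes):
        per_slide.setdefault(slide, []).append(value)
        per_slide_probes.setdefault(slide, []).append(probe)
    # All slides must list the same probes in the same order.
    layouts = list(per_slide_probes.values())
    for layout in layouts[1:]:
        if layout != layouts[0]:
            raise ValueError("slides do not share a common probe layout")
    return per_slide, layouts[0]

# Stacked form: one value per slide-by-probe combination.
values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
slides = ["s1", "s1", "s1", "s2", "s2", "s2"]
probes = ["g1", "g2", "g3", "g1", "g2", "g3"]

per_slide, layout = unstack_by_slide(values, slides, probes)
print(per_slide)
print(layout)
```

The stacked inputs correspond to the single-variate form with slide and probe labels of full length, while `per_slide` corresponds to the pointer form, where the probe labels have the length of a single slide.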

The DTITLE and MTITLE options can supply titles for the plots of the dendrogram and minimum spanning tree, respectively, and the WINDOW option specifies the window to use (by default 3). You can use the DEVICE option to plot to a device other than the screen. The GRAPHICSFILE option then supplies a template for the file names.

Options: PRINT, PLOT, DMETHOD, PERCENT, DTITLE, MTITLE, WINDOW, DEVICE, GRAPHICSFILE.

Parameters: DATA, SLIDES, PROBES, CORRELATION, DISTANCE.

Action with `RESTRICT`

Any restrictions on the DATA variates are removed.

Example

CAPTION      'MASCLUSTER example'; STYLE=meta
ENQUIRE      CHANNEL=-1; EXIST=check; NAME=\
             '%GENDIR%/Data/Microarrays/ApoAIKnockOutStacked.GSH'
IF check
  SPLOAD     '%GENDIR%/Data/Microarrays/ApoAIKnockOutStacked.GSH'
  " Cluster Slides from APO Mouse Knock-out Data."
  MASCLUSTER [PRINT=correlations,cluster; PLOT=dendrogram;\
             DMETHOD=correlation; PERCENT=10] DATA=cLogRatio;\
             SLIDES=Slide; PROBES=NAME
ELSE
  CAPTION    'Microarray example datasets have not been installed.'
ENDIF

Updated on March 7, 2019
