MAPCLUSTER procedure

Clusters probes or genes with microarray data (D.B. Baird).

Options

`PRINT` = string tokens	What to print (`cluster`, `groups`, `summary`); default `clus`
`PLOT` = string tokens	What to plot (`dendrogram`, `groups`, `meangroups`); default `dend`, `grou`
`METHOD` = string token	Type of clustering to use (`hierarchical`, `kmeans`); default `hier`
`DMETHOD` = string token	Distance method to use for hierarchical clustering (`euclidean`, `cityblock`); default `eucl`
`LMETHOD` = string token	What type of link to use in hierarchical clustering (`singlelink`, `nearestneighbour`, `completelink`, `furthestneighbour`, `averagelink`, `mediansort`, `groupaverage`); default `aver`
`CRITERION` = string token	Criterion to use in forming groups when `METHOD=kmeans` (`sums`, `predictive`, `within`, `Mahalanobis`); default `sums`
`NGROUPS` = scalar	Number of groups to form when `METHOD=kmeans`
`GTHRESHOLD` = scalar	Grouping threshold for forming groups from the dendrogram; default `*`
`PERCENT` = scalar	Percentage of the probes/genes to use; default 100
`DTITLE` = text	Title for the dendrogram
`GTITLE` = text	Title for the groups plot
`ARRANGEMENT` = string token	Whether to use a trellis or single plot (`single`, `trellis`); default `trel`
`WINDOW` = scalar	Window number for the graphs; default 3
`DEVICE` = scalar	Device number on which to plot the graphs
`GRAPHICSFILE` = text	What graphics filename template to use to save the graphs; default `*`
`SPREADSHEET` = string token	What results to put in spreadsheets (`top%probes`); default `*` i.e. none

Parameters

`DATA` = variates or pointers	Data values (i.e. log-ratios)
`SLIDES` = factors, texts or variates	Identifies the slides
`PROBES` = factors, texts or variates	Identifies the probes or genes
`SIMILARITY` = symmetric matrices	Saves the pair-wise similarities between probes or genes when `METHOD=hier`
`GROUPS` = factors	Saves the group membership for each probe
`AMALGAMATIONS` = matrices	Saves the probe or gene amalgamation data when `METHOD=hier`

Description

MAPCLUSTER clusters probes (which may be thought of as representing genes) according to the similarity of their responses over a number of slides or target effects. The METHOD option specifies whether the clustering is hierarchical, or non-hierarchical using the k-means algorithm. A range of clustering criteria is available for each method (options DMETHOD and LMETHOD for hierarchical clustering, and CRITERION for k-means). The probes are grouped so that the responses within each group are similar, with the groups as distinct as possible. For hierarchical clustering, the GTHRESHOLD option supplies a threshold for the level of similarity within a group. The dendrogram is cut at this level, so the number of groups is not known in advance. For the k-means algorithm, the number of groups must be specified using the NGROUPS option. The group membership can be saved by the GROUPS parameter.
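
As a minimal sketch of the hierarchical route, the call below reuses the identifiers from the Example section (`cLogRatio`, `NAME`, `Slide`) and assumes that data set has already been loaded. The value given to `GTHRESHOLD` is purely illustrative, and the names `ProbeSim`, `ProbeGroup` and `ProbeAmalg` are hypothetical identifiers chosen here to hold the saved results.

```genstat
" Sketch only: assumes cLogRatio, NAME and Slide exist (see the Example).
  GTHRESHOLD=0.8 is an arbitrary illustrative value, not a recommendation.
  ProbeSim, ProbeGroup and ProbeAmalg are hypothetical output names. "
MAPCLUSTER [PRINT=groups,summary; PLOT=dendrogram; METHOD=hierarchical;\
            DMETHOD=euclidean; LMETHOD=averagelink; GTHRESHOLD=0.8;\
            PERCENT=5] DATA=cLogRatio; PROBES=NAME; SLIDES=Slide;\
            SIMILARITY=ProbeSim; GROUPS=ProbeGroup; AMALGAMATIONS=ProbeAmalg
```

Because the dendrogram is cut at the threshold, the number of groups in `ProbeGroup` depends on the data; a suitable threshold would normally be chosen after inspecting the dendrogram.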

The log-ratios are supplied by the DATA parameter. If these are in a single variate, the SLIDES parameter should supply a factor to index the slides, and the PROBES parameter should index the probes or genes. Alternatively you can supply a pointer containing a variate for each slide. The SLIDES parameter is then not required; if it is given, it should have just one entry for each slide, in the order of the variates in the pointer. In this case PROBES identifies the probes on a single slide, and all slides must have a common layout.
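
The pointer form of DATA might look like the following sketch. The identifiers `Slide1` to `Slide4`, `ProbeID`, `LogRatios` and `ProbeGroup` are hypothetical: each `SlideN` is assumed to be an existing variate of log-ratios for one slide, with the probes in the same order on every slide, and `ProbeID` is assumed to label the probes on a single slide.

```genstat
" Sketch only: Slide1...Slide4 and ProbeID are hypothetical structures
  assumed to exist already; NGROUPS=4 is an arbitrary illustrative value. "
POINTER    [VALUES=Slide1,Slide2,Slide3,Slide4] LogRatios
MAPCLUSTER [PRINT=summary; PLOT=meangroups; METHOD=kmeans; NGROUPS=4]\
           DATA=LogRatios; PROBES=ProbeID; GROUPS=ProbeGroup
```

SLIDES is omitted here because each variate in the pointer already corresponds to one slide.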

The PLOT option allows you to plot a dendrogram for the hierarchical cluster analyses, but with a large number of probes this is less useful because the individual probes cannot be read. The responses of each probe across the targets/slides can also be plotted in a shade plot; with large numbers of probes this is slow, in which case the mean response for each group can be plotted instead. A spreadsheet containing the grouped data can also be saved using the SPREADSHEET option.

With large numbers of probes, the limit of RAM can quickly be reached. The PERCENT option can therefore be set so that only the probes with the largest mean absolute responses are clustered.

By default the plots for the groups are displayed in a trellis arrangement, but you can set option ARRANGEMENT=single to display them separately, in single plots. The DTITLE and GTITLE options can supply titles for the dendrogram and groups plot, respectively, and the WINDOW option specifies the window to use (by default 3). You can use the DEVICE option to plot to a device other than the screen. The GRAPHICSFILE option then supplies a template for the file names.
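
The plotting options could be combined as in the sketch below, which again assumes the data from the Example section have been loaded. The titles, the window number and the `GTHRESHOLD` value are illustrative choices only.

```genstat
" Sketch only: assumes cLogRatio, NAME and Slide exist (see the Example).
  The titles, WINDOW=1 and GTHRESHOLD=0.8 are illustrative values. "
MAPCLUSTER [PRINT=summary; PLOT=dendrogram,groups; METHOD=hierarchical;\
            GTHRESHOLD=0.8; PERCENT=5; ARRANGEMENT=single; WINDOW=1;\
            DTITLE='ApoAI probes: dendrogram';\
            GTITLE='ApoAI probes: group responses']\
            DATA=cLogRatio; PROBES=NAME; SLIDES=Slide
```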

Options: PRINT, PLOT, METHOD, DMETHOD, LMETHOD, CRITERION, NGROUPS, GTHRESHOLD, PERCENT, DTITLE, GTITLE, ARRANGEMENT, WINDOW, DEVICE, GRAPHICSFILE, SPREADSHEET.

Parameters: DATA, SLIDES, PROBES, SIMILARITY, GROUPS, AMALGAMATIONS.

Action with `RESTRICT`

Any restrictions on the DATA variates are removed.

Example

CAPTION      'MAPCLUSTER example'; STYLE=meta
ENQUIRE      CHANNEL=-1; EXIST=check; NAME=\
             '%GENDIR%/Data/Microarrays/ApoAIKnockOutStacked.GSH'
IF check
  SPLOAD     '%GENDIR%/Data/Microarrays/ApoAIKnockOutStacked.GSH'
  " Cluster top 5% Probes from APO Mouse Knock-out Data."
  MAPCLUSTER [PRINT=summary; PLOT=dendrogram,groups,meangroups;\
              METHOD=kmeans; CRITERION=Sums; NGROUPS=16; PERCENT=5;\
              ARRANGEMENT=trellis] DATA=cLogRatio; PROBES=NAME; SLIDES=Slide
ELSE
  CAPTION    'Microarray example datasets have not been installed.'
ENDIF

Updated on June 19, 2019
