Performs a two-way clustering of microarray data by probes (or genes) and slides (D.B. Baird).
Options
PRINT = string tokens | What to print (cluster, groups, summary); default clus |
---|---|
PLOT = string tokens | What to plot (dendrogram, shade, meanshade); default dend, shad |
METHOD = string token | Type of clustering to use (hierarchical, kmeans); default hier |
DMETHOD = string token | Distance method to use for hierarchical clustering (euclidean, cityblock); default eucl |
LMETHOD = string token | Type of link to use in hierarchical clustering (singlelink, nearestneighbour, completelink, furthestneighbour, averagelink, mediansort, groupaverage); default aver |
CRITERION = string token | Criterion to use in forming groups when METHOD=kmeans (sums, predictive, within, Mahalanobis); default sums |
PNGROUPS = scalar | Number of probe groups to form when METHOD=kmeans |
SNGROUPS = scalar | Number of target (slide) groups to form when METHOD=kmeans |
PGTHRESHOLD = scalar | Grouping threshold for forming probe groups from the dendrogram; default * |
SGTHRESHOLD = scalar | Grouping threshold for forming target (slide) groups from the dendrogram; default * |
MINOBSERVATIONS = scalar | Smallest number of observations before probes are dropped; default * |
PERCENT = scalar | Percentage of the probes/genes to use; default 100 |
STANDARDIZE = string token | Whether to centre the values by slide and probe (centre); default * i.e. no centring |
COLOURS = text, scalar or variate | Colours to use for shade plot; default !t(blue,red) |
DTITLE = text | Title for the dendrogram |
STITLE = text | Title for the shade plot |
WINDOW = scalar | Window number for the graphs; default 3 |
DEVICE = scalar | Device number on which to plot the graphs |
GRAPHICSFILE = text | What graphics filename template to use to save the graphs; default * |
SPREADSHEET = string token | What results to put in spreadsheets (top%probes); default * i.e. none |
Parameters
DATA = variates or pointers | Data values (i.e. log-ratios) |
---|---|
SLIDES = factors, texts or variates | Identifies the slides |
PROBES = factors, texts or variates | Identifies the probes or genes |
GMEANS = matrices | Saves the tabulation of the data by probe groups and target groups, as a two-way matrix |
PGROUPS = factors | Saves the group membership for each probe (or gene) |
SGROUPS = factors | Saves the group membership for each slide (or target) |
PAMALGAMATIONS = matrices | Saves the probe (or gene) amalgamation data when METHOD=hier |
SAMALGAMATIONS = matrices | Saves the slide (or target) amalgamation data when METHOD=hier |
Description
MA2CLUSTER performs a two-way clustering of probes (which may be thought of as representing genes) and slides (or targets). The METHOD option specifies whether the clustering is hierarchical, or non-hierarchical using the k-means algorithm. A range of clustering criteria is available for each method (options DMETHOD, LMETHOD and CRITERION). The probes are grouped together so that the responses within each group are similar, with the groups as distinct as possible. For the hierarchical clustering, the allocation to groups is specified by using the PGTHRESHOLD and SGTHRESHOLD options to provide thresholds for the level of similarity within a group when clustering the probes and slides, respectively. The dendrograms are then cut at these levels, so the number of groups is not known in advance. For the k-means algorithm, the number of groups must be specified using the PNGROUPS and SNGROUPS options. The group memberships can be saved by the PGROUPS and SGROUPS parameters. You can set option STANDARDIZE=centre to centre the log-ratios by probe and slide before the clustering.
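As a minimal sketch of the k-means route described above, the call below uses only options and parameters listed on this page. It assumes the data structures of the Example section (cLogRatio, Name and Slide) are already loaded. The group counts 6 and 3, and the identifiers GrpMeans, PGrp and SGrp, are illustrative choices rather than recommendations. Whether PLOT=meanshade is the most useful display for a given data set is also an assumption.

```
" Two-way k-means clustering (sketch): 6 probe groups and 3 slide groups
  are arbitrary illustrative values; log-ratios are centred first "
MA2CLUSTER [PRINT=summary,groups; PLOT=meanshade; METHOD=kmeans;\
           CRITERION=sums; PNGROUPS=6; SNGROUPS=3; STANDARDIZE=centre]\
           DATA=cLogRatio; PROBES=Name; SLIDES=Slide;\
           GMEANS=GrpMeans; PGROUPS=PGrp; SGROUPS=SGrp
" Inspect the saved two-way matrix of group means "
PRINT GrpMeans
```

With METHOD=hierarchical, the PNGROUPS and SNGROUPS settings would be replaced by PGTHRESHOLD and SGTHRESHOLD, as in the Example section.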
The log-ratios are supplied by the DATA parameter. If these are in a single variate, the SLIDES parameter should supply a factor to index the slides, and the PROBES parameter should index the probes or genes. Alternatively you can supply a pointer containing a variate for each slide. The SLIDES factor is then not required; if it is given, it should have just one entry for each slide, in the order of the variates in the pointer. The PROBES factor is then that for a single slide, and all slides must have a common layout.
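The pointer form of DATA might be set up as in the sketch below. The identifiers S1, S2, S3, LogRatios and Probe are hypothetical: S1 to S3 are assumed to be variates of log-ratios, one per slide and all in the same probe order, and Probe is assumed to label the probes on a single slide. The threshold values are simply those used in the Example section.

```
" Collect one variate per slide into a pointer (hypothetical identifiers) "
POINTER [VALUES=S1,S2,S3] LogRatios
" SLIDES is omitted because the pointer already defines the slides "
MA2CLUSTER [METHOD=hierarchical; PGTHRESHOLD=98; SGTHRESHOLD=99]\
           DATA=LogRatios; PROBES=Probe
```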
The PLOT option allows you to plot a dendrogram for the hierarchical cluster analyses, although with a large number of probes this is less useful as the individual probes cannot be distinguished. The responses of each probe across the targets/slides can also be plotted in a shade plot; with large numbers of probes this is slow, in which case the mean response for each group can be plotted instead. A spreadsheet containing the grouped data can also be saved using the SPREADSHEET option.
With large numbers of probes, the limit of available RAM can quickly be reached, so option PERCENT can be set so that only the probes with the largest mean absolute responses are clustered.
The DTITLE and STITLE options can supply titles for the dendrogram and shade plot, respectively, and the WINDOW option specifies the window to use (by default 3). You can use the DEVICE option to plot to a device other than the screen. The GRAPHICSFILE option then supplies a template for the file names.
Options: PRINT, PLOT, METHOD, DMETHOD, LMETHOD, CRITERION, PNGROUPS, SNGROUPS, PGTHRESHOLD, SGTHRESHOLD, MINOBSERVATIONS, PERCENT, STANDARDIZE, COLOURS, DTITLE, STITLE, WINDOW, DEVICE, GRAPHICSFILE, SPREADSHEET.
Parameters: DATA, SLIDES, PROBES, GMEANS, PGROUPS, SGROUPS, PAMALGAMATIONS, SAMALGAMATIONS.
Action with RESTRICT
Any restrictions on the DATA variates are removed.
See also
Procedures: DMADENSITY, FDRBONFERRONI, FDRMIXTURE, MACALCULATE, MAESTIMATE, MAHISTOGRAM, MAPCLUSTER, MAPLOT, MASCLUSTER, MASHADE, MAVOLCANO, MNORMALIZE.
Commands for: Microarray data.
Example
CAPTION 'MA2CLUSTER example'; STYLE=meta
ENQUIRE CHANNEL=-1; EXIST=check; NAME=\
  '%GENDIR%/Data/Microarrays/Data13-6-9.gwb'
IF check
  SPLOAD '%GENDIR%/Data/Microarrays/Data13-6-9.gwb'
  MA2CLUSTER [PRINT=summary,cluster; PLOT=dendrogram,shade,meanshade;\
             METHOD=hierarchical; LMETHOD=AverageLink; DMETHOD=Euclidean;\
             PGTHRESHOLD=98; SGTHRESHOLD=99; PERCENT=1]\
             DATA=cLogRatio; PROBES=Name; SLIDES=Slide
ELSE
  CAPTION 'Microarray example datasets have not been installed.'
ENDIF