Draws dendrograms with control over structure and style (P.G.N. Digby).
Options
STYLE = string token | Style to use for the links of the dendrogram (average, centroid, lower, full); default aver |
---|---|
ORDERING = string tokens | How to define the order of the units for the dendrogram (given, ziggurat, size, first); default zigg, size, firs |
REVERSE = string token | Whether to reverse the order of the units in the dendrogram (no, yes); default no |
ORIENTATION = string token | Specifies the orientation of a dendrogram produced by high-resolution graphics (north, south, east, west); default west |
METHOD = string token | Method used to represent the scale on which the amalgamations have been made: settings other than the default are relevant only for data not generated by HCLUSTER or HDISPLAY (similarities, percentages, distances); default simi |
SCREEN = string token | Setting to use for the SCREEN option of DGRAPH (clear, keep); default clea |
CHANGE = string token | If a dendrogram-save structure from a previous DDENDROGRAM is used as the DATA parameter then this option specifies the area of the process where the first changes occur: see the description of the SAVE parameter (order, dendrogram, display); default orde |
GRAPHICS = string token | Form of graphics to be used (lineprinter, highresolution); default high |
DSIMILARITY = string token | Whether to display an axis for the similarities in high-resolution graphics (no, yes); default no |
LOWSIMILARITY = scalar | Lower value to be used for the axis showing the similarities; default * i.e. determined from the data |
NPAGES = scalar | Number of pages to use for a high-resolution plot; default 1 |
PAGEINFORMATION = string tokens | Controls what to include in a multi-page plot (similarity, title, pagenumber); default simi, titl, page |
ENDACTION = string token | Action to be taken after completing the plot (continue, pause); default * uses the current setting |
Parameters
DATA = matrices or pointers | Data defining each dendrogram in the form of either a matrix saved using the AMALGAMATIONS parameter of HCLUSTER (methods other than single linkage), or a matrix from the TREE parameter of HDISPLAY, or a SAVE structure from a previous use of DDENDROGRAM |
---|---|
PERMUTATION = variates | Specify or save permutations of the units for drawing each dendrogram, according to the ORDERING option |
LABELS = variates or texts | Supply labels to use for the units of each dendrogram; these should be in the natural order of the units, not in a permuted order |
TITLE = texts | Titles for the dendrograms |
WINDOW = scalars | Window to use for each dendrogram (window 1 if unset); if this is set to zero the dendrogram is not drawn, but results can still be saved using the PERMUTATION, ZIGGURAT and SAVE parameters |
PENS = scalars, variates, strings or texts | Scalar or string specifying the graphics pen or symbol in which to draw each (high-resolution or line-printer) dendrogram; alternatively, a variate or text allows the structure of each dendrogram to be highlighted by drawing different links with different graphics pens or symbols |
ZIGGURAT = variates | Save the “ziggurat-degree” of the links in each dendrogram |
SAVE = pointers | Save the information required to plot a dendrogram, for use as input for the DATA parameter in a subsequent call to DDENDROGRAM |
Description
DDENDROGRAM draws dendrograms using line-printer or high-resolution graphics, as indicated by the GRAPHICS option. Dendrograms can be drawn in many ways, often with apparently quite different results, as illustrated by Digby (1985). Considerable control is allowed over the way in which the dendrogram is formed; in particular the order of the units and the style used for drawing the links of the dendrogram can be varied.
The information defining the dendrogram is given by the DATA parameter. This should be a matrix containing the amalgamations information from hierarchical cluster analysis (from the AMALGAMATIONS parameter of HCLUSTER) or a matrix containing the minimum spanning tree information (from the TREE parameter of the HDISPLAY directive); alternatively a SAVE structure from a previous DDENDROGRAM can be used as input. However, the amalgamations matrix from HCLUSTER is unusable if the clustering has been produced by single linkage, so the minimum spanning tree information, which is equivalent, should be used as input instead.
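As a minimal sketch of the single-linkage route, the commands below take the minimum spanning tree from HDISPLAY and pass it to DDENDROGRAM. They reuse the structures CarSim and Cars defined in the Example section; the name MSTree is hypothetical. The SIMILARITY parameter name and the PRINT=* setting of HDISPLAY are assumptions here, to be checked against the HDISPLAY documentation.

```genstat
" Single linkage: use the minimum spanning tree, not the HCLUSTER amalgamations "
" (SIMILARITY and PRINT=* are assumed HDISPLAY settings; MSTree is a hypothetical name) "
HDISPLAY [PRINT=*] SIMILARITY=CarSim; TREE=MSTree
DDENDROGRAM DATA=MSTree; LABELS=Cars; TITLE='Single linkage via HDISPLAY'
```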
The PERMUTATION parameter can be supplied with a variate, either to specify a permutation of the rows of the dendrogram or to save the permutation generated by DDENDROGRAM, as indicated by the ORDERING option. Setting ORDERING=given takes the ordering defined by the PERMUTATION variate. The other settings of ORDERING define partial orderings of the units, and are used in conjunction with each other to obtain the full ordering: ziggurat (Critchley 1983) is associated with ultrametric distances amongst the units; size specifies that when two groups merge the smaller is always placed before the larger in the order; first specifies that when two groups merge the group containing the lowest-numbered unit is always placed before the other in the order. The orders given by the settings ziggurat and size are not completely specified, and recourse may be made to the other of these settings or to first. If ORDERING is not set to given, a list of settings may be specified: the first in the list is used, the second is used to resolve indeterminacies in the order given by the first, and so on. The default is the list of settings ziggurat, size, first. The REVERSE option allows the ordering thus obtained to be reversed.
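The two uses of PERMUTATION described above might be sketched as follows, reusing Am and Cars from the Example section. The variate name PSize is hypothetical. The first call saves the order generated by the size and first rules; the second supplies that order back with ORDERING=given and reverses it.

```genstat
" Smaller group first, ties resolved by lowest-numbered unit; save the order in PSize "
DDENDROGRAM [ORDERING=size,first] DATA=Am; PERMUTATION=PSize; LABELS=Cars
PRINT PSize
" Supply the saved order explicitly and draw it reversed "
DDENDROGRAM [ORDERING=given; REVERSE=yes] DATA=Am; PERMUTATION=PSize;\
  LABELS=Cars
```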
The LABELS parameter can be given a variate or a text to supply labels for the rows of the dendrogram. Labelling can be suppressed altogether by using a text containing only spaces.
The STYLE option controls the style to use in forming the links of the dendrogram: its setting indicates where the line representing each new cluster should be placed. Assuming that the dendrogram has the units on the left-hand side, the settings can be described as follows:
average (the default): the new line is midway between the old lines;
centroid: the new line is placed at the mid-point of all the units in the group it represents;
lower: the new line is a continuation of the lower of the two old lines (comparable with dendrograms from HCLUSTER);
full: the new line is a continuation of the upper or lower of the two old lines, so that each vertical line spans all the units in the group it represents.
The ORIENTATION option is relevant only to high-resolution graphics, when it controls the orientation of the dendrogram: for example the setting north results in a “hanging dendrogram” with the units across the top. The default setting is west, which gives a dendrogram with the units on the left-hand side; this is also how DDENDROGRAM draws dendrograms on the line-printer.
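For instance, a hanging dendrogram could be requested along these lines, again using Am and Cars from the Example section (the title text is arbitrary):

```genstat
" Units across the top; ORIENTATION applies to high-resolution graphics only "
DDENDROGRAM [GRAPHICS=highresolution; STYLE=centroid; ORIENTATION=north]\
  DATA=Am; LABELS=Cars; TITLE='Hanging dendrogram'
```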
The METHOD option indicates the scale on which the amalgamations have been made. This option need be set only if the data have been obtained from a source other than HCLUSTER or HDISPLAY.
The TITLE parameter specifies a title for each dendrogram. For high-resolution graphics, the WINDOW parameter defines the graphics window to use for each plot. With line-printer graphics, two “windows” are available: window 1 has a width of 101 characters, window 2 a width of 61 characters. If WINDOW is not set, window 1 is used. If it is set to zero, the dendrogram is not drawn but results can still be saved using the PERMUTATION, ZIGGURAT and SAVE parameters; however, if the SAVE structure is used later as input to DDENDROGRAM, the CHANGE option must not be set to display as the dendrogram stage will not have been completed.
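A sketch of the WINDOW=0 case, using Am and Cars from the Example section; the names PQuiet, ZQuiet and DQuiet are hypothetical. Nothing is drawn by the first call, and the second call restarts from the dendrogram stage rather than the display stage, as the restriction above requires. WINDOW=1 is set explicitly in the second call as a precaution, on the assumption that this does no harm.

```genstat
" Compute and save the ordering information without drawing anything "
DDENDROGRAM [ORDERING=ziggurat,size] DATA=Am; WINDOW=0;\
  PERMUTATION=PQuiet; ZIGGURAT=ZQuiet; SAVE=DQuiet
PRINT PQuiet
PRINT ZQuiet
" Reuse the saved structure; CHANGE=display is not allowed here "
DDENDROGRAM [STYLE=full; CHANGE=dendrogram] DATA=DQuiet; LABELS=Cars;\
  WINDOW=1
```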
The LOWSIMILARITY option allows the lower value of the axis showing the similarities (or percentage similarities or distances, according to the setting of the METHOD option) to be set, e.g. to zero. Otherwise, this is determined automatically from the minimum value in the data. By default the axis is not plotted, but this can be changed by setting option DSIMILARITY=yes.
The NPAGES option allows the display to be split over several pages in a high-resolution plot. The PAGEINFORMATION option then controls what information is shown on the pages:
similarity | includes the similarity axis on pages 2 onwards when DSIMILARITY=yes (otherwise it appears only on page 1), |
---|---|
title | includes the TITLE on pages 2 onwards, and |
pagenumber | includes page numbers. |
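The axis and paging options can be combined in a single call, sketched here with Am and Cars from the Example section. With only 16 units a two-page split is purely illustrative. Because title is omitted from PAGEINFORMATION, the title would be expected on page 1 only.

```genstat
" Similarity axis starting at zero, repeated on every page, with page numbers "
DDENDROGRAM [DSIMILARITY=yes; LOWSIMILARITY=0; NPAGES=2;\
  PAGEINFORMATION=similarity,pagenumber] DATA=Am; LABELS=Cars;\
  TITLE='Cars: average linkage'
```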
As in other graphics commands, the SCREEN option controls whether to clear the high-resolution graphics screen before plotting (default clear), and the ENDACTION option controls whether Genstat pauses or continues after completing the plot.
For high-resolution graphics, the PENS parameter can be set to a scalar to define the pen to use to draw the dendrogram. Alternatively, a variate can be specified to highlight the structure of the dendrogram by drawing different links with different pens; the links are taken in the same order as the rows of the AMALGAMATIONS matrix from HCLUSTER or in increasing order of the links of the minimum spanning tree. DDENDROGRAM will use pen 1 if the PENS parameter is not set. Any pens used by DDENDROGRAM will be set to METHOD=line, SYMBOLS=0, JOIN=given. If a scalar is supplied or PENS is not set, the pen used will also have LINESTYLE set to 1. If a variate is used, appropriate settings of COLOUR and LINESTYLE should be set (using the PEN directive) prior to calling DDENDROGRAM. Similarly, with line-printer graphics, the PENS parameter can be set either to a string or to a text, according to whether the links are to be drawn with the same or different symbols; if the parameter is unset, the plus symbol (+) is used for all the links.
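A sketch of per-link pens for the 16 cars of the Example section. It assumes one pen value per row of the AMALGAMATIONS matrix, taken here to be 15 links for 16 units, and assumes that the COLOUR parameter of PEN accepts colour names. The variate name LinkPen and the split into 8 and 7 links are arbitrary choices for illustration.

```genstat
" Define the pens first: DDENDROGRAM does not set COLOUR or LINESTYLE for a variate "
PEN 2,3; COLOUR='red','blue'; LINESTYLE=1,2
" One pen number per link, in AMALGAMATIONS row order (15 links assumed) "
VARIATE [VALUES=8(2),7(3)] LinkPen
DDENDROGRAM DATA=Am; LABELS=Cars; PENS=LinkPen
```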
The ZIGGURAT parameter can be used to save the “ziggurat-degree” (Critchley 1983) of each link. This could then be used to form the setting of the PENS parameter for a later dendrogram, in order to display particular aspects of the clustering more clearly.
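One way this might look, using Am and Cars from the Example section: the saved ziggurat-degrees are turned into pen numbers 1 and 2. The threshold of 1 is an arbitrary assumption, since the range of ziggurat-degree values depends on the data, and the names ZDeg and ZPen are hypothetical.

```genstat
" Save the ziggurat-degree of each link without drawing "
DDENDROGRAM DATA=Am; WINDOW=0; ZIGGURAT=ZDeg
" Pen 2 for links above an arbitrary threshold, pen 1 otherwise "
CALCULATE ZPen = 1 + (ZDeg > 1)
PEN 2; LINESTYLE=2
DDENDROGRAM DATA=Am; LABELS=Cars; PENS=ZPen
```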
The SAVE parameter can be used to save the various structures that control the drawing of a dendrogram in order to save computing time when drawing a similar dendrogram. The SAVE structure should then be used as the setting of the DATA parameter, and the CHANGE option used to indicate the stage at which to start changing aspects of the previous dendrogram. The various stages (in order) involve the following options and parameters:
order | ORDERING and PERMUTATION; |
---|---|
dendrogram | STYLE and METHOD; |
display | REVERSE, ORIENTATION, SCREEN, LABELS, TITLE, WINDOW, PENS, DSIMILARITY and LOWSIMILARITY. |
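As a brief sketch of restarting at the last stage, the calls below draw a dendrogram once and then redraw it with only display-stage settings changed. Am and Cars come from the Example section; DBase is a hypothetical name. LABELS is supplied again in the second call on the assumption that display-stage parameters are not carried over.

```genstat
" First call: all stages are run and the results kept in DBase "
DDENDROGRAM [STYLE=average] DATA=Am; LABELS=Cars; SAVE=DBase
" Second call: same order and links, only the display stage is redone "
DDENDROGRAM [CHANGE=display; ORIENTATION=north; DSIMILARITY=yes]\
  DATA=DBase; LABELS=Cars; TITLE='Same tree, redrawn'
```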
Options: STYLE, ORDERING, REVERSE, ORIENTATION, METHOD, SCREEN, CHANGE, GRAPHICS, DSIMILARITY, LOWSIMILARITY, NPAGES, PAGEINFORMATION, ENDACTION.
Parameters: DATA, PERMUTATION, LABELS, TITLE, WINDOW, PENS, ZIGGURAT, SAVE.
Method
Dendrograms are constructed and drawn in four separate stages: firstly the amalgamations information is used to construct information on group sizes; secondly a permutation of the units is formed, if required, according to several possible ordering schemes; thirdly graphical information on each of the links of the dendrogram is formed; lastly this graphical information is used to display the dendrogram, subject to requirements over orientation, pens, etc. Separate procedures are used for each stage (for details see the source code of DDENDROGRAM, obtainable via LIBEXAMPLE). A preliminary stage is also needed to construct the amalgamations from information on a minimum spanning tree. Communication amongst the subsidiary procedures is obtained using a pointer, which the user may keep using the SAVE parameter. The algorithms used by the first three subsidiary procedures are similar to those described by Digby (1984a, 1984b).
Action with RESTRICT
None of the options or parameters should be restricted: if any of them are, unpredictable results may occur.
References
Critchley, F. (1983). Ziggurats and dendrograms. Report No. 43. Department of Statistics, University of Warwick.
Digby, P.G.N. (1984a). Drawing pretty dendrograms. Genstat Newsletter, 14, 18-26.
Digby, P.G.N. (1984b). Dendrograms and ziggurats. Genstat Newsletter, 14, 14-18.
Digby, P.G.N. (1985). Graphical displays for classification. PACT Journal of the European Study Group on Physical, Chemical and Mathematical Techniques Applied to Archaeology.
See also
Directives: HCLUSTER, HDISPLAY.
Procedure: DCLUSTERLABELS.
Commands for: Multivariate and cluster analysis, Graphics.
Example
CAPTION 'DDENDROGRAM example',\
        'Data from the Guide to Genstat, Part 2, Section 6.1.2.';\
        STYLE=meta,plain
TEXT Cars; !T(Estate,'Arna1.5','Alfa2.5',Mondialqc,Testarossa,Croma,\
  Panda,Regatta,Regattad,Uno,X19,Contach,Delta,Thema,Y10,Spider)
POINTER Vars; !P(CC,NCyl,Tank,Wt,Length,Width,Ht,WBase,TSpeed,StSt,\
  Carb,Drive)
VARIATE [NVALUES=Cars] Vars[]
READ [PRINT=*] Vars[]
 1490  4  50  966 414 161 133 245 177 10.9 1 2
 1409  4  50  845 399 162 139 242 174 10.2 1 2
 2492  6  49 1160 433 163 140 251 210  8.2 1 1
 3185  8  87 1430 458 179 126 265 249  7.4 2 1
 4942 12 120 1506 449 198 113 255 291  5.8 2 1
 1995  4  70 1180 450 176 143 266 209  7.8 2 2
  965  4  35  761 338 149 146 216 134 16.8 1 2
 1585  4  55  970 426 165 141 244 180 10.0 1 2
 1714  4  55  980 426 165 141 245 150 18.9 3 2
  999  4  42  720 364 155 143 236 145 16.2 1 2
 1498  4  48  912 397 157 118 220 171 11.0 1 1
 5167 12 120 1446 414 200 107 245 286  4.9 1 1
 1585  4  45 1000 389 162 138 247 195  8.2 1 2
 1995  4  70 1150 459 175 143 266 224  7.6 2 2
 1049  4  47  790 339 151 143 216 179 11.8 1 2
 1995  4  45 1050 414 162 125 228 190  9.0 2 1
:
SYMMETRIC [ROWS=Cars] CarSim
FSIMILARITY [SIMILARITY=CarSim]\
  Vars[]; TEST=4(cityblock,euclidean),2(cityblock,simplematching)
CAPTION !T('Average-linkage cluster analysis -',\
  'saving AMALGAMATIONS and PERMUTATION information')
HCLUSTER [PRINT=dendrogram; METHOD=average] CarSim;\
  AMALGAMATIONS=Am; PERMUTATION=Perm
FRAME 1; YLOWER=0; YUPPER=1; XLOWER=0; XUPPER=1
DDENDROGRAM [STYLE=lower; ORDERING=given; LOWSIMILARITY=0] DATA=Am;\
  PERMUTATION=Perm; LABELS=Cars;\
  TITLE='Dendrogram as from HCLUSTER'; SAVE=DKeep
CAPTION !T('The AMALGAMATIONS matrix is shown below. The first',\
  'structure in DKeep is a matrix: its rows correspond to the',\
  'merges; its columns give merging information (with new node',\
  'numbers), group sizes, and ziggurat-degree.')
PRINT [RLWIDTH=9; SERIAL=yes] Am,DKeep[1]; FIELDWIDTH=9; DECIMALS=3
" types of ordering "
FRAME 5...8; YLOWER=2(0.5,0.0); YUPPER=2(1.0,0.5);\
  XLOWER=(0.0,0.5)2; XUPPER=(0.5,1.0)2
DDENDROGRAM [STYLE=average; ORDERING=first; REVERSE=yes;\
  SCREEN=clear; ENDACTION=continue; CHANGE=order] DATA=DKeep;\
  TITLE='A: STYLE=average, ORDER=first'; WINDOW=5; SAVE=DSFrstAv
DDENDROGRAM [STYLE=centroid; ORDERING=size,ziggurat;\
  SCREEN=keep; ENDACTION=continue; CHANGE=order] DATA=DKeep;\
  TITLE='B: STYLE=centroid, ORDER=size,zig'; WINDOW=6
DDENDROGRAM [STYLE=lower; ORDERING=first; REVERSE=yes;\
  SCREEN=keep; ENDACTION=continue; CHANGE=dendrogram]\
  DATA=DSFrstAv; TITLE='C: STYLE=lower, ORDER=first'; WINDOW=7
DDENDROGRAM [STYLE=full; ORDER=ziggurat,size;\
  SCREEN=keep; ENDACTION=pause; CHANGE=order] DATA=DKeep;\
  PERMUTATION=PSave; TITLE='D: STYLE=full, ORDER=zig,size';\
  WINDOW=8; ZIGGURAT=ZigDeg; SAVE=DSave