Draws dendrograms with control over structure and style (P.G.N. Digby).
||Style to use for the links of the dendrogram (
||How to define the order of the units for the dendrogram (
||Whether to reverse the order of the units in the dendrogram (
||Specifies the orientation of a dendrogram produced by high-resolution graphics (
||Method used to represent the scale on which the amalgamations have been made: settings other than the default are relevant only for data not generated by
||Setting to use for the
||If a dendrogram-save structure from a previous
||Form of graphics to be used (
||Whether to display an axis for the similarities in high-resolution graphics (
||Lower value to be used for the axis showing the similarities; default
||Number of pages to use for a high-resolution plot; default 1|
||Controls what to include in a multi-page plot (
||Action to be taken after completing the plot (
||Data defining each dendrogram in the form of either a matrix saved using the
||Specify or save permutations of the units for drawing each dendrogram, according to
||Supply labels to use for the units of each dendrogram; these should be in the natural order of the units, not in a permuted order|
||Titles for the dendrograms|
||Window to use for each dendrogram (window 1 if unset); if this is set to zero the dendrogram is not drawn, but results can still be saved using the
||Scalar or string specifying the graphics pen or symbol in which to draw each (high-resolution or line-printer) dendrogram; alternatively use of a variate or text allows the structure of each dendrogram to be highlighted by drawing different links with different graphics pens or symbols|
||Save the “ziggurat-degree” of the links in each dendrogram|
||Save the information required to plot a dendrogram, for use as input for the
DDENDROGRAM draws dendrograms using line-printer or high-resolution graphics, as indicated by the GRAPHICS option. Dendrograms can be drawn in many ways, often with apparently quite different results, as illustrated by Digby (1985). Considerable control is allowed over the way in which the dendrogram is formed; in particular the order of the units and the style used for drawing the links of the dendrogram can be varied.
The information defining the dendrogram is given by the DATA parameter. This should be a matrix containing the amalgamations information from hierarchical cluster analysis (from the AMALGAMATIONS parameter of HCLUSTER) or a matrix containing the minimum spanning tree information (from the TREE parameter of the HDISPLAY directive); alternatively, a SAVE structure from a previous DDENDROGRAM can be used as input. However, the amalgamations matrix from HCLUSTER is unusable if the clustering has been produced by single linkage, so the minimum spanning tree information, which is equivalent, should be used as input instead.
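The equivalence mentioned above can be made concrete: taking the edges of a minimum spanning tree from shortest to longest, and merging the two groups that each edge connects, reproduces the single-linkage amalgamations. The Python sketch below illustrates this under stated assumptions; it is not the code DDENDROGRAM uses, and the encodings are hypothetical: the tree is a list of `(unit_i, unit_j, distance)` edges over units numbered from 0, and the k-th merge creates group `n_units + k`, which is not the layout of Genstat's TREE or AMALGAMATIONS structures. With similarities rather than distances, the edges would be taken in decreasing order instead.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]          # hypothetical: (unit_i, unit_j, distance)
Merge = Tuple[int, int, float, int]    # (group_a, group_b, level, new_group_size)


def mst_to_amalgamations(n_units: int, edges: Sequence[Edge]) -> List[Merge]:
    """Turn minimum-spanning-tree edges into single-linkage merges.

    Groups 0..n_units-1 are the units; the k-th merge creates group n_units + k.
    """
    if len(edges) != n_units - 1:
        raise ValueError("a spanning tree on n units has exactly n - 1 edges")
    parent = list(range(n_units))      # union-find forest over the units
    size = [1] * n_units               # group size, valid at each root
    group_id = list(range(n_units))    # current group number of each root

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges: List[Merge] = []
    # Shortest edges first: each edge joins the two closest remaining groups.
    for k, (i, j, dist) in enumerate(sorted(edges, key=lambda e: e[2])):
        ri, rj = find(i), find(j)
        if ri == rj:
            raise ValueError("edges contain a cycle, so they are not a tree")
        a, b = group_id[ri], group_id[rj]
        if size[ri] < size[rj]:        # union by size keeps the trees shallow
            ri, rj = rj, ri
        parent[rj] = ri
        size[ri] += size[rj]
        group_id[ri] = n_units + k
        merges.append((a, b, dist, size[ri]))
    return merges
```

For example, `mst_to_amalgamations(4, [(1, 2, 5.0), (0, 1, 1.0), (2, 3, 2.0)])` returns `[(0, 1, 1.0, 2), (2, 3, 2.0, 2), (4, 5, 5.0, 4)]`: units 0 and 1 join first, then 2 and 3, and finally the two pairs.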
The PERMUTATION parameter can be supplied with a variate, either to specify a permutation of the rows of the dendrogram or to save the permutation generated by DDENDROGRAM, as indicated by the ORDERING option. Setting ORDERING=given takes the ordering defined by the PERMUTATION variate. The other settings of ORDERING define partial orderings of the units, and are used in conjunction with each other to obtain the full ordering:
ziggurat (Critchley 1983) is associated with ultrametric distances amongst the units;
size specifies that when 2 groups merge, the smaller is always placed before the larger in the order;
first specifies that when 2 groups merge, the group containing the lowest-numbered unit is always placed before the other in the order.
The orders given by settings
size are not completely specified and recourse may be made to the other of these settings or to
If ORDERING is not set to given, a list of settings may be specified, in which case the first in the list is used, the second is used to resolve indeterminacies in the order given by the first setting in the list, and so on. The default is the list of settings:
The REVERSE option allows the ordering thus obtained to be reversed.
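How a list of ORDERING settings resolves ties, and what REVERSE then does, can be sketched in Python as follows. This is an illustration under assumptions, not DDENDROGRAM's code: merges use a hypothetical encoding (units are numbered from 0 and the k-th merge joins groups `a` and `b` to create group `n_units + k`), only the size and first settings are modelled (the ziggurat ordering of Critchley (1983) is not reproduced), and a merge that every listed criterion leaves tied simply keeps its given order.

```python
from typing import Dict, List, Sequence, Tuple


def order_units(n_units: int, merges: Sequence[Tuple[int, int]],
                criteria: Sequence[str] = ("size", "first"),
                reverse: bool = False) -> List[int]:
    """Display order of the units implied by a sequence of merges.

    Hypothetical encoding: units are 0..n_units-1 and the k-th merge (a, b)
    creates group n_units + k.
    """
    members: Dict[int, List[int]] = {u: [u] for u in range(n_units)}

    def key(group: int) -> Tuple[int, ...]:
        # One component per criterion, compared left to right, so a later
        # criterion only decides the order when all earlier ones tie.
        units = members[group]
        parts: List[int] = []
        for criterion in criteria:
            if criterion == "size":      # smaller group placed first
                parts.append(len(units))
            elif criterion == "first":   # group holding the lowest-numbered unit first
                parts.append(min(units))
            else:
                raise ValueError(f"criterion not modelled in this sketch: {criterion}")
        return tuple(parts)

    for k, (a, b) in enumerate(merges):
        # A complete tie keeps the pair in the order given.
        lead, trail = (a, b) if key(a) <= key(b) else (b, a)
        members[n_units + k] = members[lead] + members[trail]

    order = members[n_units + len(merges) - 1]  # last group contains every unit
    return order[::-1] if reverse else order
```

With merges `[(0, 1), (4, 2), (5, 3)]`, the criteria `("size", "first")` give `[3, 2, 0, 1]`, because each single unit is smaller than the group it joins, whereas `("first",)` gives `[0, 1, 2, 3]`, and adding `reverse=True` turns that into `[3, 2, 1, 0]`.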
The LABELS parameter can be given a variate or a text to supply labels for the rows of the dendrogram. Labelling can be suppressed altogether by using a text containing only spaces.
The STYLE option controls the style to use in forming the links of the dendrogram: its setting indicates where the line representing each new cluster should be placed. Assuming that the dendrogram has the units on the left-hand side, the settings can be described as follows:
average (the default) the new line is midway between the old lines;
centroid the new line is placed at the mid-point of all the units in the group it represents;
lower the new line is a continuation of the lower of the two old lines (comparable with dendrograms from
full the new line is a continuation of the upper or lower of the two old lines, so that each vertical line spans all the units in the group it represents.
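The average, centroid and lower styles amount to three rules for where the line of a new cluster sits along the axis that carries the units. The Python sketch below is one reading of those descriptions, not DDENDROGRAM's implementation. It assumes that the units are placed at positions 0, 1, 2, … in display order, that a smaller position is lower on the plot, and a hypothetical merge encoding in which the k-th merge creates group `n_units + k`. The full style is omitted because its description does not say when the upper rather than the lower line is continued.

```python
from typing import Dict, List, Sequence, Tuple


def link_positions(order: Sequence[int], merges: Sequence[Tuple[int, int]],
                   style: str = "average") -> Dict[int, float]:
    """Position of the line for every unit and every merged group."""
    n_units = len(order)
    # Units sit at 0, 1, 2, ... in display order (an assumption of this sketch).
    pos: Dict[int, float] = {unit: float(i) for i, unit in enumerate(order)}
    members: Dict[int, List[int]] = {unit: [unit] for unit in order}
    for k, (a, b) in enumerate(merges):
        group = n_units + k
        members[group] = members[a] + members[b]
        if style == "average":      # midway between the two old lines
            pos[group] = (pos[a] + pos[b]) / 2
        elif style == "centroid":   # mid-point of all units in the group
            units = members[group]
            pos[group] = sum(pos[u] for u in units) / len(units)
        elif style == "lower":      # continue the lower of the two old lines
            pos[group] = min(pos[a], pos[b])
        else:
            raise ValueError(f"style not modelled in this sketch: {style}")
    return pos
```

For the order `[0, 1, 2, 3]` and merges `[(0, 1), (4, 2), (5, 3)]`, the final line sits at 2.125 with average, 1.5 with centroid and 0.0 with lower.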
The ORIENTATION option is relevant only to high-resolution graphics, where it controls the orientation of the dendrogram: for example, the setting north results in a “hanging dendrogram” with the units across the top. The default setting is west, which gives a dendrogram with the units on the left-hand side; this is also how DDENDROGRAM draws dendrograms on the line-printer.
The TITLE parameter specifies a title for each dendrogram. For high-resolution graphics, the WINDOW parameter defines the graphics window to use for each plot. With line-printer graphics, two “windows” are available: window 1 has a width of 101 characters, and window 2 a width of 61 characters. If WINDOW is not set, window 1 is used. If it is set to zero, the dendrogram is not drawn, but results can still be saved using the SAVE parameters; however, if the SAVE structure is used later as input to
CHANGE option must not be set to display, as the dendrogram stage will not have been completed.
The LOWSIMILARITY option allows the lower value of the axis showing the similarities (or percentage similarities or distances, according to the setting of the METHOD option) to be set, for example to zero. Otherwise, this value is determined automatically from the minimum value in the data. By default the axis is not plotted, but this can be changed by setting option
The NPAGES option allows the display to be split over several pages in a high-resolution plot. The PAGEINFORMATION option then controls what information is shown on the pages:
||includes the similarity axis on pages 2 onwards when
||includes page numbers.|
As in other graphics commands, the SCREEN option controls whether to clear the high-resolution graphics screen before plotting (default clear), and the ENDACTION option controls whether Genstat pauses or continues after completing the plot.
For high-resolution graphics, the PENS parameter can be set to a scalar to define the pen to use to draw the dendrogram. Alternatively, a variate can be specified to highlight the structure of the dendrogram by drawing different links with different pens; the links are taken in the same order as the rows of the AMALGAMATIONS matrix from HCLUSTER, or in increasing order of the links of the minimum spanning tree. DDENDROGRAM will use pen 1 if the PENS parameter is not set. Any pens used by DDENDROGRAM will be set to JOIN=given. If a scalar is supplied or PENS is not set, the pen used will also have LINESTYLE set to 1. If a variate is used, appropriate settings of LINESTYLE should be set (using the PEN directive) before calling DDENDROGRAM. Similarly, with line-printer graphics, the PENS parameter can be set either to a string or to a text, according to whether the links are to be drawn with the same or different symbols; if the parameter is unset, the plus symbol (+) is used for all the links.
The ZIGGURAT parameter can be used to save the “ziggurat-degree” (Critchley 1983) of each link. This could then be used to form the setting of the PENS parameter for a later dendrogram, in order to display particular aspects of the clustering more clearly.
The SAVE parameter can be used to save the various structures that control the drawing of a dendrogram, in order to save computing time when drawing a similar dendrogram. The SAVE structure should then be used as the setting of the DATA parameter, and the CHANGE option used to indicate the stage at which to start changing aspects of the previous dendrogram. The various stages (in order) involve the following options and parameters:
Dendrograms are constructed and drawn in four separate stages: first, the amalgamations information is used to construct information on group sizes; second, a permutation of the units is formed, if required, according to several possible ordering schemes; third, graphical information on each of the links of the dendrogram is formed; lastly, this graphical information is used to display the dendrogram, subject to requirements over orientation, pens, etc. Separate procedures are used for each stage (for details see the source code of DDENDROGRAM, obtainable via LIBEXAMPLE). A preliminary stage is also needed to construct the amalgamations from information on a minimum spanning tree. Communication amongst the subsidiary procedures is obtained using a pointer, which the user may keep using the SAVE parameter. The algorithms used by the first three subsidiary procedures are similar to those described by Digby (1984a, 1984b).
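Storing each stage's results and restarting from a chosen stage is the general idea behind the SAVE parameter and the CHANGE option. The Python sketch below shows that caching pattern only; it does not reproduce the pointer DDENDROGRAM actually keeps. The stage names order, dendrogram and display are taken from CHANGE settings used in this description, while groups for the first stage, and the function `run_stages` and its arguments, are hypothetical.

```python
from typing import Any, Callable, Dict, List

# "order", "dendrogram" and "display" follow CHANGE settings in the text;
# "groups" is a hypothetical name for the first stage.
STAGES: List[str] = ["groups", "order", "dendrogram", "display"]
StageFunc = Callable[[Dict[str, Any]], Any]


def run_stages(saved: Dict[str, Any], stage_funcs: Dict[str, StageFunc],
               change: str = "groups") -> List[str]:
    """Recompute every stage from `change` onwards, reusing earlier saved results.

    Returns the names of the stages that were actually run.
    """
    start = STAGES.index(change)
    missing = [s for s in STAGES[:start] if s not in saved]
    if missing:
        # Cannot restart here: an earlier stage was never completed.
        raise ValueError(f"cannot start at {change!r}; incomplete stages: {missing}")
    ran: List[str] = []
    for stage in STAGES[start:]:
        saved[stage] = stage_funcs[stage](saved)  # each stage reads earlier results
        ran.append(stage)
    return ran
```

After one full run, calling `run_stages(saved, funcs, change="dendrogram")` recomputes only the dendrogram and display stages, reusing the saved groups and order results.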
None of the options or parameters should be restricted; if any are, unpredictable results may occur.
Critchley, F. (1983). Ziggurats and dendrograms. Report No. 43. Department of Statistics, University of Warwick.
Digby, P.G.N. (1984a). Drawing pretty dendrograms. Genstat Newsletter, 14, 18-26.
Digby, P.G.N. (1984b). Dendrograms and ziggurats. Genstat Newsletter, 14, 14-18.
Digby, P.G.N. (1985). Graphical displays for classification. PACT Journal of the European Study Group on Physical, Chemical and Mathematical Techniques Applied to Archaeology.
CAPTION 'DDENDROGRAM example',\
        'Data from the Guide to Genstat, Part 2, Section 6.1.2.';\
        STYLE=meta,plain
TEXT    Cars; !T(Estate,'Arna1.5','Alfa2.5',Mondialqc,Testarossa,Croma,\
        Panda,Regatta,Regattad,Uno,X19,Contach,Delta,Thema,Y10,Spider)
POINTER Vars; !P(CC,NCyl,Tank,Wt,Length,Width,Ht,WBase,TSpeed,StSt,\
        Carb,Drive)
VARIATE [NVALUES=Cars] Vars
READ    [PRINT=*] Vars
1490 4 50 966 414 161 133 245 177 10.9 1 2
1409 4 50 845 399 162 139 242 174 10.2 1 2
2492 6 49 1160 433 163 140 251 210 8.2 1 1
3185 8 87 1430 458 179 126 265 249 7.4 2 1
4942 12 120 1506 449 198 113 255 291 5.8 2 1
1995 4 70 1180 450 176 143 266 209 7.8 2 2
965 4 35 761 338 149 146 216 134 16.8 1 2
1585 4 55 970 426 165 141 244 180 10.0 1 2
1714 4 55 980 426 165 141 245 150 18.9 3 2
999 4 42 720 364 155 143 236 145 16.2 1 2
1498 4 48 912 397 157 118 220 171 11.0 1 1
5167 12 120 1446 414 200 107 245 286 4.9 1 1
1585 4 45 1000 389 162 138 247 195 8.2 1 2
1995 4 70 1150 459 175 143 266 224 7.6 2 2
1049 4 47 790 339 151 143 216 179 11.8 1 2
1995 4 45 1050 414 162 125 228 190 9.0 2 1 :
SYMMETRIC [ROWS=Cars] CarSim
FSIMILARITY [SIMILARITY=CarSim]\
        Vars; TEST=4(cityblock,euclidean),2(cityblock,simplematching)
CAPTION !T('Average-linkage cluster analysis -',\
        'saving AMALGAMATIONS and PERMUTATION information')
HCLUSTER [PRINT=dendrogram; METHOD=average] CarSim;\
        AMALGAMATIONS=Am; PERMUTATION=Perm
FRAME   1; YLOWER=0; YUPPER=1; XLOWER=0; XUPPER=1
DDENDROGRAM [STYLE=lower; ORDERING=given; LOWSIMILARITY=0] DATA=Am;\
        PERMUTATION=Perm; LABELS=Cars;\
        TITLE='Dendrogram as from HCLUSTER'; SAVE=DKeep
CAPTION !T('The AMALGAMATIONS matrix is shown below. The first',\
        'structure in DKeep is a matrix: its rows correspond to the',\
        'merges; its columns give merging information (with new node',\
        'numbers), group sizes, and ziggurat-degree.')
PRINT   [RLWIDTH=9; SERIAL=yes] Am,DKeep; FIELDWIDTH=9; DECIMALS=3
" types of ordering "
FRAME   5...8; YLOWER=2(0.5,0.0); YUPPER=2(1.0,0.5);\
        XLOWER=(0.0,0.5)2; XUPPER=(0.5,1.0)2
DDENDROGRAM [STYLE=average; ORDERING=first; REVERSE=yes;\
        SCREEN=clear; ENDACTION=continue; CHANGE=order] DATA=DKeep;\
        TITLE='A: STYLE=average, ORDER=first'; WINDOW=5; SAVE=DSFrstAv
DDENDROGRAM [STYLE=centroid; ORDERING=size,ziggurat;\
        SCREEN=keep; ENDACTION=continue; CHANGE=order] DATA=DKeep;\
        TITLE='B: STYLE=centroid, ORDER=size,zig'; WINDOW=6
DDENDROGRAM [STYLE=lower; ORDERING=first; REVERSE=yes;\
        SCREEN=keep; ENDACTION=continue; CHANGE=dendrogram]\
        DATA=DSFrstAv; TITLE='C: STYLE=lower, ORDER=first'; WINDOW=7
DDENDROGRAM [STYLE=full; ORDER=ziggurat,size;\
        SCREEN=keep; ENDACTION=pause; CHANGE=order] DATA=DKeep;\
        PERMUTATION=PSave; TITLE='D: STYLE=full, ORDER=zig,size';\
        WINDOW=8; ZIGGURAT=ZigDeg; SAVE=DSave