DDENDROGRAM procedure

Draws dendrograms with control over structure and style (P.G.N. Digby).

Options

STYLE = string token Style to use for the links of the dendrogram (average, centroid, lower, full); default aver
ORDERING = string tokens How to define the order of the units for the dendrogram (given, ziggurat, size, first); default zigg, size, firs
REVERSE = string token Whether to reverse the order of the units in the dendrogram (no, yes); default no
ORIENTATION = string token Specifies the orientation of a dendrogram produced by high-resolution graphics (north, south, east, west); default west
METHOD = string token Method used to represent the scale on which the amalgamations have been made: settings other than the default are relevant only for data not generated by HCLUSTER or HDISPLAY (similarities, percentages, distances); default simi
SCREEN = string token Setting to use for the SCREEN option of DGRAPH (clear, keep); default clea
CHANGE = string token If a dendrogram-save structure from a previous DDENDROGRAM is used as the DATA parameter then this option specifies the area of the process where the first changes occur: see the description of the SAVE parameter (order, dendrogram, display); default orde
GRAPHICS = string token Form of graphics to be used (lineprinter, highresolution); default high
DSIMILARITY = string token Whether to display an axis for the similarities in high-resolution graphics (no, yes); default no
LOWSIMILARITY = scalar Lower value to be used for the axis showing the similarities; default * i.e. determined from the data
NPAGES = scalar Number of pages to use for a high-resolution plot; default 1
PAGEINFORMATION = string tokens Controls what to include in a multi-page plot (similarity, title, pagenumber); default simi, titl, page
ENDACTION = string token Action to be taken after completing the plot (continue, pause); default * uses the current setting

Parameters

DATA = matrices or pointers Data defining each dendrogram in the form of either a matrix saved using the AMALGAMATIONS parameter of HCLUSTER (methods other than single linkage), or a matrix from the TREE parameter of HDISPLAY, or a SAVE structure from a previous use of DDENDROGRAM
PERMUTATION = variates Specify or save permutations of the units for drawing each dendrogram, according to the ORDERING option
LABELS = variates or texts Supply labels to use for the units of each dendrogram; these should be in the natural order of the units, not in a permuted order
TITLE = texts Titles for the dendrograms
WINDOW = scalars Window to use for each dendrogram (window 1 if unset); if this is set to zero the dendrogram is not drawn, but results can still be saved using the PERMUTATION, ZIGGURAT and SAVE parameters
PENS = scalars, variates, strings or texts Scalar or string specifying the graphics pen or symbol in which to draw each (high-resolution or line-printer) dendrogram; alternatively use of a variate or text allows the structure of each dendrogram to be highlighted by drawing different links with different graphics pens or symbols
ZIGGURAT = variates Save the “ziggurat-degree” of the links in each dendrogram
SAVE = pointers Save the information required to plot a dendrogram, for use as input for the DATA parameter in a subsequent call to DDENDROGRAM

Description

DDENDROGRAM draws dendrograms using line-printer or high-resolution graphics, as indicated by the GRAPHICS option. Dendrograms can be drawn in many ways, often with apparently quite different results, as illustrated by Digby (1985). Considerable control is allowed over the way in which the dendrogram is formed; in particular the order of the units and the style used for drawing the links of the dendrogram can be varied.

The information defining the dendrogram is given by the DATA parameter. This should be a matrix containing the amalgamations information from hierarchical cluster analysis (from the AMALGAMATIONS parameter of HCLUSTER) or a matrix containing the minimum spanning tree information (from the TREE parameter of the HDISPLAY directive); alternatively a SAVE structure from a previous DDENDROGRAM can be used as input. However, the amalgamations matrix from HCLUSTER is unusable if the clustering has been produced by single linkage, so the minimum spanning tree information, which is equivalent, should be used as input instead.
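For single linkage, the route through HDISPLAY might look like the following minimal sketch. It assumes the similarity matrix CarSim and the text Cars from the Example section of this page, and that the similarity matrix is the first parameter of HDISPLAY; CarTree is a hypothetical name chosen for illustration.

```genstat
" Single linkage: obtain the dendrogram from the minimum spanning tree. "
" CarSim and Cars come from the Example section; CarTree is hypothetical. "
HDISPLAY    [PRINT=*] CarSim; TREE=CarTree
DDENDROGRAM DATA=CarTree; LABELS=Cars;\
            TITLE='Single linkage via minimum spanning tree'
```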

The PERMUTATION parameter can be supplied with a variate, either to specify a permutation of the rows of the dendrogram or to save the permutation generated by DDENDROGRAM, as indicated by the ORDERING option. Setting ORDERING=given takes the ordering defined by the PERMUTATION variate. The other settings of ORDERING define partial orderings of the units, and are used in conjunction with each other to obtain the full ordering:

    ziggurat (Critchley 1983) is associated with ultrametric distances amongst the units;
    size specifies that when two groups merge the smaller is always placed before the larger in the order;
    first specifies that when two groups merge the group containing the lowest-numbered unit is always placed before the other in the order.

The orders given by ziggurat and size are not completely specified, so recourse may be made to the other of these settings or to first. If ORDERING is not set to given, a list of settings may be specified: the first in the list is used, the second resolves any indeterminacies left by the first, and so on. The default is the list ziggurat, size, first. The REVERSE option allows the ordering thus obtained to be reversed.
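As an illustration of an ordering list, the sketch below orders by group size and resolves ties by lowest unit number, saving the permutation that results. It assumes the amalgamations matrix Am and the text Cars from the Example section; POrder is a hypothetical name.

```genstat
" Order by group size first; ties are resolved by lowest unit number. "
" The permutation actually used is saved in POrder (hypothetical name). "
DDENDROGRAM [ORDERING=size,first] DATA=Am; PERMUTATION=POrder;\
            LABELS=Cars; TITLE='ORDERING=size,first'
PRINT       POrder; DECIMALS=0
```

The saved variate could then be supplied again with ORDERING=given to reproduce the same order in a later plot.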

The LABELS parameter can be given a variate or a text to supply labels for the rows of the dendrogram. Labelling can be suppressed altogether by using a text containing only spaces.

The STYLE option controls the style to use in forming the links of the dendrogram: its setting indicates where the line representing each new cluster should be placed. Assuming that the dendrogram has the units on the left-hand side, the settings can be described as follows:

    average (the default) the new line is midway between the old lines;
    centroid the new line is placed at the mid-point of all the units in the group it represents;
    lower the new line is a continuation of the lower of the two old lines (comparable with dendrograms from HCLUSTER);
    full the new line is a continuation of the upper or lower of the two old lines, so that each vertical line spans all the units in the group it represents.

The ORIENTATION option is relevant only to high-resolution graphics, when it controls the orientation of the dendrogram: for example the setting north results in a “hanging dendrogram” with the units across the top. The default setting is west, which gives a dendrogram with the units on the left-hand side; this is also how DDENDROGRAM draws dendrograms on the line-printer.

The METHOD option indicates the scale on which the amalgamations have been made. This option need be set only if the data have been obtained from a source other than HCLUSTER or HDISPLAY.

The TITLE parameter specifies a title for each dendrogram. For high-resolution graphics, the WINDOW parameter defines the graphics window to use for each plot. With line-printer graphics, two “windows” are available: window 1 has a width of 101 characters, window 2 a width of 61 characters. If WINDOW is not set, window 1 is used. If it is set to zero, the dendrogram is not drawn but results can still be saved using the PERMUTATION, ZIGGURAT and SAVE parameters; however, if the SAVE structure is used later as input to DDENDROGRAM, the CHANGE option must not be set to display as the dendrogram stage will not have been completed.
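The WINDOW=0 behaviour can be sketched as follows, assuming Am and Cars from the Example section; PQuiet and DQuiet are hypothetical names. The second call restarts from the dendrogram stage rather than the display stage, in line with the restriction described above.

```genstat
" Build the dendrogram without drawing it, saving the results. "
DDENDROGRAM [STYLE=centroid] DATA=Am; WINDOW=0;\
            PERMUTATION=PQuiet; SAVE=DQuiet
" Reuse the saved structure later. CHANGE=display must not be used, "
" because the dendrogram stage was not completed in the first call. "
DDENDROGRAM [CHANGE=dendrogram; STYLE=centroid] DATA=DQuiet;\
            LABELS=Cars; TITLE='Drawn from a WINDOW=0 save'
```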

The LOWSIMILARITY option sets the lower value of the axis showing the similarities (or percentage similarities or distances, according to the setting of the METHOD option), for example to zero. If it is not set, the lower value is determined automatically from the minimum value in the data. By default the axis is not plotted, but this can be changed by setting option DSIMILARITY=yes.

The NPAGES option allows the display to be split over several pages in a high-resolution plot. The PAGEINFORMATION option then controls what information is shown on the pages:

    similarity includes the similarity axis on pages 2 onwards when DSIMILARITY=yes (otherwise it appears only on page 1),
    title includes the TITLE on pages 2 onwards, and
    pagenumber includes page numbers.
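A multi-page plot with a similarity axis might be requested as in this sketch, which assumes Am and Cars from the Example section; the title text is arbitrary.

```genstat
" Split the plot over two pages; repeat the axis and title on page 2 "
" and number the pages. "
DDENDROGRAM [DSIMILARITY=yes; LOWSIMILARITY=0; NPAGES=2;\
            PAGEINFORMATION=similarity,title,pagenumber] DATA=Am;\
            LABELS=Cars; TITLE='Car similarities'
```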

As in other graphics commands, the SCREEN option controls whether to clear the high-resolution graphics screen before plotting (default clear), and the ENDACTION option controls whether Genstat pauses or continues after completing the plot.

For high-resolution graphics, the PENS parameter can be set to a scalar to define the pen to use to draw the dendrogram. Alternatively, a variate can be specified to highlight the structure of the dendrogram by drawing different links with different pens; the links are taken in the same order as the rows of the AMALGAMATIONS matrix from HCLUSTER or in increasing order of the links of the minimum spanning tree. DDENDROGRAM will use pen 1 if the PENS parameter is not set. Any pens used by DDENDROGRAM will be set to METHOD=line, SYMBOLS=0, JOIN=given. If a scalar is supplied or PENS is not set, the pen used will also have LINESTYLE set to 1. If a variate is used, appropriate settings of COLOUR and LINESTYLE should be set (using the PEN directive) prior to calling DDENDROGRAM. Similarly, with line-printer graphics, the PENS parameter can be set either to a string or to a text, according to whether the links are to be drawn with the same or different symbols; if the parameter is unset, the plus symbol (+) is used for all the links.
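The following sketch shows one way a variate of pens might be set up, assuming Am and Cars from the Example section. It assumes that the 16 units give 15 links, and that the Genstat release in use accepts colour names (otherwise colour numbers could be given); LinkPen and the choice of pens 2 and 3 are hypothetical.

```genstat
" Define the pens before the call; the colours here are arbitrary. "
PEN         2,3; COLOUR='red','blue'; LINESTYLE=1,2
" One pen number per link, in the row order of the AMALGAMATIONS matrix: "
" the first 10 merges use pen 2 and the last 5 use pen 3 (assumes 15 links). "
VARIATE     [VALUES=10(2),5(3)] LinkPen
DDENDROGRAM DATA=Am; LABELS=Cars; PENS=LinkPen;\
            TITLE='Early and late merges in different pens'
```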

The ZIGGURAT parameter can be used to save the “ziggurat-degree” (Critchley 1983) of each link. This could then be used to form the setting of the PENS parameter for a later dendrogram, in order to display particular aspects of the clustering more clearly.
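As a rough sketch of that idea, the ziggurat-degrees could be saved without plotting and then mapped to pen numbers. This assumes Am and Cars from the Example section, that the saved variate has one value per link in the order expected by PENS, and that splitting the links at a degree of zero is meaningful for the data; none of these is stated above, and ZigPen is a hypothetical name.

```genstat
" Save the ziggurat-degree of each link without drawing anything. "
DDENDROGRAM DATA=Am; WINDOW=0; ZIGGURAT=ZigDeg
" Illustrative mapping only: pen 2 where the degree is zero or less, "
" pen 3 where it is positive (the logical expression gives 0 or 1). "
CALCULATE   ZigPen = 2 + (ZigDeg > 0)
PEN         2,3; COLOUR='black','red'; LINESTYLE=1
DDENDROGRAM DATA=Am; LABELS=Cars; PENS=ZigPen;\
            TITLE='Links coloured by ziggurat-degree'
```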

The SAVE parameter can be used to save the various structures that control the drawing of a dendrogram in order to save computing time when drawing a similar dendrogram. The SAVE structure should then be used as the setting of the DATA parameter, and the CHANGE option used to indicate the stage at which to start changing aspects of the previous dendrogram. The various stages (in order) involve the following options and parameters:

    order ORDERING and PERMUTATION;
    dendrogram STYLE and METHOD;
    display REVERSE, ORIENTATION, SCREEN, LABELS, TITLE, WINDOW, PENS, DSIMILARITY and LOWSIMILARITY.
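For instance, a change confined to the display stage might be sketched as below, assuming Am and Cars from the Example section; DFirst is a hypothetical name.

```genstat
" First drawing: save the control structures in DFirst. "
DDENDROGRAM [STYLE=average] DATA=Am; LABELS=Cars;\
            TITLE='First drawing'; SAVE=DFirst
" Only display-stage settings change, so the saved ordering and links "
" are reused rather than recomputed. "
DDENDROGRAM [CHANGE=display; ORIENTATION=north; DSIMILARITY=yes]\
            DATA=DFirst; LABELS=Cars; TITLE='Same dendrogram, hanging'
```

Changing STYLE or ORDERING instead would call for CHANGE=dendrogram or CHANGE=order respectively, since those settings belong to earlier stages.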

Options: STYLE, ORDERING, REVERSE, ORIENTATION, METHOD, SCREEN, CHANGE, GRAPHICS, DSIMILARITY, LOWSIMILARITY, NPAGES, PAGEINFORMATION, ENDACTION.

Parameters: DATA, PERMUTATION, LABELS, TITLE, WINDOW, PENS, ZIGGURAT, SAVE.

Method

Dendrograms are constructed and drawn in four separate stages. First, the amalgamations information is used to construct information on group sizes. Second, a permutation of the units is formed, if required, according to several possible ordering schemes. Third, graphical information on each of the links of the dendrogram is formed. Finally, this graphical information is used to display the dendrogram, subject to requirements over orientation, pens, etc. Separate procedures are used for each stage (for details see the source code of DDENDROGRAM, obtainable via LIBEXAMPLE). A preliminary stage is also needed to construct the amalgamations when the input is a minimum spanning tree. The subsidiary procedures communicate through a pointer, which the user may retain by setting the SAVE parameter. The algorithms used by the first three subsidiary procedures are similar to those described by Digby (1984a, 1984b).

Action with RESTRICT

None of the options or parameters should be restricted; if any of them are, unpredictable results may occur.

References

Critchley, F. (1983). Ziggurats and dendrograms. Report No. 43. Department of Statistics, University of Warwick.

Digby, P.G.N. (1984a). Drawing pretty dendrograms. Genstat Newsletter, 14, 18-26.

Digby, P.G.N. (1984b). Dendrograms and ziggurats. Genstat Newsletter, 14, 14-18.

Digby, P.G.N. (1985). Graphical displays for classification. PACT Journal of the European Study Group on Physical, Chemical and Mathematical Techniques Applied to Archaeology.

See also

Directives: HCLUSTER, HDISPLAY.
Procedure: DCLUSTERLABELS.
Commands for: Multivariate and cluster analysis, Graphics.

Example

CAPTION     'DDENDROGRAM example',\
            'Data from the Guide to Genstat, Part 2, Section 6.1.2.';\
            STYLE=meta,plain
TEXT        Cars; !T(Estate,'Arna1.5','Alfa2.5',Mondialqc,Testarossa,Croma,\
            Panda,Regatta,Regattad,Uno,X19,Contach,Delta,Thema,Y10,Spider)
POINTER     Vars; !P(CC,NCyl,Tank,Wt,Length,Width,Ht,WBase,TSpeed,StSt,\
            Carb,Drive)
VARIATE     [NVALUES=Cars] Vars[]
READ        [PRINT=*] Vars[]
 1490  4  50  966 414 161 133 245 177 10.9  1  2
 1409  4  50  845 399 162 139 242 174 10.2  1  2
 2492  6  49 1160 433 163 140 251 210  8.2  1  1
 3185  8  87 1430 458 179 126 265 249  7.4  2  1
 4942 12 120 1506 449 198 113 255 291  5.8  2  1
 1995  4  70 1180 450 176 143 266 209  7.8  2  2
  965  4  35  761 338 149 146 216 134 16.8  1  2
 1585  4  55  970 426 165 141 244 180 10.0  1  2
 1714  4  55  980 426 165 141 245 150 18.9  3  2
  999  4  42  720 364 155 143 236 145 16.2  1  2
 1498  4  48  912 397 157 118 220 171 11.0  1  1
 5167 12 120 1446 414 200 107 245 286  4.9  1  1
 1585  4  45 1000 389 162 138 247 195  8.2  1  2
 1995  4  70 1150 459 175 143 266 224  7.6  2  2
 1049  4  47  790 339 151 143 216 179 11.8  1  2
 1995  4  45 1050 414 162 125 228 190  9.0  2  1 :
SYMMETRIC   [ROWS=Cars] CarSim
FSIMILARITY [SIMILARITY=CarSim]\
            Vars[]; TEST=4(cityblock,euclidean),2(cityblock,simplematching)
CAPTION     !T('Average-linkage cluster analysis -',\
            'saving AMALGAMATIONS and PERMUTATION information')
HCLUSTER    [PRINT=dendrogram; METHOD=average] CarSim;\
            AMALGAMATIONS=Am; PERMUTATION=Perm

FRAME       1; YLOWER=0; YUPPER=1; XLOWER=0; XUPPER=1
DDENDROGRAM [STYLE=lower; ORDERING=given; LOWSIMILARITY=0] DATA=Am;\
            PERMUTATION=Perm; LABELS=Cars;\
            TITLE='Dendrogram as from HCLUSTER'; SAVE=DKeep
CAPTION     !T('The AMALGAMATIONS matrix is shown below. The first',\
            'structure in DKeep is a matrix: its rows correspond to the',\
            'merges; its columns give merging information (with new node',\
            'numbers), group sizes, and ziggurat-degree.')
PRINT       [RLWIDTH=9; SERIAL=yes] Am,DKeep[1]; FIELDWIDTH=9; DECIMALS=3

" types of ordering "
FRAME       5...8; YLOWER=2(0.5,0.0); YUPPER=2(1.0,0.5);\
            XLOWER=(0.0,0.5)2; XUPPER=(0.5,1.0)2
DDENDROGRAM [STYLE=average; ORDERING=first; REVERSE=yes;\
            SCREEN=clear; ENDACTION=continue; CHANGE=order] DATA=DKeep;\
            TITLE='A: STYLE=average, ORDER=first'; WINDOW=5; SAVE=DSFrstAv
DDENDROGRAM [STYLE=centroid; ORDERING=size,ziggurat;\
            SCREEN=keep; ENDACTION=continue; CHANGE=order] DATA=DKeep;\
            TITLE='B: STYLE=centroid, ORDER=size,zig'; WINDOW=6
DDENDROGRAM [STYLE=lower; ORDERING=first; REVERSE=yes;\
            SCREEN=keep; ENDACTION=continue; CHANGE=dendrogram]\
            DATA=DSFrstAv; TITLE='C: STYLE=lower, ORDER=first'; WINDOW=7
DDENDROGRAM [STYLE=full; ORDERING=ziggurat,size;\
            SCREEN=keep; ENDACTION=pause; CHANGE=order] DATA=DKeep;\
            PERMUTATION=PSave; TITLE='D: STYLE=full, ORDER=zig,size';\
            WINDOW=8; ZIGGURAT=ZigDeg; SAVE=DSave

Updated on September 2, 2019
