Displays results ancillary to hierarchical cluster analyses: a matrix of mean similarities between and within groups, a set of nearest neighbours for each unit, a minimum spanning tree, and the most typical elements from each group.
PRINT|Printed output required (
SIMILARITY||Input similarity matrix for each cluster analysis|
NNEIGHBOURS||Number of nearest neighbours to be printed|
NEIGHBOURS||Matrix to store nearest neighbours of each unit|
GROUPS||Indicates the groupings of the units (for calculating typical elements and mean similarities between groups)|
TREE||To store the minimum spanning tree (as a series of links and corresponding lengths)|
GSIMILARITY||To store similarities between groups|
You can use the HDISPLAY directive to print ancillary information useful for interpreting cluster analyses, and to save information to use elsewhere in Genstat, for example for plotting.
The SIMILARITY parameter specifies a list of symmetric similarity matrices. These are operated on, in turn, to produce the output requested by the PRINT option.
The NNEIGHBOURS parameter gives a list of scalars indicating how many neighbours will appear in the printed table of nearest neighbours.
The NEIGHBOURS parameter can specify a list of identifiers to store details of nearest neighbours. These will be declared implicitly, if necessary, as matrices. The rows of the matrices correspond to the units; there should be an even number of columns. The values in the odd-numbered columns represent the neighbouring units in order of their similarity, while the values in the even-numbered columns are the corresponding similarities. If you have declared the matrix previously and it does not have enough columns, then NEIGHBOURS stores as many neighbours as possible. If there is an odd number of columns in the matrix, the last column is not filled. If the matrix is declared implicitly, the number of columns will be twice the value of the NNEIGHBOURS parameter.
If PRINT=neighbours, Genstat prints a table of nearest neighbours for every unit, together with their values of similarity. The number of neighbours printed is determined by the value of the NNEIGHBOURS scalar; if NNEIGHBOURS is not set, the table is not printed. This information is also useful for interpreting clusters and ordinations.
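The column layout described above can be illustrated outside Genstat. The following Python sketch is not Genstat code and not the HDISPLAY implementation. The function name `neighbour_rows`, the use of NaN for unfilled cells, the reading of "order of similarity" as most similar first, and the tie-break on unit number are all assumptions made for illustration.

```python
import math

# Hypothetical 4-unit similarity matrix (symmetric, self-similarity 1.0).
SIM = [
    [1.0,   0.875, 0.5,   0.25],
    [0.875, 1.0,   0.625, 0.375],
    [0.5,   0.625, 1.0,   0.75],
    [0.25,  0.375, 0.75,  1.0],
]


def neighbour_rows(sim, ncols):
    """Build one row per unit: odd columns hold neighbour unit numbers
    (1-based), even columns hold the corresponding similarities."""
    n = len(sim)
    # Each neighbour needs two columns; a unit has at most n - 1 neighbours.
    npairs = min(ncols // 2, n - 1)
    rows = []
    for i in range(n):
        # Assumption: most similar neighbour first, ties broken by unit number.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: (-sim[i][j], j))
        row = [math.nan] * ncols  # unfilled cells stay missing
        for k, j in enumerate(others[:npairs]):
            row[2 * k] = j + 1
            row[2 * k + 1] = sim[i][j]
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in neighbour_rows(SIM, 5):  # odd column count: last cell unfilled
        print(row)
```

With four columns only two neighbours per unit fit, which mirrors the case where a previously declared matrix has fewer columns than twice the number of neighbours requested.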
The GROUPS parameter specifies a factor to divide the units of each similarity matrix into clusters. You may have formed the factor from a previous hierarchical cluster analysis, using HCLUSTER. This parameter must be set if typical elements or mean similarities between groups are to be calculated.
If PRINT=typicalelement, Genstat prints the average similarity of each group member with the other group members. This is to help you identify typical members of each group: typical members will have relatively large average similarities compared to those of the other members. Within each group, members are printed in decreasing order of average similarity.
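The ranking of group members by average within-group similarity can be sketched in Python as follows. This is an illustration only, not Genstat's code. The function name `typical_members`, the example data, and the choice to report NaN for a group with a single member are assumptions.

```python
import math

# Hypothetical similarity matrix and grouping (one group label per unit).
SIM = [
    [1.0,   0.875, 0.5,   0.25],
    [0.875, 1.0,   0.625, 0.375],
    [0.5,   0.625, 1.0,   0.75],
    [0.25,  0.375, 0.75,  1.0],
]
GROUPS = [1, 1, 1, 2]


def typical_members(sim, groups):
    """For each group, list (unit, average similarity to the other members),
    in decreasing order of that average."""
    result = {}
    for g in sorted(set(groups)):
        members = [i for i, label in enumerate(groups) if label == g]
        scored = []
        for i in members:
            others = [sim[i][j] for j in members if j != i]
            # Assumption: a lone member has no average, so report NaN.
            avg = sum(others) / len(others) if others else math.nan
            scored.append((i + 1, avg))  # 1-based unit numbers
        scored.sort(key=lambda pair: -pair[1])
        result[g] = scored
    return result


if __name__ == "__main__":
    for group, ranking in typical_members(SIM, GROUPS).items():
        print(group, ranking)
```

In this example unit 2 heads the ranking for group 1, so it would be read as the most typical member of that group.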
The GSIMILARITY parameter specifies a list of symmetric matrices in which you can save the mean between-group and within-group similarities. Any structure that you have not declared already will be declared implicitly to be a symmetric matrix with number of rows equal to the number of levels of the factor in the GROUPS parameter.
If PRINT=gsimilarities, Genstat prints the mean similarities between groups and within groups. Self-similarities are excluded.
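A minimal Python sketch of the between-group and within-group means is given below, assuming plain arithmetic means over unit pairs with each unit's similarity to itself left out. It is not Genstat's implementation; the function name `group_similarities`, the example data, and the dictionary keyed by (row group, column group) are assumptions chosen to mimic the lower triangle of a symmetric matrix.

```python
import math

# Hypothetical similarity matrix and grouping (one group label per unit).
SIM = [
    [1.0,   0.875, 0.5,   0.25],
    [0.875, 1.0,   0.625, 0.375],
    [0.5,   0.625, 1.0,   0.75],
    [0.25,  0.375, 0.75,  1.0],
]
GROUPS = [1, 1, 2, 2]


def group_similarities(sim, groups):
    """Mean similarity for every pair of groups (lower triangle only)."""
    labels = sorted(set(groups))
    members = {g: [i for i, lab in enumerate(groups) if lab == g]
               for g in labels}
    means = {}
    for pos, a in enumerate(labels):
        for b in labels[: pos + 1]:
            if a == b:
                # Within a group: each distinct pair once, no self-similarity.
                vals = [sim[i][j] for i in members[a]
                        for j in members[a] if j < i]
            else:
                # Between groups: every unit of a against every unit of b.
                vals = [sim[i][j] for i in members[a] for j in members[b]]
            means[(a, b)] = sum(vals) / len(vals) if vals else math.nan
    return means


if __name__ == "__main__":
    for key, value in group_similarities(SIM, GROUPS).items():
        print(key, value)
```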
The TREE parameter can specify a matrix to save the minimum spanning tree. The matrix is set up with two columns and number of rows equal to the number of units. For each unit, the value in the first column is the unit to which that unit is linked on its left; the second column is the corresponding similarity. The first unit is not linked to any unit on its left, as it is always the first unit on the tree; so the first row of the matrix contains missing values. The HFAMALGAMATIONS procedure can use the tree to form an amalgamations matrix, representing how the clusters would be formed with this similarity matrix by single-linkage cluster analysis.
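The link between a stored tree and single-linkage clustering can be sketched in Python: taking the tree's links in decreasing order of similarity and merging the clusters they join gives a single-linkage amalgamation sequence. This is not the HFAMALGAMATIONS procedure. The function name `single_linkage_merges`, the example tree, the use of `None` for missing values, the labelling of a cluster by its smallest unit number, and the (cluster, cluster, similarity) output layout are all assumptions.

```python
# Hypothetical tree in the two-column form: row i describes unit i + 1 as
# (unit it is linked to, similarity of the link); the first row is missing.
TREE = [
    (None, None),
    (1, 0.875),
    (2, 0.625),
    (3, 0.75),
]


def single_linkage_merges(tree_rows):
    """Return merges as (cluster, cluster, similarity), strongest link first."""
    links = [(sim, unit + 1, linked)
             for unit, (linked, sim) in enumerate(tree_rows)
             if linked is not None]
    links.sort(key=lambda link: -link[0])

    # Union-find: parent[x] leads to the label of the cluster containing x.
    parent = {unit + 1: unit + 1 for unit in range(len(tree_rows))}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges = []
    for sim, a, b in links:
        ra, rb = find(a), find(b)
        low, high = min(ra, rb), max(ra, rb)
        parent[high] = low  # assumption: keep the smaller unit as the label
        merges.append((low, high, sim))
    return merges


if __name__ == "__main__":
    for merge in single_linkage_merges(TREE):
        print(merge)
```

Because a tree has no closed loops, every link joins two clusters that are still separate, so each of the n-1 links produces exactly one merge.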
Setting PRINT=tree prints the minimum spanning tree associated with the similarity matrix specified by the SIMILARITY parameter. The minimum spanning tree (MST) is not a Genstat structure, but it can be kept in the form described above: that is, in a matrix with two columns. An MST is a tree connecting the n points of a multidimensional representation of the sampling units. In a tree every unit is linked to a connected network and there are no closed loops; the special feature of the MST is that, of all trees with a sampling unit at every node, it is the one whose links have minimum total length. The links include all those that join nearest neighbours; the MST is closely related to single-linkage hierarchical trees. Minimum spanning trees are also useful if you superimpose them on ordinations to reveal regions in which distance is badly distorted (see procedure DMST); if neighbouring points, as given by the MST, are distant in the ordination then something is badly wrong.
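A spanning tree of this kind can be built with Prim's algorithm, sketched here in Python with higher similarity treated as shorter length. This is an illustration under stated assumptions, not the algorithm Genstat uses: the function name `spanning_tree_rows` and the example data are hypothetical, NaN stands in for missing values, and it is assumed that the unit a new unit is attached to plays the role of the unit "on its left" in the two-column form.

```python
import math

# Hypothetical 4-unit similarity matrix (symmetric).
SIM = [
    [1.0,   0.875, 0.5,   0.25],
    [0.875, 1.0,   0.625, 0.375],
    [0.5,   0.625, 1.0,   0.75],
    [0.25,  0.375, 0.75,  1.0],
]


def spanning_tree_rows(sim):
    """Prim's algorithm from unit 1. Row i holds, for unit i + 1, the unit it
    was attached to (1-based) and the similarity of that link."""
    n = len(sim)
    rows = [[math.nan, math.nan] for _ in range(n)]  # first row stays missing
    in_tree = [False] * n
    in_tree[0] = True
    best_sim = list(sim[0])   # best similarity from each unit to the tree
    best_link = [0] * n       # tree unit giving that best similarity
    for _ in range(n - 1):
        # Attach the outside unit that is most similar to the current tree.
        j = max((k for k in range(n) if not in_tree[k]),
                key=lambda k: best_sim[k])
        rows[j] = [best_link[j] + 1, best_sim[j]]
        in_tree[j] = True
        for k in range(n):
            if not in_tree[k] and sim[j][k] > best_sim[k]:
                best_sim[k] = sim[j][k]
                best_link[k] = j
    return rows


if __name__ == "__main__":
    for unit, row in enumerate(spanning_tree_rows(SIM), start=1):
        print(unit, row)
```

In the example every unit's most similar neighbour appears as a link, consistent with the statement that the tree includes all links joining nearest neighbours.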
" Examples 2:6.19.1, 2:6.19.2a-d, 2:6.19.3a-b. 2:6.19.4-5, 2:6.19.7-8 " UNITS [NVALUES=16] VARIATE Engcc,Ncyl,Tankl,Weight,Length,Width,Height,Wbase,Tspeed,Stst,\ Carb,Drive,Vct[1...3] POINTER Cd; VALUES=!P(Engcc,Ncyl,Tankl,Weight,Length, \ Width,Height,Wbase,Tspeed,Stst) READ [PRINT=errors] #Cd,Carb,Drive 1490 4 50 966 414 161 133 245 177 10.9 1 2 1409 4 50 845 399 162 139 242 174 10.2 1 2 2492 6 49 1160 433 163 140 251 210 8.2 1 1 3185 8 87 1430 458 179 126 265 249 7.4 2 1 4942 12 120 1506 449 198 113 255 291 5.8 2 1 1995 4 70 1180 450 7176 143 266 209 7.8 2 2 965 4 35 761 338 149 146 216 134 16.8 1 2 1585 4 55 970 426 165 141 244 180 10.0 1 2 1714 4 55 980 426 165 141 245 150 18.9 3 2 999 4 42 720 364 155 143 236 145 16.2 1 2 1498 4 48 912 397 157 118 220 171 11.0 1 1 5167 12 120 1446 414 200 107 245 286 4.9 1 1 1585 4 45 1000 389 162 138 247 195 8.2 1 2 1995 4 70 1150 459 175 143 266 224 7.6 2 2 1049 4 47 790 339 151 143 216 179 11.8 1 2 1995 4 45 1050 414 162 125 228 190 9.0 2 1 : TEXT [VALUES=Estate,'Arna1.5','Alfa2.5',Mondialqc,\ Testarossa,Croma,Panda,Regatta,Regattad,Uno,\ X19,Contach,Delta,Thema,Y10,Spider] Carname FACTOR [NVALUES=Carname; LEVELS=16] Fcar; VALUES=!(1...16) SYMMETRICMATRIX [ROWS=Carname] Carsim " Form similarity matrix between cars." 
FSIMILARITY [SIMILARITY=Carsim; PRINT=*] #Cd,Carb,Drive; \ TEST=4(cityblock),4(Euclidean),2(cityblock),2(simplematch) HCLUSTER [PRINT=dendrogram; METHOD=averagelink] Carsim; \ GTHRESHOLD=70; GROUPS=Cargrp; PERMUTATION=Carperm; \ AMALGAMATIONS=Caramalg FSIMILARITY [PRINT=similarities; SIMILARITY=Carsim; \ PERMUTATION=Carperm; STYLE=abbreviated] MATRIX [ROWS=Carname; COLUMNS=4] Carneig HDISPLAY [PRINT=neighbours] Carsim; NNEIGHBOURS=3; NEIGHBOURS=Carneig PRINT Carneig FACTOR [LABELS=!t(Fiat,'Alfa Romeo',Lancia,Ferrari,Lamborghini,\ Pinninfarina)] Maker; VALUES=!(2,2,2,4,4,1,1,1,1,1,1,5,3,3,3,6) HDISPLAY [PRINT=typical] Carsim; GROUPS=Maker HDISPLAY [PRINT=gsimilarities] Carsim; GROUPS=Maker; \ GSIMILARITY=Cargsim PRINT Cargsim HDISPLAY [PRINT=tree] Carsim; TREE=Cartree PRINT Cartree HLIST [UNITS=Carname] #Cd,Carb,Drive; \ TEST=4(cityblock),4(Euclidean),2(cityblock),2(simplematch) HLIST [GROUPS=Maker; UNITS=Carname] #Cd,Carb,Drive; \ TEST=4(cityblock),4(Euclidean),2(cityblock),2(simplematch) HSUMMARIZE [GROUPS=Cargrp] Weight,Carb; \ TEST=cityblock,simplematch TEXT Cars; VALUES=!T(Estate,'Arna1.5','Alfa2.5',Mondialqc,\ Testarossa,Croma,Panda,Regatta,Regattad,Uno,\ X19,Contach,Delta,Thema,Y10,Spider) FRAME 1; YLOWER=0; YUPPER=1; XLOWER=0; XUPPER=1 DDENDROGRAM [STYLE=lower; ORDERING=given; LOWSIMILARITY=0; \ DSIMILARITY=yes] Caramalg; PERMUTATION=Carperm; LABELS=Cars;\ TITLE='Dendrogram as from HCLUSTER'; SAVE=DKeep " types of ordering " FRAME 5...8; YLOWER=2(0.5,0.0); YUPPER=2(1.0,0.5);\ XLOWER=(0.0,0.5)2; XUPPER=(0.5,1.0)2 DDENDROGRAM [STYLE=average; ORDERING=first; REVERSE=yes; SCREEN=clear;\ ENDACTION=continue; CHANGE=order; DSIMILARITY=yes] DATA=DKeep;\ TITLE='A: STYLE=average, ORDER=first'; WINDOW=5; SAVE=DSFrstAv DDENDROGRAM [STYLE=centroid; ORDERING=size,ziggurat;\ SCREEN=keep; ENDACTION=continue; CHANGE=order; DSIMILARITY=yes]\ DATA=DKeep; TITLE='B: STYLE=centroid, ORDER=size,zig'; WINDOW=6 DDENDROGRAM [STYLE=lower; ORDERING=first; REVERSE=yes;\ SCREEN=keep; 
ENDACTION=continue; CHANGE=dendrogram; DSIMILARITY=yes]\ DATA=DSFrstAv; TITLE='C: STYLE=lower, ORDER=first'; WINDOW=7 DDENDROGRAM [STYLE=full; ORDER=ziggurat,size; SCREEN=keep; \ ENDACTION=pause; CHANGE=order; DSIMILARITY=yes] DATA=DKeep;\ PERMUTATION=PSave; TITLE='D: STYLE=full, ORDER=zig,size'; WINDOW=8;\ ZIGGURAT=ZigDeg; SAVE=DSave HCLUSTER [PRINT=dendrogram; METHOD=singlelink] Carsim; \ GTHRESHOLD=90; GROUPS=Cargrpsing PRINT Cargrp,Cargrpsing HCOMPAREGROUPINGS [PRINT=indexes,tests; METHOD=arand,jaccard,rand]\ FIRSTGROUPING=Cargrp; SECONDGROUPING=Cargrpsing; SEED=93587 " obtain the clusters from the original cluster analysis " HFCLUSTERS Caramalg; CLUSTERS=Clusters " see often these clusters occur in 100 bootstrap samples of data variables " HBOOTSTRAP [PRINT=clusters; METHOD=averagelink; NTIMES=100; SEED=161647;\ CLUSTERS=Clusters; REPLICATION=Reps] #Cd,Carb,Drive;\ TEST=4(cityblock),4(Euclidean),2(cityblock),2(simplematch) " replot the original dendrogram " DDENDROGRAM [STYLE=average; ORDERING=given; LOWSIMILARITY=0; \ DSIMILARITY=yes] Caramalg; PERMUTATION=Carperm;\ LABELS=Cars; WINDOW=1 " plot the numbers of occurrence on the dendrogram " DCLUSTERLABELS [WINDOW=1] #Clusters; LABEL=#Reps