Forms and prints a groups-by-levels table for each data variable, together with appropriate summary statistics for each group.
Option
GROUPS = factor | Factor defining the groups; there is no default, i.e. this option must be specified
---|---
Parameters
DATA = variates or factors | The data variables
---|---
TEST = string tokens | Test type, defining how each variable is treated in the calculation of the similarity between each pair of units (simplematching, jaccard, russellrao, dice, antidice, sneathsokal, rogerstanimoto, cityblock, manhattan, ecological, euclidean, pythagorean, minkowski, divergence, canberra, braycurtis, soergel); the default, *, ignores that variable
RANGE = scalars | Range of possible values of each variable; if omitted, the observed range is taken
Description
The HSUMMARIZE directive helps you to see which clusters, if any, are distinguished by each variable. It requires a factor to define the clusters, as well as the original DATA variables (variates or factors), together with their types and, optionally, their ranges. From this it prints a frequency table for each variable, classified by the grouping factor and the different values of the variable concerned.
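To picture the table that HSUMMARIZE prints, the Python sketch below (an illustration, not Genstat code) counts the units in each combination of group and variable value. The function name `frequency_table` and the use of `None` for a missing value are assumptions made for this sketch. The data are the Carb values and the Maker factor codes from the example program; Maker is used because Cargrp is only produced by the clustering step.

```python
from collections import Counter, defaultdict


def frequency_table(groups, values):
    """Count units for each (group, value) combination.

    Sketch only: units whose value is None (treated here as missing)
    are left out of the counts.
    """
    table = defaultdict(Counter)
    for group, value in zip(groups, values):
        if value is None:
            continue
        table[group][value] += 1
    return table


# Maker codes and Carb values for the 16 cars in the example program.
maker = [2, 2, 2, 4, 4, 1, 1, 1, 1, 1, 1, 5, 3, 3, 3, 6]
carb = [1, 1, 1, 2, 2, 2, 1, 1, 3, 1, 1, 1, 1, 2, 1, 2]

table = frequency_table(maker, carb)
levels = sorted(set(carb))
print("group " + " ".join(f"{v:>4}" for v in levels))
for group in sorted(table):
    # A Counter returns 0 for a value that never occurs in this group.
    print(f"{group:>5} " + " ".join(f"{table[group][v]:>4}" for v in levels))
```

Each row is one group and each column one observed value, which is the layout described above.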
The option and parameters of the HSUMMARIZE directive are the same as those of the HLIST directive, and are described there.
For qualitative variables (variates or factors whose TEST setting is one of simplematching through rogerstanimoto, i.e. simplematching, jaccard, russellrao, dice, antidice, sneathsokal or rogerstanimoto), the values are integral, and for each group Genstat calculates an interaction statistic labelled chi-square. This statistic has no significance level attached to it, but it does draw attention to groups whose distribution of values is markedly different from the overall distribution.
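The exact formula Genstat uses for this chi-square statistic is not given here. The Python sketch below assumes the usual Pearson form: for each group g, it sums (observed - expected)^2 / expected over the values v of the variable, where expected = n_g × n_v / n. The function name `group_chi_square` is hypothetical, and the data are again Maker and Carb from the example program.

```python
from collections import Counter


def group_chi_square(groups, values):
    """Per-group chi-square-type statistic for a qualitative variable.

    ASSUMPTION: Pearson contribution
        sum over v of (observed[g, v] - expected[g, v])**2 / expected[g, v]
    with expected[g, v] = n_g * n_v / n. Genstat's own formula may differ.
    Units whose value is None (missing) are ignored.
    """
    pairs = [(g, v) for g, v in zip(groups, values) if v is not None]
    n = len(pairs)
    group_sizes = Counter(g for g, _ in pairs)
    value_totals = Counter(v for _, v in pairs)
    observed = Counter(pairs)  # missing (g, v) combinations count as 0
    stats = {}
    for g, n_g in group_sizes.items():
        stats[g] = sum(
            (observed[(g, v)] - n_g * n_v / n) ** 2 / (n_g * n_v / n)
            for v, n_v in value_totals.items()
        )
    return stats


# Maker codes and Carb values for the 16 cars in the example program.
maker = [2, 2, 2, 4, 4, 1, 1, 1, 1, 1, 1, 5, 3, 3, 3, 6]
carb = [1, 1, 1, 2, 2, 2, 1, 1, 3, 1, 1, 1, 1, 2, 1, 2]
for group, chi2 in sorted(group_chi_square(maker, carb).items()):
    print(f"group {group}: chi-square {chi2:.2f}")
```

A group whose mix of values matches the overall mix scores 0, and the score grows as the group departs from it; as in Genstat, no significance level is attached.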
For quantitative variables (i.e. variates with other settings) values are rounded to the nearest point on an 11-point scale (0-10). The interaction statistic is analogous to Student’s t, and it draws attention to the groups for which the mean value is markedly different from the overall mean (again with no significance level attached). Missing values are ignored in the computation of these statistics.
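The two steps for quantitative variables can be sketched in Python as follows; this is an illustration under stated assumptions, not Genstat's implementation. Assumptions: the 0-10 scale starts at the smallest observed value and spans RANGE (here the hypothetical argument `value_range`) or, if omitted, the observed range; halves round up; and the t-like statistic is (group mean - overall mean) / (overall SD / sqrt(group size)), computed on the scaled values. The function names are hypothetical. The data are Weight and Maker from the example program.

```python
import math
from statistics import mean, stdev


def to_eleven_point(values, value_range=None):
    """Round each value to the nearest point 0..10; None (missing) stays None.

    ASSUMPTION: the scale starts at the smallest observed value and covers
    value_range (playing the role of RANGE) or, if omitted, the observed range.
    """
    present = [x for x in values if x is not None]
    low = min(present)
    span = value_range if value_range is not None else max(present) - low
    if span == 0:
        return [None if x is None else 0 for x in values]
    # floor(x + 0.5) rounds halves up, unlike Python's round().
    return [None if x is None else math.floor(10 * (x - low) / span + 0.5)
            for x in values]


def group_t_like(groups, values):
    """t-like statistic per group, ignoring missing values (None).

    ASSUMPTION: (group mean - overall mean) / (overall SD / sqrt(group size)).
    """
    pairs = [(g, x) for g, x in zip(groups, values) if x is not None]
    overall = [x for _, x in pairs]
    grand_mean, sd = mean(overall), stdev(overall)
    result = {}
    for group in dict.fromkeys(g for g, _ in pairs):  # first-appearance order
        xs = [x for g, x in pairs if g == group]
        result[group] = 0.0 if sd == 0 else (
            (mean(xs) - grand_mean) / (sd / math.sqrt(len(xs))))
    return result


# Weight values and Maker codes for the 16 cars in the example program.
weight = [966, 845, 1160, 1430, 1506, 1180, 761, 970,
          980, 720, 912, 1446, 1000, 1150, 790, 1050]
maker = [2, 2, 2, 4, 4, 1, 1, 1, 1, 1, 1, 5, 3, 3, 3, 6]
scaled = to_eleven_point(weight)
print(scaled)
for group, t in sorted(group_t_like(maker, scaled).items()):
    print(f"group {group}: t-like {t:.2f}")
```

Large positive or negative values flag groups whose mean sits well above or below the overall mean.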
Option: GROUPS.
Parameters: DATA, TEST, RANGE.
Action with RESTRICT
You can restrict any of the DATA variates or factors so that the calculations use only a subset of the units. If more than one of these is restricted, they must all be restricted to the same set of units.
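The restriction rule can be pictured with the Python sketch below, which is not Genstat code. Each DATA variable is represented by either `None` (unrestricted) or the set of unit indices it is restricted to; the function name `units_to_use` and this representation are hypothetical.

```python
def units_to_use(n_units, restrictions):
    """Return the set of unit indices that the calculations would use.

    restrictions holds one entry per DATA variable: None if the variable is
    unrestricted, otherwise the set of unit indices it is restricted to.
    Raises ValueError if the restricted variables disagree about the units.
    """
    restricted = [set(r) for r in restrictions if r is not None]
    if not restricted:
        return set(range(n_units))
    first = restricted[0]
    for other in restricted[1:]:
        if other != first:
            raise ValueError("all restricted DATA variables must use the same units")
    return first


# One unrestricted variable and two restricted to the same three units.
print(sorted(units_to_use(16, [None, {0, 1, 2}, {2, 1, 0}])))
```

A single restricted variable is enough to limit every calculation to its subset; the check only matters when several are restricted.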
See also
Directives: HCLUSTER, HDISPLAY, HLIST.
Commands for: Multivariate and cluster analysis.
Example
" Examples 2:6.19.1, 2:6.19.2a-d, 2:6.19.3a-b, 2:6.19.4-5, 2:6.19.7-8 "
UNITS [NVALUES=16]
VARIATE Engcc,Ncyl,Tankl,Weight,Length,Width,Height,Wbase,Tspeed,Stst,\
  Carb,Drive,Vct[1...3]
POINTER Cd; VALUES=!P(Engcc,Ncyl,Tankl,Weight,Length, \
  Width,Height,Wbase,Tspeed,Stst)
READ [PRINT=errors] #Cd,Carb,Drive
1490  4  50  966 414  161 133 245 177 10.9 1 2
1409  4  50  845 399  162 139 242 174 10.2 1 2
2492  6  49 1160 433  163 140 251 210  8.2 1 1
3185  8  87 1430 458  179 126 265 249  7.4 2 1
4942 12 120 1506 449  198 113 255 291  5.8 2 1
1995  4  70 1180 450 7176 143 266 209  7.8 2 2
 965  4  35  761 338  149 146 216 134 16.8 1 2
1585  4  55  970 426  165 141 244 180 10.0 1 2
1714  4  55  980 426  165 141 245 150 18.9 3 2
 999  4  42  720 364  155 143 236 145 16.2 1 2
1498  4  48  912 397  157 118 220 171 11.0 1 1
5167 12 120 1446 414  200 107 245 286  4.9 1 1
1585  4  45 1000 389  162 138 247 195  8.2 1 2
1995  4  70 1150 459  175 143 266 224  7.6 2 2
1049  4  47  790 339  151 143 216 179 11.8 1 2
1995  4  45 1050 414  162 125 228 190  9.0 2 1
:
TEXT [VALUES=Estate,'Arna1.5','Alfa2.5',Mondialqc,\
  Testarossa,Croma,Panda,Regatta,Regattad,Uno,\
  X19,Contach,Delta,Thema,Y10,Spider] Carname
FACTOR [NVALUES=Carname; LEVELS=16] Fcar; VALUES=!(1...16)
SYMMETRICMATRIX [ROWS=Carname] Carsim
" Form similarity matrix between cars."
FSIMILARITY [SIMILARITY=Carsim; PRINT=*] #Cd,Carb,Drive; \
  TEST=4(cityblock),4(Euclidean),2(cityblock),2(simplematch)
HCLUSTER [PRINT=dendrogram; METHOD=averagelink] Carsim; \
  GTHRESHOLD=70; GROUPS=Cargrp; PERMUTATION=Carperm; \
  AMALGAMATIONS=Caramalg
FSIMILARITY [PRINT=similarities; SIMILARITY=Carsim; \
  PERMUTATION=Carperm; STYLE=abbreviated]
MATRIX [ROWS=Carname; COLUMNS=4] Carneig
HDISPLAY [PRINT=neighbours] Carsim; NNEIGHBOURS=3; NEIGHBOURS=Carneig
PRINT Carneig
FACTOR [LABELS=!t(Fiat,'Alfa Romeo',Lancia,Ferrari,Lamborghini,\
  Pinninfarina)] Maker; VALUES=!(2,2,2,4,4,1,1,1,1,1,1,5,3,3,3,6)
HDISPLAY [PRINT=typical] Carsim; GROUPS=Maker
HDISPLAY [PRINT=gsimilarities] Carsim; GROUPS=Maker; \
  GSIMILARITY=Cargsim
PRINT Cargsim
HDISPLAY [PRINT=tree] Carsim; TREE=Cartree
PRINT Cartree
HLIST [UNITS=Carname] #Cd,Carb,Drive; \
  TEST=4(cityblock),4(Euclidean),2(cityblock),2(simplematch)
HLIST [GROUPS=Maker; UNITS=Carname] #Cd,Carb,Drive; \
  TEST=4(cityblock),4(Euclidean),2(cityblock),2(simplematch)
HSUMMARIZE [GROUPS=Cargrp] Weight,Carb; \
  TEST=cityblock,simplematch
TEXT Cars; VALUES=!T(Estate,'Arna1.5','Alfa2.5',Mondialqc,\
  Testarossa,Croma,Panda,Regatta,Regattad,Uno,\
  X19,Contach,Delta,Thema,Y10,Spider)
FRAME 1; YLOWER=0; YUPPER=1; XLOWER=0; XUPPER=1
DDENDROGRAM [STYLE=lower; ORDERING=given; LOWSIMILARITY=0; \
  DSIMILARITY=yes] Caramalg; PERMUTATION=Carperm; LABELS=Cars;\
  TITLE='Dendrogram as from HCLUSTER'; SAVE=DKeep
" types of ordering "
FRAME 5...8; YLOWER=2(0.5,0.0); YUPPER=2(1.0,0.5);\
  XLOWER=(0.0,0.5)2; XUPPER=(0.5,1.0)2
DDENDROGRAM [STYLE=average; ORDERING=first; REVERSE=yes; SCREEN=clear;\
  ENDACTION=continue; CHANGE=order; DSIMILARITY=yes] DATA=DKeep;\
  TITLE='A: STYLE=average, ORDER=first'; WINDOW=5; SAVE=DSFrstAv
DDENDROGRAM [STYLE=centroid; ORDERING=size,ziggurat;\
  SCREEN=keep; ENDACTION=continue; CHANGE=order; DSIMILARITY=yes]\
  DATA=DKeep; TITLE='B: STYLE=centroid, ORDER=size,zig'; WINDOW=6
DDENDROGRAM [STYLE=lower; ORDERING=first; REVERSE=yes;\
  SCREEN=keep; ENDACTION=continue; CHANGE=dendrogram; DSIMILARITY=yes]\
  DATA=DSFrstAv; TITLE='C: STYLE=lower, ORDER=first'; WINDOW=7
DDENDROGRAM [STYLE=full; ORDER=ziggurat,size; SCREEN=keep; \
  ENDACTION=pause; CHANGE=order; DSIMILARITY=yes] DATA=DKeep;\
  PERMUTATION=PSave; TITLE='D: STYLE=full, ORDER=zig,size'; WINDOW=8;\
  ZIGGURAT=ZigDeg; SAVE=DSave
HCLUSTER [PRINT=dendrogram; METHOD=singlelink] Carsim; \
  GTHRESHOLD=90; GROUPS=Cargrpsing
PRINT Cargrp,Cargrpsing
HCOMPAREGROUPINGS [PRINT=indexes,tests; METHOD=arand,jaccard,rand]\
  FIRSTGROUPING=Cargrp; SECONDGROUPING=Cargrpsing; SEED=93587
" obtain the clusters from the original cluster analysis "
HFCLUSTERS Caramalg; CLUSTERS=Clusters
" see how often these clusters occur in 100 bootstrap samples of data variables "
HBOOTSTRAP [PRINT=clusters; METHOD=averagelink; NTIMES=100; SEED=161647;\
  CLUSTERS=Clusters; REPLICATION=Reps] #Cd,Carb,Drive;\
  TEST=4(cityblock),4(Euclidean),2(cityblock),2(simplematch)
" replot the original dendrogram "
DDENDROGRAM [STYLE=average; ORDERING=given; LOWSIMILARITY=0; \
  DSIMILARITY=yes] Caramalg; PERMUTATION=Carperm;\
  LABELS=Cars; WINDOW=1
" plot the numbers of occurrence on the dendrogram "
DCLUSTERLABELS [WINDOW=1] #Clusters; LABEL=#Reps