HSUMMARIZE directive

Forms and prints a group-by-levels table for each test, together with appropriate summary statistics for each group.

Option

`GROUPS` = factor	Factor defining the groups; no default i.e. this option must be specified

Parameters

`DATA` = variates or factors	The data variables
`TEST` = string tokens	Test type, defining how each variable is treated in the calculation of the similarity between each unit (`simplematching`, `jaccard`, `russellrao`, `dice`, `antidice`, `sneathsokal`, `rogerstanimoto`, `cityblock`, `manhattan`, `ecological`, `euclidean`, `pythagorean`, `minkowski`, `divergence`, `canberra`, `braycurtis`, `soergel`); default `*` ignores that variable
`RANGE` = scalars	Range of possible values of each variable; if omitted, the observed range is taken

Description

The HSUMMARIZE directive helps you to see which clusters, if any, are distinguished by each variable. It requires a factor to define the clusters, as well as the original DATA variables (variates or factors), together with their types and, optionally, their ranges. From this it prints a frequency table for each variable, classified by the grouping factor and the different values of the variable concerned.
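The frequency table described above can be pictured with a short sketch. This is Python rather than Genstat, and it only illustrates the idea of counting units by group and by value; it is not Genstat's implementation. The `Carb` values are those read in the example on this page, but the grouping `grp` is hypothetical (it is not the `Cargrp` factor that HCLUSTER would form), and the function name `group_by_levels` is invented for illustration.

```python
from collections import Counter


def group_by_levels(groups, values):
    """Count units for each (group, value) pair, skipping missing values (None)."""
    counts = Counter((g, v) for g, v in zip(groups, values) if v is not None)
    group_labels = sorted({g for g, _ in counts})
    levels = sorted({v for _, v in counts})
    # One row per group, one column per distinct value of the variable.
    table = {g: [counts[(g, v)] for v in levels] for g in group_labels}
    return table, levels


# Carb values for the 16 cars in the example on this page.
carb = [1, 1, 1, 2, 2, 2, 1, 1, 3, 1, 1, 1, 1, 2, 1, 2]
# Hypothetical grouping factor: NOT the Cargrp produced by HCLUSTER.
grp = [1, 1, 2, 3, 3, 2, 1, 1, 1, 1, 1, 3, 1, 2, 1, 2]

table, levels = group_by_levels(grp, carb)
print("group", levels)
for g, row in table.items():
    print(g, row)
```

Each row of the result is one group and each column is one value of the variable, which is the layout that lets you see whether a variable separates the clusters.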

The option and parameters of the HSUMMARIZE directive are the same as those of the HLIST directive, and are described there.

For qualitative variables (variates or factors with TEST settings from simplematching to rogerstanimoto) the values are integral, and for each group Genstat calculates an interaction statistic labelled chi-square. No significance level is attached to this statistic, but it does draw attention to groups whose distribution is markedly different from the overall distribution.
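The exact formula that Genstat uses for this statistic is not given here. As a rough guide to the idea only, the following Python sketch computes a Pearson-type statistic for each group, comparing the counts observed in the group with the counts expected if the group followed the overall distribution. The formula, the function name `group_chi_square` and the counts are all assumptions made for illustration, and the results need not agree with the values that HSUMMARIZE prints.

```python
def group_chi_square(table):
    """Pearson-type statistic per group: sum over levels of (obs - exp)^2 / exp.

    `table` maps each group label to a list of counts, one per level.
    This is an assumed formula, not necessarily the one Genstat uses.
    """
    n_levels = len(next(iter(table.values())))
    level_totals = [sum(row[j] for row in table.values()) for j in range(n_levels)]
    total = sum(level_totals)
    stats = {}
    for g, row in table.items():
        size = sum(row)
        stat = 0.0
        for obs, level_total in zip(row, level_totals):
            # Count expected if the group had the overall distribution.
            expected = size * level_total / total
            if expected > 0:
                stat += (obs - expected) ** 2 / expected
        stats[g] = stat
    return stats


# Hypothetical counts: rows are groups, columns are levels of one variable.
counts = {1: [8, 0, 1], 2: [1, 3, 0], 3: [1, 2, 0]}
stats = group_chi_square(counts)
for g, value in stats.items():
    print(g, round(value, 3))
```

A larger value flags a group whose counts depart further from the overall pattern; as in the directive itself, no significance level is implied.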

For quantitative variables (i.e. variates with other settings) values are rounded to the nearest point on an 11-point scale (0-10). The interaction statistic is analogous to Student’s t, and it draws attention to the groups for which the mean value is markedly different from the overall mean (again with no significance level attached). Missing values are ignored in the computation of these statistics.
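The rounding onto the 11-point scale can be sketched as follows, again in Python purely for illustration. The sketch assumes that the scale starts at the smallest observed value and spans either the supplied range or, if none is given, the observed range; the text above does not spell out these details, so treat them as assumptions. The function name `to_eleven_point` is hypothetical, and the data are a few `Weight` values from the example with one missing value inserted to show that it is left out of the calculation.

```python
import math


def to_eleven_point(values, value_range=None):
    """Map values onto the integers 0..10; None marks a missing value."""
    present = [v for v in values if v is not None]
    low = min(present)  # assumed origin of the scale
    width = value_range if value_range is not None else max(present) - low
    if width <= 0:
        raise ValueError("range must be positive")
    scaled = []
    for v in values:
        if v is None:
            scaled.append(None)  # missing values are passed through untouched
        else:
            point = math.floor(10 * (v - low) / width + 0.5)  # round half up
            scaled.append(min(10, max(0, point)))
    return scaled


# A few Weight values from the example; the None is inserted for illustration.
weights = [966, 845, 1160, 1430, 1506, None, 761, 720]
print(to_eleven_point(weights))
```

Supplying `value_range` plays the role of the `RANGE` parameter in this sketch: a wider range than the one observed compresses the data into fewer points of the scale.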

Option: GROUPS.
Parameters: DATA, TEST, RANGE.

Action with `RESTRICT`

You can restrict any of the DATA variates or factors to do the calculations for only a subset of the units. If more than one of these is restricted, then they must all be restricted to the same set of units.

Example

" Examples 2:6.19.1, 2:6.19.2a-d, 2:6.19.3a-b. 2:6.19.4-5, 2:6.19.7-8 "
UNITS   [NVALUES=16]
VARIATE Engcc,Ncyl,Tankl,Weight,Length,Width,Height,Wbase,Tspeed,Stst,\
        Carb,Drive,Vct[1...3]
POINTER Cd; VALUES=!P(Engcc,Ncyl,Tankl,Weight,Length, \
        Width,Height,Wbase,Tspeed,Stst)
READ    [PRINT=errors] #Cd,Carb,Drive
  1490  4  50  966 414 161 133 245 177 10.9  1  2
  1409  4  50  845 399 162 139 242 174 10.2  1  2
  2492  6  49 1160 433 163 140 251 210  8.2  1  1
  3185  8  87 1430 458 179 126 265 249  7.4  2  1
  4942 12 120 1506 449 198 113 255 291  5.8  2  1
  1995  4  70 1180 450 176 143 266 209  7.8  2  2
   965  4  35  761 338 149 146 216 134 16.8  1  2
  1585  4  55  970 426 165 141 244 180 10.0  1  2
  1714  4  55  980 426 165 141 245 150 18.9  3  2
   999  4  42  720 364 155 143 236 145 16.2  1  2
  1498  4  48  912 397 157 118 220 171 11.0  1  1
  5167 12 120 1446 414 200 107 245 286  4.9  1  1
  1585  4  45 1000 389 162 138 247 195  8.2  1  2
  1995  4  70 1150 459 175 143 266 224  7.6  2  2
  1049  4  47  790 339 151 143 216 179 11.8  1  2
  1995  4  45 1050 414 162 125 228 190  9.0  2  1 :
TEXT    [VALUES=Estate,'Arna1.5','Alfa2.5',Mondialqc,\
        Testarossa,Croma,Panda,Regatta,Regattad,Uno,\
        X19,Contach,Delta,Thema,Y10,Spider] Carname
FACTOR  [NVALUES=Carname; LEVELS=16] Fcar; VALUES=!(1...16)
SYMMETRICMATRIX [ROWS=Carname] Carsim
" Form similarity matrix between cars."
FSIMILARITY [SIMILARITY=Carsim; PRINT=*] #Cd,Carb,Drive; \
         TEST=4(cityblock),4(Euclidean),2(cityblock),2(simplematch)
HCLUSTER [PRINT=dendrogram; METHOD=averagelink] Carsim; \
         GTHRESHOLD=70; GROUPS=Cargrp; PERMUTATION=Carperm; \
         AMALGAMATIONS=Caramalg
FSIMILARITY [PRINT=similarities; SIMILARITY=Carsim; \
            PERMUTATION=Carperm; STYLE=abbreviated]
MATRIX [ROWS=Carname; COLUMNS=4] Carneig
HDISPLAY [PRINT=neighbours] Carsim; NNEIGHBOURS=3; NEIGHBOURS=Carneig
PRINT Carneig
FACTOR [LABELS=!t(Fiat,'Alfa Romeo',Lancia,Ferrari,Lamborghini,\
  Pinninfarina)] Maker; VALUES=!(2,2,2,4,4,1,1,1,1,1,1,5,3,3,3,6)
HDISPLAY [PRINT=typical] Carsim; GROUPS=Maker
HDISPLAY [PRINT=gsimilarities] Carsim; GROUPS=Maker; \
  GSIMILARITY=Cargsim
PRINT Cargsim
HDISPLAY [PRINT=tree] Carsim; TREE=Cartree
PRINT Cartree
HLIST [UNITS=Carname] #Cd,Carb,Drive; \
  TEST=4(cityblock),4(Euclidean),2(cityblock),2(simplematch)
HLIST [GROUPS=Maker; UNITS=Carname] #Cd,Carb,Drive; \
  TEST=4(cityblock),4(Euclidean),2(cityblock),2(simplematch)
HSUMMARIZE [GROUPS=Cargrp] Weight,Carb; \
  TEST=cityblock,simplematch
TEXT Cars; VALUES=!T(Estate,'Arna1.5','Alfa2.5',Mondialqc,\
  Testarossa,Croma,Panda,Regatta,Regattad,Uno,\
  X19,Contach,Delta,Thema,Y10,Spider)
FRAME 1; YLOWER=0; YUPPER=1; XLOWER=0; XUPPER=1
DDENDROGRAM [STYLE=lower; ORDERING=given; LOWSIMILARITY=0; \
  DSIMILARITY=yes] Caramalg; PERMUTATION=Carperm; LABELS=Cars;\
  TITLE='Dendrogram as from HCLUSTER'; SAVE=DKeep
" types of ordering "
FRAME 5...8; YLOWER=2(0.5,0.0); YUPPER=2(1.0,0.5);\
             XLOWER=(0.0,0.5)2; XUPPER=(0.5,1.0)2
DDENDROGRAM [STYLE=average; ORDERING=first; REVERSE=yes; SCREEN=clear;\
  ENDACTION=continue; CHANGE=order; DSIMILARITY=yes] DATA=DKeep;\
  TITLE='A: STYLE=average, ORDER=first'; WINDOW=5; SAVE=DSFrstAv
DDENDROGRAM [STYLE=centroid; ORDERING=size,ziggurat;\
  SCREEN=keep; ENDACTION=continue; CHANGE=order; DSIMILARITY=yes]\
  DATA=DKeep; TITLE='B: STYLE=centroid, ORDER=size,zig'; WINDOW=6
DDENDROGRAM [STYLE=lower; ORDERING=first; REVERSE=yes;\
  SCREEN=keep; ENDACTION=continue; CHANGE=dendrogram; DSIMILARITY=yes]\
  DATA=DSFrstAv; TITLE='C: STYLE=lower, ORDER=first'; WINDOW=7
DDENDROGRAM [STYLE=full; ORDER=ziggurat,size; SCREEN=keep; \
  ENDACTION=pause; CHANGE=order; DSIMILARITY=yes] DATA=DKeep;\
  PERMUTATION=PSave; TITLE='D: STYLE=full, ORDER=zig,size'; WINDOW=8;\
  ZIGGURAT=ZigDeg; SAVE=DSave
HCLUSTER [PRINT=dendrogram; METHOD=singlelink] Carsim; \
         GTHRESHOLD=90; GROUPS=Cargrpsing
PRINT    Cargrp,Cargrpsing
HCOMPAREGROUPINGS [PRINT=indexes,tests; METHOD=arand,jaccard,rand]\
         FIRSTGROUPING=Cargrp; SECONDGROUPING=Cargrpsing; SEED=93587
" obtain the clusters from the original cluster analysis "
HFCLUSTERS  Caramalg; CLUSTERS=Clusters
" see how often these clusters occur in 100 bootstrap samples of data variables "
HBOOTSTRAP  [PRINT=clusters; METHOD=averagelink; NTIMES=100; SEED=161647;\
            CLUSTERS=Clusters; REPLICATION=Reps] #Cd,Carb,Drive;\
            TEST=4(cityblock),4(Euclidean),2(cityblock),2(simplematch)
" replot the original dendrogram "
DDENDROGRAM [STYLE=average; ORDERING=given; LOWSIMILARITY=0; \
            DSIMILARITY=yes] Caramalg; PERMUTATION=Carperm;\
            LABELS=Cars; WINDOW=1
" plot the numbers of occurrence on the dendrogram "
DCLUSTERLABELS [WINDOW=1] #Clusters; LABEL=#Reps

Updated on September 2, 2019
