HLIST directive

Lists the data matrix in abbreviated form.

Options

`GROUPS` = factor	Defines groupings of the units; used to split the printed table at appropriate places and to label the groups; default `*`
`UNITS` = text or variate	Names for the rows (i.e. units) of the table; default `*`

Parameters

`DATA` = variates or factors	The data variables
`TEST` = string tokens	Test type, defining how each variable is treated in the calculation of the similarity between units (`simplematching`, `jaccard`, `russellrao`, `dice`, `antidice`, `sneathsokal`, `rogerstanimoto`, `cityblock`, `manhattan`, `ecological`, `euclidean`, `pythagorean`, `minkowski`, `divergence`, `canberra`, `braycurtis`, `soergel`); default `*` ignores that variable
`RANGE` = scalars	Range of possible values of each variable; if omitted, the observed range is taken

Description

HLIST lists the values of the data matrix in a condensed form, either in their original order or, more usefully, in the order determined by a cluster analysis (see HCLUSTER). This representation can be very helpful for revealing patterns in the data, associated with clusters, or for an initial scan of the data to pick out interesting features of the variables.

The DATA parameter specifies a list of variates or factors, all of which must be of the same length. The TEST parameter specifies a list of strings, one for each variate or factor in the DATA list, to define the “type” of each one. This is similar to the TEST parameter of FSIMILARITY, which determines how differences in variate or factor values for each unit contribute to the overall similarity between units. However, HLIST distinguishes only between qualitative variables (factors, or variates with settings from simplematching to rogerstanimoto in the TEST list) and quantitative variables (variates with any other setting).

The values of qualitative variables are printed directly. If the range of a quantitative variate is greater than 10, the printed values are scaled to lie in the range 0 to 10: the minimum value is subtracted, and the result is divided by the range and then multiplied by 10. If the range is less than 10, the values are printed unscaled, so quantitative variates whose values are all less than 1 will appear as 0 in the abbreviated table. All values are printed with no decimal places, in a field width of 3.
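The scaling rule for quantitative variates can be reproduced by hand, which may help when interpreting the printed table. The following is a minimal sketch rather than HLIST's internal code: the variate `x` and its values are hypothetical, as are the names `xmin`, `xrange` and `xscaled`, and it assumes that printing with no decimal places is a fair imitation of HLIST's output.

```genstat
" Sketch only: x holds made-up values whose range (4000) exceeds 10,
  so HLIST would scale them to lie between 0 and 10. "
VARIATE [VALUES=1000,1500,2500,5000] x
CALCULATE xmin = MINIMUM(x)
CALCULATE xrange = MAXIMUM(x) - xmin
" Subtract the minimum, divide by the range, multiply by 10. "
CALCULATE xscaled = 10 * (x - xmin) / xrange
PRINT xscaled; FIELDWIDTH=3; DECIMALS=0
```

With these values the scaled variate runs from 0 for the smallest unit to 10 for the largest.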

The RANGE parameter contains a list of scalars, one for each variable in the DATA list. This allows you to check that the values of each variable lie within the given range. The range is also used to standardize quantitative variates, so you can impose a standard range, for example when variates are measured on commensurate scales. You can omit the range for any or all of the variables by giving a missing identifier or a scalar with a missing value; Genstat then uses the observed range.
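As an illustration of RANGE, the sketch below imposes a common range on two variates assumed to be measured on the same scale, and leaves a third to use its observed range. The variates `Score1`, `Score2` and `Age`, their values, and the scalar `Full` are all hypothetical and are not part of the example data.

```genstat
" Sketch only: Score1 and Score2 are assumed to share a 0-100 scale. "
VARIATE [VALUES=12,40,55,90] Score1
VARIATE [VALUES=5,35,60,70] Score2
VARIATE [VALUES=21,34,47,62] Age
SCALAR [VALUE=100] Full
" The missing identifier * asks Genstat to use the observed range of Age. "
HLIST Score1,Score2,Age; TEST=3(cityblock); RANGE=Full,Full,*
```

Supplying the same scalar for both scores should put them on a comparable footing in the printed table, whereas `Age` is scaled by its own observed range.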

The UNITS option allows you to change the labelling of the units in the table; you can specify a text, a pointer or a variate.

You can use the GROUPS option to specify a factor that will split the units into groups. The table from HLIST is then divided into sections corresponding to the groups. If the factor has labels, these are used to annotate the sections; otherwise a group number is used.
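A possible use of GROUPS with the car data from the Example section is sketched here. It assumes that the data have already been read, and that the `Drive` variate holds the codes 1 (front) and 2 (rear) as described in the example's comment; the factor name `DriveGrp` and its labels are hypothetical.

```genstat
" Sketch only: DriveGrp is an illustrative factor built from the
  Drive codes (1 = front-wheel drive, 2 = rear-wheel drive). "
FACTOR [LEVELS=2; LABELS=!T(front,rear)] DriveGrp; VALUES=Drive
" Split the listing into one section per drive type. "
HLIST [GROUPS=DriveGrp; UNITS=Cars] \
  Vars[]; TEST=4(cityblock,euclidean),2(cityblock,simplematching)
```

Because the factor has labels, the sections would be annotated with `front` and `rear` rather than with group numbers.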

Options: GROUPS, UNITS.

Parameters: DATA, TEST, RANGE.

Action with `RESTRICT`

You can restrict any of the DATA variates or factors to list only a subset of the units. If more than one of these is restricted, then they must all be restricted to the same set of units.
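The sketch below shows one way to apply a common restriction to all the DATA variables before calling HLIST. It assumes the car data from the Example section have been read and that `Drive` equals 1 for front-wheel-drive cars; the indicator variate `Front` is a hypothetical name.

```genstat
" Sketch only: Front is an illustrative indicator, 1 for front-wheel drive. "
CALCULATE Front = Drive == 1
" Restrict every DATA variable to the same set of units. "
RESTRICT Vars[]; CONDITION=Front
HLIST [UNITS=Cars] \
  Vars[]; TEST=4(cityblock,euclidean),2(cityblock,simplematching)
" Cancel the restriction once the listing has been produced. "
RESTRICT Vars[]
```

Restricting the whole pointer in a single RESTRICT statement is a simple way to guarantee that all the variables share the same subset of units.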

Example

" Genstat example HCLU-1: Cluster analysis

   Data from 'Observers Book of Automobiles', 1986
   16 Italian cars and 10 measurements:
   1.  engine capacity        c.c.        CC
   2.  number of cylinders                NCyl
   3.  fuel tank              litres      Tank
   4.  unladen weight         kg          Wt
   5.  length                 cm          Length
   6.  width                  cm          Width
   7.  height                 cm          Ht
   8.  wheelbase              cm          WBase
   9.  top speed              kph         TSpeed
  10.  time to 100kph         secs        StSt
  11.  carburettor/inj/diesel 1/2/3       Carb
  12.  front/rear wheel drive 1/2         Drive
"

TEXT [VALUES=Estate,'Arna1.5','Alfa2.5',Mondialqc,Testarossa,Croma,\ 
  Panda,Regatta,Regattad,Uno,X19,Contach,Delta,Thema,Y10,Spider] Cars
POINTER [VALUES=CC,NCyl,Tank,Wt,Length,Width,Ht,WBase,TSpeed,StSt,\ 
  Carb,Drive] Vars
" Read the data - measurements and carnames - from the file
 'HCLU-1.DAT', and then display it."
OPEN '%gendir%/examples/HCLU-1.DAT'; CHANNEL=cardat
READ [CHANNEL=cardat] Vars[]
CLOSE cardat

" Treat the number of cylinders, data[2], differently to the 
  continuous measurements."
HLIST [UNITS=Cars] \
  Vars[]; TEST=4(cityblock,euclidean),2(cityblock,simplematching)

" Form a hierarchical clustering of the cars,
  using the single linkage method."
SYMMETRIC [ROWS=Cars] CarSim
FSIMILARITY [SIMILARITY=CarSim]\ 
  Vars[]; TEST=4(cityblock,euclidean),2(cityblock,simplematching)
HCLUSTER [PRINT=amalgamations; METHOD=single] CarSim

" Use the average-linkage method."
HCLUSTER [PRINT=dendrogram; METHOD=average] CarSim;\ 
  AMALGAMATIONS=Am; PERMUTATION=Perm

" Display a high-resolution dendrogram."
DDENDROGRAM [ORDERING=given] DATA=Am; PERMUTATION=Perm; LABELS=Cars;\ 
  TITLE='Italian cars clustered by average linkage'

Updated on June 19, 2019
