HISTOGRAM directive

Produces histograms of data on the terminal or line printer (synonym of LPHISTOGRAM).

Options

`CHANNEL` = scalar	Channel number of output file; default is the current output file
`TITLE` = text	General title; default `*`
`LIMITS` = variate	Variate of group limits for classifying variates into groups; default `*`
`NGROUPS` = scalar	When `LIMITS` is not specified, this defines the number of groups into which a data variate is to be classified; default is the integer value nearest to the square root of the number of values in the variate, up to a maximum of 10
`LABELS` = text	Group labels
`SCALE` = scalar	Number of units represented by each character; default 1

Parameters

`DATA` = identifiers	Data for the histograms; these can be either a factor indicating the group to which each unit belongs, a variate whose values are to be grouped, or a one-way table giving the number of units in each group
`NOBSERVATIONS` = tables	One-way table to save numbers in the groups
`GROUPS` = factors	Factor to save groups defined from a variate
`SYMBOLS` = texts	Characters to be used to represent the bars of each histogram
`DESCRIPTION` = texts	Annotation for key

Description

The HISTOGRAM directive has been replaced by the LPHISTOGRAM directive, and may be removed in a future release or modified to produce high-resolution plots instead of character-based plots.

Histograms provide quick and simple visual summaries of data values. The data are divided into several groups, which are then displayed as a histogram consisting of a line of asterisks for each group. The number of asterisks in each line is proportional to the number of values assigned to that group; this number is also printed at the beginning of each line. The data for the histogram are supplied through the DATA parameter as variates, factors or one-way tables.

If a histogram is to be formed from a variate, Genstat sorts its values into groups as defined by upper and lower bounds. You can also specify a list of variates, to obtain a parallel histogram. For each group one row of asterisks is printed for each variate, labelled by the corresponding identifier. The variates are sorted according to the same intervals; there is no need for them all to have the same numbers of values.

With variates of data, you can use the NGROUPS option to specify the number of groups in the histogram; Genstat will then work out appropriate limits, based on the range of the data, to form intervals of equal width. For example:

HISTOGRAM [NGROUPS=5] Data

Alternatively, you can define the groups explicitly, by setting the LIMITS option to a variate containing the group limits. For example:

VARIATE [VALUES=1,2,3,5,7,8,10] Glimits

HISTOGRAM [LIMITS=Glimits] Data

Glimits is a variate with seven values, producing a histogram in which the data are split into eight groups: ≤1, 1-2, 2-3, 3-5, 5-7, 7-8, 8-10 and >10. The upper limit of each group is included within that group, so the group 3-5, for example, contains values that are greater than 3 and less than or equal to 5. The values of the limits variate are sorted into ascending order if necessary, but the variate itself is not changed.
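
The grouping rule can be illustrated outside Genstat. The following is a minimal Python sketch of the upper-limit-inclusive classification described above, not Genstat's own implementation; the function name `group_counts` is hypothetical, and the data are the 25 values from the example at the end of this page.

```python
from bisect import bisect_left

def group_counts(values, limits):
    """Count values per group, with each upper limit included in its group."""
    bounds = sorted(limits)            # sorted copy; the caller's list is not changed
    counts = [0] * (len(bounds) + 1)   # k limits give k + 1 groups
    for v in values:
        # bisect_left returns the number of limits strictly below v, which is
        # the index of the first group whose upper limit is >= v
        counts[bisect_left(bounds, v)] += 1
    return counts

data = [1, 1, 1, 2, 2, 1, 4, 3, 6, 5, 4, 2, 6, 8, 0, 9, 7, 4, 3, 2, 4, 6, 9, 4, 5]
print(group_counts(data, [1, 2, 3, 5, 7, 8, 10]))
```

With these limits the sketch gives the counts 5, 4, 2, 7, 4, 1, 2 and 0 for the eight groups; the value 5, for instance, falls in the group 3-5 rather than 5-7.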

You can use the LABELS option to provide your own labelling for the groups of the histogram. It should be set to a text vector of length equal to the number of groups. If neither NGROUPS nor LIMITS has been set, the number of groups is determined from the number of values in the LABELS structure. If LABELS is also unset, the default number of groups is chosen as the integer value nearest to the square root of the number of values, up to a maximum of 10. Alternatively, procedure AKAIKEHISTOGRAM provides a more sophisticated method of generating histograms, using Akaike’s Information Criterion (AIC) to generate an optimal grouping of the data.
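
The way the number of groups is resolved can be summarised as a short Python sketch. This is only an illustration of the rules stated in the text, under the assumption that LIMITS takes precedence over NGROUPS when both are set; the function name `number_of_groups` and its arguments are hypothetical.

```python
import math

def number_of_groups(nvalues, limits=None, ngroups=None, labels=None, max_default=10):
    """Resolve the group count in the order the text describes."""
    if limits is not None:
        return len(limits) + 1     # k limits split the data into k + 1 groups
    if ngroups is not None:
        return ngroups             # used only when no limits are supplied
    if labels is not None:
        return len(labels)         # one group per label
    # Default: nearest integer to the square root of the number of values,
    # capped at max_default (10 in the text)
    nearest = math.floor(math.sqrt(nvalues) + 0.5)
    return min(max_default, nearest)

print(number_of_groups(25))                                       # default rule
print(number_of_groups(30, labels=["small", "medium", "large"]))  # from LABELS
```

For 25 values the default rule gives 5 groups, and three labels on their own give 3 groups.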

The data for the histogram can also be specified as a factor (which defines the assignment of each unit to a group of the histogram). Genstat then counts the number of units that occur with each level of the factor; thus the number of groups of the histogram is the number of levels of the factor and the value for each group is the corresponding total. If the LABELS option is unset, the labels of the factor (if present) are used to label the groups, otherwise Genstat uses the factor levels.
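
As a rough illustration of the counting step for a factor, the Python sketch below tallies the units at each level and attaches labels when they are available. It is not Genstat code; `factor_counts` is a hypothetical helper, and the data are the Fruit values from the example at the end of this page.

```python
from collections import Counter

def factor_counts(codes, levels, labels=None):
    """Return (name, count) for every level, including levels with no units."""
    tally = Counter(codes)
    # Use the labels when present, otherwise fall back to the levels themselves
    names = labels if labels is not None else levels
    return [(name, tally[level]) for name, level in zip(names, levels)]

fruit = [4, 6, 3, 4, 1, 6, 3, 3, 6, 4, 3, 5, 3, 5, 1, 5,
         1, 6, 3, 4, 4, 5, 5, 5, 6, 3, 6, 2, 2, 5, 5, 3]
names = ["apple", "banana", "peach", "cherry", "pear", "orange"]
for name, count in factor_counts(fruit, list(range(1, 7)), names):
    print(name, count)
```

Each level becomes one group of the histogram, so the six levels here give six bars.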

When Genstat plots the histogram of a one-way table, the number of groups is the number of levels of the factor classifying the table and the values of the table indicate the number of observations in each group. If the LABELS option is unset, the labels or levels of the classifying factor are again used to label the histogram.

When producing a parallel histogram the data structures must all be of the same type: variate, factor or table. If parallel histograms are to be formed from several factors, they must all have the same number of levels, and the labels or levels of the first factor will be used to identify the groups. Likewise, if you are forming parallel histograms from several tables, they must all have the same number of values, and the classifying factor of the first table will define the labelling of the histogram.

The SYMBOLS parameter can specify alternative plotting characters to be used instead of the asterisk. For example:

HISTOGRAM Variate; SYMBOLS='+'

You can specify a different string for each structure in a parallel histogram. If you specify strings of more than one character, Genstat uses the characters in order, recycled as necessary, until each histogram bar is of the correct length.
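
The recycling of a multi-character symbol string can be pictured with a small Python sketch. The helper name `bar` is hypothetical and the code only mirrors the behaviour described above.

```python
from itertools import cycle, islice

def bar(symbols, length):
    """Build a bar of the given length, reusing the symbols in order."""
    return "".join(islice(cycle(symbols), length))

print(bar("X-O-", 6))
print(bar("+", 3))
```

With the string 'X-O-' used in the example at the end of this page, a bar of six characters would read X-O-X-.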

The TITLE option lets you set an overall title for the output, and the DESCRIPTION parameter can be used to provide a text for labelling the histogram instead of the identifiers of the DATA structures.

Normally one asterisk will represent one unit. However, if there are many data values and the groups become large, Genstat may not be able to fit enough asterisks into one row. It will then alter the scaling so that one asterisk represents several units. You can set the scaling explicitly using the SCALE option; the value specified is rounded to the nearest integer, and determines how many units should be represented by each asterisk.
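
The arithmetic behind the scaling can be sketched in Python. The text does not say how wide a row is or how a partly filled symbol is rounded, so both the `width` argument and the rounding-up rule below are assumptions made purely for illustration; the function names are hypothetical.

```python
import math

def auto_scale(counts, width):
    """Smallest whole number of units per symbol so the longest bar fits in width."""
    # width is an assumed number of character positions available for a bar
    return max(1, math.ceil(max(counts, default=0) / width))

def bar_length(count, scale):
    """Number of symbols for a count (assumption: a partial symbol is rounded up)."""
    return math.ceil(count / scale)

counts = [250, 40]
scale = auto_scale(counts, width=60)
print(scale, [bar_length(c, scale) for c in counts])
```

Under these assumptions, a group of 250 units in a 60-character row needs a scale of 5 units per symbol and is drawn with 50 symbols.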

HISTOGRAM has two output parameters that allow you to save information that has been generated during formation of the histogram. The NOBSERVATIONS parameter allows you to save a one-way table of counts that contains the number of observations that were assigned to each group; the missing-value cell of this table will contain a count of the number of units that were missing and that therefore remain unclassified. When producing a histogram from a variate, you can use the GROUPS parameter to specify a factor to record the group to which each unit was allocated.
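
To make the saved information concrete, here is a Python sketch of what the two output parameters hold when the data are a variate with explicit limits. It is an illustration only: `classify` is a hypothetical name, `None` stands in for a missing value, and the 1-based group numbering is an assumption.

```python
from bisect import bisect_left

def classify(values, limits):
    """Return (counts per group, number missing, group of each unit)."""
    bounds = sorted(limits)
    counts = [0] * (len(bounds) + 1)   # analogue of the NOBSERVATIONS table
    missing = 0                        # analogue of its missing-value cell
    groups = []                        # analogue of the GROUPS factor
    for v in values:
        if v is None:                  # missing units stay unclassified
            missing += 1
            groups.append(None)
            continue
        g = bisect_left(bounds, v)     # upper limits are inclusive
        counts[g] += 1
        groups.append(g + 1)           # assumed 1-based group numbers
    return counts, missing, groups

counts, missing, groups = classify([0, 4, None, 11, 5], [1, 2, 3, 5, 7, 8, 10])
print(counts, missing, groups)
```

The missing unit is counted separately and is not assigned to any group, while the other four units are recorded against groups 1, 4, 8 and 4.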

Normally, output goes to the current output channel, but you can use the CHANNEL option to direct it to another. For example, when you are working interactively, you might want to send a histogram to a secondary output file so that you can print it later. Unlike with some directives (for example, PRINT), you cannot save the output in a text structure.

Options: CHANNEL, TITLE, LIMITS, NGROUPS, LABELS, SCALE.

Parameters: DATA, NOBSERVATIONS, GROUPS, SYMBOLS, DESCRIPTION.

Action with `RESTRICT`

You can restrict a DATA variate or factor to form a histogram for only a subset of the units. However, the restriction does not carry over to any other variates or factors listed by the DATA parameter.

Example

" Example HIST-1; line-printer histogram"

VARIATE [NVALUES=25] Data
READ Data
1 1 1 2 2 1 4 3 6 5 4 2 6 8 0 9 7 4 3 2 4 6 9 4 5:
HISTOGRAM Data

" Parallel histograms"
VARIATE [NVALUES=30] X,Y
TEXT [VALUES=small,medium,large] Size
READ X,Y
1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 4 4 4 5 5 5 5 6 6 6 6
2 2 2 2 3 1 4 4 5 2 1 7 8 0 1 3 3 2 1 1 1 0 5 3 6 5 7 7 9 1:
HISTOGRAM [LABELS=Size] Y,X

" Specify symbols"
HISTOGRAM Data; SYMBOLS='X-O-'

" Set limits"
VARIATE [VALUES=1,2,3,5,7,8,10] Uplimits
HISTOGRAM [LIMITS=Uplimits] Data; NOBSERVATIONS=Counts

" Bar-chart of nominal groups"
TEXT [VALUES=apple,banana,peach,cherry,pear,orange] Name
FACTOR [LEVELS=6; LABELS=Name; NVALUES=32] Fruit
READ Fruit
4 6 3 4 1 6 3 3 6 4 3 5 3 5 1 5 1 6 3 4 4 5 5 5 6 3 6 2 2 5 5 3:
HISTOGRAM Fruit

" Histogram of contents of one-way table (formed above)"
PRINT [ACROSS=2] Counts; FIELDWIDTH=8
HISTOGRAM Counts

" Series of histograms"
CALCULATE Var[1...3] = URAND(1237,0,0; 30)
& Var[1...3] = 10,11,12 + NED(Var[1...3]) * 1,1.2,1.3
VARIATE [VALUES=6,9,12,15] Limits
FOR V1=Var[]
  HISTOGRAM V1
  HISTOGRAM [LIMITS=Limits] V1
ENDFOR

Updated on March 7, 2019
