Produces histograms of data on the terminal or line printer (synonym of LPHISTOGRAM).
CHANNEL||Channel number of output file; default is the current output file|
TITLE||General title; default
LIMITS||Variate of group limits for classifying variates into groups; default
SCALE||Number of units represented by each character; default 1|
DATA||Data for the histograms; these can be either a factor indicating the group to which each unit belongs, a variate whose values are to be grouped, or a one-way table giving the number of units in each group|
NOBSERVATIONS||One-way table to save numbers in the groups|
GROUPS||Factor to save groups defined from a variate|
SYMBOLS||Characters to be used to represent the bars of each histogram|
DESCRIPTION||Annotation for key|
The HISTOGRAM directive has been replaced by the LPHISTOGRAM directive, and may be removed in a future release or modified to produce high-resolution plots instead of character-based plots.
Histograms provide quick and simple visual summaries of data values. The data are divided into several groups, which are then displayed as a histogram consisting of a line of asterisks for each group. The number of asterisks in each line is proportional to the number of values assigned to that group; this count is also printed at the beginning of each line. The data for the histogram are specified using the DATA parameter, as variates, factors or one-way tables.
If a histogram is to be formed from a variate, Genstat sorts its values into groups as defined by upper and lower bounds. You can also specify a list of variates, to obtain a parallel histogram. For each group one row of asterisks is printed for each variate, labelled by the corresponding identifier. The variates are sorted according to the same intervals; there is no need for them all to have the same numbers of values.
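As a minimal sketch of a parallel histogram formed from variates of different lengths (the identifiers A and B and their values are invented for illustration):

```genstat
" Two variates with 8 and 12 values; both are grouped using the same intervals"
VARIATE [VALUES=1,2,2,3,3,3,4,5] A
VARIATE [VALUES=2,3,3,4,4,5,5,5,6,7,8,9] B
HISTOGRAM A,B
```

Each group should then be shown with one row of asterisks labelled A and one labelled B.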
With variates of data, you can use the NGROUPS option to specify the number of groups in the histogram; Genstat will then work out appropriate limits, based on the range of the data, to form intervals of equal width. For example:
HISTOGRAM [NGROUPS=5] Data
Alternatively, you can define the groups explicitly, by setting the LIMITS option to a variate containing the group limits. For example:
VARIATE [VALUES=1,2,3,5,7,8,10] Glimits
HISTOGRAM [LIMITS=Glimits] Data
Glimits is a variate with seven values, producing a histogram in which the data are split into eight groups: ≤1, 1-2, 2-3, 3-5, 5-7, 7-8, 8-10 and >10. The upper limit of each group is included within that group, so the group 3-5, for example, contains values that are greater than 3 and less than or equal to 5. The values of the limits variate are sorted into ascending order if necessary, but the variate itself is not changed.
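One way to check how boundary values are allocated is to save the grouping factor with the GROUPS parameter and print it alongside the data. The following is a sketch only; the identifiers Check and Gcheck and the data values are invented for illustration:

```genstat
" Illustrative values chosen to fall on and around the limits"
VARIATE [VALUES=1,2,3,5,7,8,10] Glimits
VARIATE [VALUES=0.5,1,3,3.1,5,10,12] Check
HISTOGRAM [LIMITS=Glimits] Check; GROUPS=Gcheck
PRINT Check,Gcheck
```

By the rule described above, the value 1 should fall in the group ≤1, 3 in 2-3, both 3.1 and 5 in 3-5, 10 in 8-10, and 12 in >10.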
You can use the LABELS option to provide your own labelling for the groups of the histogram. It should be set to a text vector of length equal to the number of groups. If neither NGROUPS nor LIMITS has been set, the number of groups is determined from the number of values in the LABELS structure. If LABELS is also unset, the default number of groups is chosen as the integer value nearest to the square root of the number of values, up to a maximum of 10. Alternatively, procedure AKAIKEHISTOGRAM provides a more sophisticated method of generating histograms, using Akaike’s Information Criterion (AIC) to generate an optimal grouping of the data.
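A short sketch of LABELS determining the number of groups, assuming Data is an existing variate and that neither NGROUPS nor LIMITS is set (the identifier Glabels and its strings are illustrative):

```genstat
" Three labels, so the data are divided into three groups"
TEXT [VALUES=low,medium,high] Glabels
HISTOGRAM [LABELS=Glabels] Data
```

With LABELS unset as well, a variate of 25 values would by default be split into 5 groups, the integer nearest to the square root of 25.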
The data for the histogram can also be specified as a factor (which defines the assignment of each unit to a group of the histogram). Genstat then counts the number of units that occur with each level of the factor; thus the number of groups of the histogram is the number of levels of the factor and the value for each group is the corresponding total. If the LABELS option is unset, the labels of the factor (if present) are used to label the groups, otherwise Genstat uses the factor levels.
When Genstat plots the histogram of a one-way table, the number of groups is the number of levels of the factor classifying the table and the values of the table indicate the number of observations in each group. If the LABELS option is unset, the labels or levels of the classifying factor are again used to label the histogram.
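As a sketch of plotting a one-way table, the counts could first be formed from a factor. This assumes an existing factor Fruit and uses the COUNTS option of the TABULATE directive to save the table; the identifier Fcount is illustrative:

```genstat
" Form a one-way table of counts classified by Fruit, then plot it"
TABULATE [CLASSIFICATION=Fruit; COUNTS=Fcount]
HISTOGRAM Fcount
```

Since the table holds the number of units at each level, the result should show the same bars as a histogram of the factor Fruit itself.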
When producing a parallel histogram the data structures must all be of the same type: variate, factor or table. If parallel histograms are to be formed from several factors, they must all have the same number of levels, and the labels or levels of the first factor will be used to identify the groups. Likewise, if you are forming parallel histograms from several tables, they must all have the same number of values, and the classifying factor of the first table will define the labelling of the histogram.
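A sketch of a parallel histogram from two factors with the same number of levels follows; the identifiers Rating, Before and After, and the data, are invented for illustration:

```genstat
" Two factors sharing the same three levels and labels"
TEXT [VALUES=low,medium,high] Rating
FACTOR [LEVELS=3; LABELS=Rating; NVALUES=8] Before,After
READ Before,After
1 2  1 1  2 3  2 2  3 3  2 3  1 2  3 3 :
HISTOGRAM Before,After
```

The groups should be identified by the labels of Before, the first factor in the list.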
The SYMBOLS parameter can specify alternative plotting characters to be used instead of the asterisk. For example:
HISTOGRAM Variate; SYMBOLS='+'
You can specify a different string for each structure in a parallel histogram. If you specify strings of more than one character, Genstat uses the characters in order, recycled as necessary, until each histogram bar is of the correct length.
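For instance, assuming X and Y are existing variates, a separate symbol string could be supplied for each (a sketch; the symbols chosen are arbitrary):

```genstat
" One symbol string per variate in the parallel histogram"
HISTOGRAM X,Y; SYMBOLS='+','o'
```

With a multi-character string such as 'X-O-', the recycling rule implies that a bar six characters long would be drawn as X-O-X-.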
The TITLE option lets you set an overall title for the output, and the DESCRIPTION parameter can be used to provide a text for labelling the histogram instead of the identifiers of the DATA structures.
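A sketch combining both settings, assuming X and Y are existing variates (the title and description strings are illustrative):

```genstat
" Overall title, plus one description per variate for the key"
HISTOGRAM [TITLE='Scores by sample'] X,Y; DESCRIPTION='First sample','Second sample'
```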
Normally one asterisk will represent one unit. However, if there are many data values and the groups become large, Genstat may not be able to fit enough asterisks into one row. It will then alter the scaling so that one asterisk represents several units. You can set the scaling explicitly using the SCALE option; the value specified is rounded to the nearest integer, and determines how many units should be represented by each asterisk.
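For example, assuming Data is an existing variate, the scaling could be set explicitly as follows (a sketch):

```genstat
" Each asterisk represents two units"
HISTOGRAM [SCALE=2] Data
```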
HISTOGRAM has two output parameters that allow you to save information that has been generated during formation of the histogram. The NOBSERVATIONS parameter allows you to save a one-way table of counts that contains the number of observations that were assigned to each group; the missing-value cell of this table will contain a count of the number of units that were missing and that therefore remain unclassified. When producing a histogram from a variate, you can use the GROUPS parameter to specify a factor to record the group to which each unit was allocated.
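A sketch that saves both structures from a variate containing a missing value; the identifiers Z, Zcount and Zgroup and the data are invented for illustration:

```genstat
" Z has one missing value, written as *"
VARIATE [VALUES=1,2,*,4,5,5,6] Z
HISTOGRAM [NGROUPS=3] Z; NOBSERVATIONS=Zcount; GROUPS=Zgroup
" Inspect the saved table of counts and the allocation of each unit"
PRINT Zcount
PRINT Z,Zgroup
```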
Normally, output goes to the current output channel, but you can use the CHANNEL option to direct it to another. For example, when you are working interactively, you might want to send a graph to a secondary output file so that you can print it later. Unlike some directives (for example,
You can restrict a DATA variate or factor to form a histogram for only a subset of the units. However, the restriction does not carry over to any other variates or factors listed by the DATA parameter.
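As a sketch, assuming Data is an existing variate, a restriction could be applied with the RESTRICT directive before plotting (the condition is arbitrary):

```genstat
" Include only the units of Data with values greater than 2"
RESTRICT Data; CONDITION=Data > 2
HISTOGRAM Data
" Remove the restriction afterwards"
RESTRICT Data
```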
Commands for: Graphics.
" Example HIST-1; line-printer histogram"
VARIATE [NVALUES=25] Data
READ Data
1 1 1 2 2 1 4 3 6 5 4 2 6 8 0 9 7 4 3 2 4 6 9 4 5:
HISTOGRAM Data

" Parallel histograms"
VARIATE [NVALUES=30] X,Y
TEXT [VALUES=small,medium,large] Size
READ X,Y
1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 4 4 4 5 5 5 5 6 6 6 6 2 2 2 2 3 1 4 4 5 2 1 7 8 0 1 3 3 2 1 1 1 0 5 3 6 5 7 7 9 1:
HISTOGRAM [LABELS=Size] Y,X

" Specify symbols"
HISTOGRAM Data; SYMBOLS='X-O-'

" Set limits"
VARIATE [VALUES=1,2,3,5,7,8,10] Uplimits
HISTOGRAM [LIMITS=Uplimits] Data; NOBSERVATIONS=Counts

" Bar-chart of nominal groups"
TEXT [VALUES=apple,banana,peach,cherry,pear,orange] Name
FACTOR [LEVELS=6; LABELS=Name; NVALUES=32] Fruit
READ Fruit
4 6 3 4 1 6 3 3 6 4 3 5 3 5 1 5 1 6 3 4 4 5 5 5 6 3 6 2 2 5 5 3:
HISTOGRAM Fruit

" Histogram of contents of one-way table (formed above)"
PRINT [ACROSS=2] Counts; FIELDWIDTH=8
HISTOGRAM Counts

" Series of histograms"
CALCULATE Var[1...3] = URAND(1237,0,0; 30)
& Var[1...3] = 10,11,12 + NED(Var[1...3]) * 1,1.2,1.3
VARIATE [VALUES=6,9,12,15] Limits
FOR V1=Var
  HISTOGRAM V1
  HISTOGRAM [LIMITS=Limits] V1
ENDFOR