Produces histograms of data on the terminal or line printer (synonym of LPHISTOGRAM).
Options
CHANNEL = scalar | Channel number of output file; default is the current output file |
---|---|
TITLE = text | General title; default * |
LIMITS = variate | Variate of group limits for classifying variates into groups; default * |
NGROUPS = scalar | When LIMITS is not specified, this defines the number of groups into which a data variate is to be classified; default is the integer value nearest to the square root of the number of values in the variate |
LABELS = text | Group labels |
SCALE = scalar | Number of units represented by each character; default 1 |
Parameters
DATA = identifiers | Data for the histograms; these can be either a factor indicating the group to which each unit belongs, a variate whose values are to be grouped, or a one-way table giving the number of units in each group |
---|---|
NOBSERVATIONS = tables | One-way table to save numbers in the groups |
GROUPS = factors | Factor to save groups defined from a variate |
SYMBOLS = texts | Characters to be used to represent the bars of each histogram |
DESCRIPTION = texts | Annotation for key |
Description
The HISTOGRAM directive has been replaced by the LPHISTOGRAM directive, and may be removed in a future release or modified to produce high-resolution plots instead of character-based plots.
Histograms provide quick and simple visual summaries of data values. The data are divided into several groups, which are then displayed as a histogram consisting of a line of asterisks for each group. The number of asterisks in each line is proportional to the number of values assigned to that group; this number is also printed at the beginning of each line. The DATA parameter specifies the data for the histogram, which can be variates, factors or one-way tables.
If a histogram is to be formed from a variate, Genstat sorts its values into groups as defined by upper and lower bounds. You can also specify a list of variates, to obtain a parallel histogram. For each group one row of asterisks is printed for each variate, labelled by the corresponding identifier. The variates are sorted according to the same intervals; there is no need for them all to have the same numbers of values.
With variates of data, you can use the NGROUPS option to specify the number of groups in the histogram; Genstat will then work out appropriate limits, based on the range of the data, to form intervals of equal width. For example:
HISTOGRAM [NGROUPS=5] Data
Alternatively, you can define the groups explicitly, by setting the LIMITS option to a variate containing the group limits. For example:
VARIATE [VALUES=1,2,3,5,7,8,10] Glimits
HISTOGRAM [LIMITS=Glimits] Data
Glimits is a variate with seven values, producing a histogram in which the data are split into eight groups: ≤1, 1-2, 2-3, 3-5, 5-7, 7-8, 8-10, >10. The upper limit of each group is included within that group, so the group 3-5, for example, contains values that are greater than 3 and less than or equal to 5. The values of the limits variate are sorted into ascending order if necessary, but the variate itself is not changed.
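The upper-inclusive grouping rule can be made concrete with a short Python sketch. This is not Genstat code and not Genstat's implementation; the function name assign_group and the label strings are hypothetical, chosen only to mirror the Glimits example above.

```python
from bisect import bisect_left

def assign_group(value, limits):
    """Return the 0-based group index for value, given the group limits.

    Group 0 holds values <= the smallest limit; the last group holds
    values above the largest limit. Each upper limit is inclusive.
    """
    ordered = sorted(limits)  # sorted copy; the caller's list is not changed
    # bisect_left counts the limits strictly below value, which is the
    # index of the first group whose upper limit is >= value.
    return bisect_left(ordered, value)

glimits = [1, 2, 3, 5, 7, 8, 10]
labels = ["<=1", "1-2", "2-3", "3-5", "5-7", "7-8", "8-10", ">10"]

for v in (1, 3, 3.5, 5, 10, 10.5):
    print(v, "->", labels[assign_group(v, glimits)])
```

In this sketch the value 5 falls in the group 3-5 and the value 3 falls in 2-3, matching the rule that a value equal to a limit belongs to the group below it.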
You can use the LABELS option to provide your own labelling for the groups of the histogram. It should be set to a text vector of length equal to the number of groups. If neither NGROUPS nor LIMITS has been set, the number of groups is determined from the number of values in the LABELS structure. If LABELS is also unset, the default number of groups is chosen as the integer value nearest to the square root of the number of values, up to a maximum of 10. Alternatively, procedure AKAIKEHISTOGRAM provides a more sophisticated method of generating histograms, using Akaike’s Information Criterion (AIC) to generate an optimal grouping of the data.
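The order in which the number of groups is decided can be summarised in a few lines of Python. This is only a sketch of the rules as described here, not Genstat code; the function name number_of_groups and its keyword arguments are hypothetical stand-ins for the LIMITS, NGROUPS and LABELS options, and it assumes the limits are distinct.

```python
import math

def number_of_groups(n_values, limits=None, ngroups=None, labels=None):
    """Number of histogram groups for a variate with n_values values."""
    if limits is not None:
        # k distinct limits split the data into k + 1 groups
        return len(limits) + 1
    if ngroups is not None:
        return ngroups
    if labels is not None:
        # one group per supplied label
        return len(labels)
    # default: integer nearest to sqrt(n), but never more than 10
    return min(10, int(math.sqrt(n_values) + 0.5))

print(number_of_groups(25))                                   # default rule
print(number_of_groups(30, labels=["small", "medium", "large"]))
print(number_of_groups(25, limits=[1, 2, 3, 5, 7, 8, 10]))
```

For a variate of 25 values the default rule gives 5 groups; for 200 values the square root is about 14, so the cap of 10 applies.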
The data for the histogram can also be specified as a factor (which defines the assignment of each unit to a group of the histogram). Genstat then counts the number of units that occur with each level of the factor; thus the number of groups of the histogram is the number of levels of the factor and the value for each group is the corresponding total. If the LABELS option is unset, the labels of the factor (if present) are used to label the groups, otherwise Genstat uses the factor levels.
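As an illustration of the counting step for factor data, the following Python sketch tallies the units at each level and prints one labelled row per level. It is not Genstat code; the function name factor_counts is hypothetical, and the printed layout is only an approximation of a character histogram.

```python
from collections import Counter

def factor_counts(units, levels):
    """Count how many units fall at each factor level, in level order."""
    tally = Counter(units)
    # Levels with no units still get a group, with a count of zero.
    return [tally[level] for level in levels]

levels = [1, 2, 3, 4, 5, 6]
names = ["apple", "banana", "peach", "cherry", "pear", "orange"]
fruit = [4, 6, 3, 4, 1, 6, 3, 3, 6, 4, 3, 5, 3, 5, 1, 5,
         1, 6, 3, 4, 4, 5, 5, 5, 6, 3, 6, 2, 2, 5, 5, 3]

# One row per level: label, count, then one asterisk per unit.
for name, count in zip(names, factor_counts(fruit, levels)):
    print(f"{name:>8} {count:3d} {'*' * count}")
```

The number of rows equals the number of levels, and the counts sum to the number of units (32 here).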
When Genstat plots the histogram of a one-way table, the number of groups is the number of levels of the factor classifying the table and the values of the table indicate the number of observations in each group. If the LABELS option is unset, the labels or levels of the classifying factor are again used to label the histogram.
When producing a parallel histogram the data structures must all be of the same type: variate, factor or table. If parallel histograms are to be formed from several factors, they must all have the same number of levels, and the labels or levels of the first factor will be used to identify the groups. Likewise, if you are forming parallel histograms from several tables, they must all have the same number of values, and the classifying factor of the first table will define the labelling of the histogram.
The SYMBOLS parameter can specify alternative plotting characters to be used instead of the asterisk. For example:
HISTOGRAM Variate; SYMBOLS='+'
You can specify a different string for each structure in a parallel histogram. If you specify strings of more than one character, Genstat uses the characters in order, recycled as necessary, until each histogram bar is of the correct length.
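The recycling of a multi-character symbol string can be sketched in Python as follows. This is not Genstat code; make_bar is a hypothetical helper that only illustrates taking the characters in order and repeating them until the bar has the required length.

```python
from itertools import cycle, islice

def make_bar(length, symbols="*"):
    """Build a bar of the given length by recycling the symbol string."""
    # cycle() repeats the characters in order; islice() stops at length.
    return "".join(islice(cycle(symbols), length))

print(make_bar(6, "X-O-"))
print(make_bar(3))
```

With the symbol string 'X-O-', a bar of length 6 is X-O-X-: the four characters are used once in order and then the first two are reused.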
The TITLE option lets you set an overall title for the output, and the DESCRIPTION parameter can be used to provide a text for labelling the histogram instead of the identifiers of the DATA structures.
Normally one asterisk will represent one unit. However, if there are many data values and the groups become large, Genstat may not be able to fit enough asterisks into one row. It will then alter the scaling so that one asterisk represents several units. You can set the scaling explicitly using the SCALE option; the value specified is rounded to the nearest integer, and determines how many units should be represented by each asterisk.
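The idea behind the scaling can be illustrated with a rough Python sketch. It is not Genstat code, and several details are assumptions rather than documented behaviour: the row width max_chars, the choice of the smallest integer scale that fits, and rounding partial bars up to a whole character are all hypothetical.

```python
import math

def choose_scale(counts, max_chars, scale=None):
    """Units per character: explicit SCALE if given, else smallest that fits."""
    if scale is not None:
        # Explicit value is rounded to the nearest integer (tie handling
        # is an assumption); at least one unit per character.
        return max(1, round(scale))
    largest = max(counts, default=0)
    # Assumption: smallest integer scale so the longest bar fits the row.
    return max(1, math.ceil(largest / max_chars))

def bar_length(count, scale):
    """Characters needed for a group (assumption: partial characters round up)."""
    return math.ceil(count / scale)

counts = [3, 120, 40]
scale = choose_scale(counts, max_chars=60)
for c in counts:
    print(f"{c:4d} {'*' * bar_length(c, scale)}")
print("Scale:", scale, "unit(s) per asterisk")
```

In this sketch a largest group of 120 units and a row of 60 characters lead to a scale of 2 units per asterisk, while groups that already fit keep the default of 1.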
HISTOGRAM has two output parameters that allow you to save information that has been generated during formation of the histogram. The NOBSERVATIONS parameter allows you to save a one-way table of counts that contains the number of observations that were assigned to each group; the missing-value cell of this table will contain a count of the number of units that were missing and that therefore remain unclassified. When producing a histogram from a variate, you can use the GROUPS parameter to specify a factor to record the group to which each unit was allocated.
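The information saved by the two output parameters can be pictured with the Python sketch below. It is not Genstat code; the function classify is hypothetical, None stands in for a missing value, and recording a missing group for a missing unit is an assumption about the saved factor.

```python
from bisect import bisect_left

def classify(values, limits):
    """Return (counts per group, missing count, 1-based group of each unit)."""
    ordered = sorted(limits)
    counts = [0] * (len(ordered) + 1)   # analogue of the NOBSERVATIONS table
    missing = 0                         # analogue of its missing-value cell
    groups = []                         # analogue of the GROUPS factor
    for v in values:
        if v is None:
            missing += 1
            groups.append(None)         # assumption: unit stays unclassified
            continue
        g = bisect_left(ordered, v)     # upper limits are inclusive
        counts[g] += 1
        groups.append(g + 1)            # levels numbered from 1
    return counts, missing, groups

counts, missing, groups = classify([0.5, 3, 5, None, 12],
                                   [1, 2, 3, 5, 7, 8, 10])
print("counts :", counts)
print("missing:", missing)
print("groups :", groups)
```

The counts have one entry per group, the missing units are tallied separately, and the group list has one entry per unit, in the original unit order.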
Normally, output goes to the current output channel, but you can use the CHANNEL option to direct it to another. For example, when you are working interactively, you might want to send a graph to a secondary output file so that you can print it later. Unlike with some directives (for example, PRINT), you cannot save the output in a text structure.
Options: CHANNEL, TITLE, LIMITS, NGROUPS, LABELS, SCALE.
Parameters: DATA, NOBSERVATIONS, GROUPS, SYMBOLS, DESCRIPTION.
Action with RESTRICT
You can restrict a DATA variate or factor to form a histogram for only a subset of the units. However, the restriction does not carry over to any other variates or factors listed by the DATA parameter.
See also
Directives: BARCHART, DHISTOGRAM, LPHISTOGRAM.
Commands for: Graphics.
Example
" Example HIST-1; line-printer histogram"
VARIATE [NVALUES=25] Data
READ Data
1 1 1 2 2 1 4 3 6 5 4 2 6 8 0 9 7 4 3 2 4 6 9 4 5:
HISTOGRAM Data
" Parallel histograms"
VARIATE [NVALUES=30] X,Y
TEXT [VALUES=small,medium,large] Size
READ X,Y
1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 4
4 4 5 5 5 5 6 6 6 6 2 2 2 2 3 1 4 4 5 2
1 7 8 0 1 3 3 2 1 1 1 0 5 3 6 5 7 7 9 1:
HISTOGRAM [LABELS=Size] Y,X
" Specify symbols"
HISTOGRAM Data; SYMBOLS='X-O-'
" Set limits"
VARIATE [VALUES=1,2,3,5,7,8,10] Uplimits
HISTOGRAM [LIMITS=Uplimits] Data; NOBSERVATIONS=Counts
" Bar-chart of nominal groups"
TEXT [VALUES=apple,banana,peach,cherry,pear,orange] Name
FACTOR [LEVELS=6; LABELS=Name; NVALUES=32] Fruit
READ Fruit
4 6 3 4 1 6 3 3 6 4 3 5 3 5 1 5 1 6 3 4 4 5 5 5 6 3 6 2 2 5 5 3:
HISTOGRAM Fruit
" Histogram of contents of one-way table (formed above)"
PRINT [ACROSS=2] Counts; FIELDWIDTH=8
HISTOGRAM Counts
" Series of histograms"
CALCULATE Var[1...3] = URAND(1237,0,0; 30)
& Var[1...3] = 10,11,12 + NED(Var[1...3]) * 1,1.2,1.3
VARIATE [VALUES=6,9,12,15] Limits
FOR V1=Var[]
  HISTOGRAM V1
  HISTOGRAM [LIMITS=Limits] V1
ENDFOR