Prints histograms with improved definition of groups (A. Keen).

### Options

| Option | Description |
|---|---|
| `CHANNEL` = scalar | Channel number of output file; default is the current output file |
| `TITLE` = text | General title; default `Histogram of ...`, where `...` is the identifier of the structure specified by `DATA` |
| `LOWER` = scalar | Lowest class limit |
| `WIDTH` = scalar | Interval width |
| `SCALE` = scalar | Number of units represented by each symbol; default 1 (or more if the page width is not sufficient) |

### Parameters

| Parameter | Description |
|---|---|
| `DATA` = identifiers | Data for the histograms (variate, table, factor or matrix) |
| `NOBSERVATIONS` = tables | One-way table to save the numbers of observations in the groups |
| `GROUPS` = factors | Factor to save the groups defined, with `LEVELS` set to the midpoints of the intervals and `LABELS` set to the same values stored as text |
| `SYMBOLS` = texts | Characters to be used to represent the bars of each histogram |
| `DESCRIPTION` = texts | Annotation for key |

### Description

The procedure `AKAIKEHISTOGRAM` has been designed as an alternative to the Genstat directive `HISTOGRAM`, for cases where the default settings are not optimal. Such cases may arise from the following disadvantages of `HISTOGRAM`:

– `HISTOGRAM` does not take into account the round-off of the data. The round-off defines a minimal interval width, say *dy*, for the observations. A sensible interval width must be a multiple of *dy*, because otherwise the actual width is not equal for all intervals. An extreme example of this is the case where the interval width is smaller than *dy*; this causes artificial “holes” in the histogram.

– The default number of groups equals the square root of the number of observations, irrespective of the shape of the distribution. In some situations (for instance if the number of observations is large) the number of groups is unnecessarily large; in other situations (for instance if the shape of the distribution is complex) the number of groups can be too small. If the number of groups is too large, then differences in numbers of observations between neighbouring classes may be just random fluctuations, while if the number of groups is too small, valuable information is lost.

– Specifying user-defined class limits (in a variate) can be rather cumbersome, especially if many histograms have to be produced.
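The round-off problem in the first point can be made concrete with a small sketch. It is not part of the procedure; the function name `recordable_values_per_class` and the use of integer units (thousandths) are assumptions made for illustration. It counts how many recordable values (multiples of *dy*) can fall in each class for a given class width.

```python
def recordable_values_per_class(dy, width, n_classes, lower=0):
    """Count the multiples of dy that fall in each half-open class [lo, hi).

    All arguments are integers in a common small unit (for example
    thousandths), which avoids floating-point effects at the class limits.
    """
    counts = []
    for k in range(n_classes):
        lo = lower + k * width
        hi = lo + width
        first = -(-lo // dy)   # ceil(lo / dy): index of first multiple >= lo
        last = -(-hi // dy)    # ceil(hi / dy): index of first multiple >= hi
        counts.append(last - first)
    return counts


# Data recorded to 0.01 (dy = 10 thousandths).
print(recordable_values_per_class(dy=10, width=20, n_classes=4))  # [2, 2, 2, 2]
print(recordable_values_per_class(dy=10, width=15, n_classes=4))  # [2, 1, 2, 1]
print(recordable_values_per_class(dy=10, width=5, n_classes=4))   # [1, 0, 1, 0]
```

With a width of 0.02 every class can receive two distinct recorded values. A width of 0.015 gives classes that alternate between two values and one, so their effective widths differ, and a width of 0.005 leaves every second class empty, which is the “holes” effect described above.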

`AKAIKEHISTOGRAM` aims to avoid these disadvantages of `HISTOGRAM`. By default an “optimal” number of groups is determined using Akaike’s Information Criterion.

Alternatively, user-defined class limits can be specified using the options `LOWER` and `WIDTH` instead of the `LIMITS` option of `HISTOGRAM`. In a `FOR` loop, different values for the lower limit and/or the interval width can be specified for different quantitative structures. Scalars with missing values can be used to request the default values of these options. Option `LOWER` is especially important if the observations have a “natural” lower limit, for example the value 0; then 0 is taken as the lower limit of the first group and the first group has the same interval width as the following groups.

The option `TITLE` and the parameters of `HISTOGRAM` have been carried over to `AKAIKEHISTOGRAM`. However, the options `NGROUPS` and `LABELS` of `HISTOGRAM` have been omitted, because they are not in line with the style of `AKAIKEHISTOGRAM`.

Options: `CHANNEL`, `TITLE`, `LOWER`, `WIDTH`, `SCALE`.

Parameters: `DATA`, `NOBSERVATIONS`, `GROUPS`, `SYMBOLS`, `DESCRIPTION`.

### Method

The optimality criterion used is Akaike’s Information Criterion (AIC), which is twice the number of free parameters of the model (that is, the number of groups minus 1) minus twice the maximal log-likelihood of the observations under the multinomial model. The starting histogram is a histogram with equal-length intervals and more than sufficient groups. From this histogram, new histograms are derived with interval length *r* times the interval length of the starting histogram, *r* = 2, … The “optimal” histogram is the one with minimal AIC. The basic idea for the method is taken from Sakamoto, Ishiguro & Kitagawa (1986); see also Taylor (1987).
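The selection step can be sketched as follows. This is an illustration, not the procedure's actual code, and it rests on several assumptions: AIC is taken in its standard form (twice the number of free parameters minus twice the maximal log-likelihood), each merged group's probability is spread evenly over the starting cells it contains, the multinomial coefficient is dropped because it is the same for every *r*, and a final group with fewer than *r* cells is allowed. The names `pooled_aic` and `best_multiplier` are hypothetical.

```python
import math


def pooled_aic(fine_counts, r):
    """AIC of the histogram formed by merging every r adjacent starting cells.

    Assumed model: each merged group has its own probability, divided
    evenly over the starting cells it contains. The multinomial
    coefficient is identical for every r and is therefore omitted.
    """
    n = sum(fine_counts)
    log_lik = 0.0
    n_groups = 0
    for start in range(0, len(fine_counts), r):
        cells = fine_counts[start:start + r]  # the last group may hold fewer cells
        total = sum(cells)
        n_groups += 1
        if total > 0:  # empty groups contribute nothing to the log-likelihood
            log_lik += total * math.log(total / (n * len(cells)))
    free_parameters = n_groups - 1
    return 2 * free_parameters - 2 * log_lik


def best_multiplier(fine_counts, candidates):
    """Return the multiplier r among the candidates with the smallest AIC."""
    return min(candidates, key=lambda r: pooled_aic(fine_counts, r))


flat = [10] * 12                     # no structure: pooling loses nothing
spiky = [100, 0, 100, 0, 100, 0]     # structure at the finest scale
print(best_multiplier(flat, range(1, 13)))  # 12
print(best_multiplier(spiky, range(1, 7)))  # 1
```

For the flat counts every pooling has the same log-likelihood, so the penalty term alone decides and the single-group histogram wins. For the alternating counts any pooling discards real structure, so the starting histogram itself has the smallest AIC.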

The starting histogram is obtained as follows. First the range of the observations is divided into five equal-length intervals, from which the apparent number of observations *Na* is calculated as five times the number of observations in the interval with the largest frequency. *Na* is then used as the number of observations instead of the true number, and the number of groups *Ng* is calculated as five times the number obtained from Sturges’ formula (see, for example, Sakamoto, Ishiguro & Kitagawa (1986), page 117):

*Ng* = 5 × (1 + log₁₀(*Na*/2))
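As a rough numerical illustration, the two quantities can be computed as below. The formula is used exactly as printed above. The function names, the convention that intervals are closed on the left (with the maximum placed in the last interval), and leaving *Ng* unrounded are assumptions, since the text does not specify them.

```python
import math


def apparent_n(values, n_intervals=5):
    """Na: n_intervals times the largest count over equal-length intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    counts = [0] * n_intervals
    for v in values:
        if width > 0:
            # Left-closed intervals; the maximum goes into the last one.
            index = min(int((v - lo) / width), n_intervals - 1)
        else:
            index = 0  # all observations identical
        counts[index] += 1
    return n_intervals * max(counts)


def starting_groups(na):
    """Ng from the formula as printed: 5 * (1 + log10(Na / 2)), unrounded."""
    return 5 * (1 + math.log10(na / 2))


skewed = [0] * 50 + [1, 2, 3, 4, 10]
na = apparent_n(skewed)
print(na)                    # 255
print(starting_groups(200))  # 15.0
```

In the skewed sample there are only 55 observations, but 51 of them fall in the first of the five intervals, so *Na* = 255. Basing the starting number of groups on the densest part of the range is what makes the starting histogram fine enough there.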

The final limits of the starting histogram are obtained by a relatively strong rounding-off of the class limits (comparable with that in `HISTOGRAM`), where the width is always a multiple of the rounding-off interval.

### Action with `RESTRICT`

The structures in `DATA` can be restricted, each in a different way; `AKAIKEHISTOGRAM` uses only those units that are not excluded by their respective restrictions.

### References

Sakamoto, Y., Ishiguro, M. & Kitagawa, G. (1986). *Akaike Information Statistics*. D. Reidel Publishing Company, Dordrecht.

Taylor, C.C. (1987). Akaike’s Information Criterion and the Histogram. *Biometrika*, 74, 636-639.

### See also

Directives: `DHISTOGRAM`, `LPHISTOGRAM`.

### Example

    CAPTION 'AKAIKEHISTOGRAM example',\
            !t('The first example illustrates what can go wrong if the',\
            'class-width is not a multiple of the round-off interval.');\
            STYLE=meta,plain
    VARIATE [NVALUES=436] Cadmium
    READ Cadmium
    .03 .02 .06 .03 .04 .04 .03 .04 .03 .05 .04 .03 .04 .03 .03 .02 .03 .05 .03 .04
    .04 .03 .06 .05 .03 .04 .02 .04 .04 .05 .02 .05 .04 .04 .03 .04 .03 .06 .05 .04
    .06 .03 .05 .08 .09 .08 .08 .09 .05 .03 .04 .04 .04 .03 .08 .11 .02 .04 .02 .03
    .05 .04 .03 .02 .02 .02 .03 .04 .04 .03 .03 .07 .04 .06 .06 .05 .03 .05 .06 .04
    .03 .07 .07 .07 .07 .06 .03 .04 .04 .05 .03 .08 .02 .04 .03 .06 .07 .07 .04 .05
    .07 .04 .09 .10 .05 .05 .05 .06 .05 .08 .04 .03 .03 .02 .03 .04 .07 .02 .05 .13
    .03 .06 .03 .08 .07 .07 .07 .05 .05 .03 .05 .06 .06 .03 .05 .04 .04 .03 .03 .02
    .03 .01 .02 .03 .03 .04 .09 .04 .05 .15 .09 .07 .04 .05 .04 .06 .03 .07 .04 .05
    .06 .03 .06 .05 .02 .02 .03 .02 .05 .04 .05 .05 .08 .07 .08 .06 .02 .04 .05 .08
    .07 .04 .02 .03 .03 .05 .03 .03 .05 .04 .06 .06 .08 .08 .06 .06 .04 .07 .06 .02
    .08 .10 .08 .06 .05 .11 .05 .06 .04 .07 .08 .07 .08 .07 .07 .08 .05 .04 .04 .07
    .08 .03 .07 .10 .09 .05 .07 .05 .06 .07 .07 .07 .06 .19 .13 .09 .05 .13 .12 .04
    .07 .04 .03 .06 .07 .06 .03 .06 .06 .06 .06 .05 .07 .05 .06 .04 .07 .06 .06 .03
    .05 .04 .06 .05 .09 .07 .04 .05 .09 .17 .23 .19 .05 .04 .05 .11 .05 .06 .06 .08
    .04 .09 .16 .06 .05 .09 .05 .06 .06 .05 .04 .04 .05 .09 .05 .08 .04 .05 .04 .05
    .06 .08 .04 .03 .06 .05 .11 .06 .05 .05 .09 .05 .04 .05 .04 .05 .04 .08 .04 .02
    .02 .03 .02 .02 .03 .08 .04 .03 .03 .04 .04 .04 .02 .05 .09 .09 .09 .09 .08 .06
    .07 .05 .08 .08 .06 .06 .08 .08 .09 .08 .07 .07 .08 .06 .09 .08 .09 .09 .09 .08
    .08 .07 .06 .07 .07 .08 .07 .05 .05 .06 .09 .04 .06 .05 .07 .07 .04 .03 .03 .03
    .03 .02 .02 .05 .07 .06 .04 .05 .05 .03 .03 .03 .06 .05 .06 .06 .06 .04 .06 .06
    .06 .06 .06 .07 .05 .06 .06 .07 .08 .06 .06 .06 .06 .08 .03 .04 .06 .05 .02 .05
    .04 .06 .05 .04 .05 .04 .04 .03 .03 .06 .04 .05 .02 .08 .06 .04 :
    HISTOGRAM Cadmium
    AKAIKEHISTOGRAM Cadmium

    CAPTION !t('The second example illustrates similarity and differences',\
            'between HISTOGRAM and AKAIKEHISTOGRAM with respect to the',\
            'number of groups, class-limits and representation for a random',\
            'sample of size 1000 from the standard normal distribution and',\
            'the halfnormal distribution.')
    VARIATE [NVAL= 1000] y
    CALCULATE y= NED(URAND( 34761; 1000))
    HISTOGRAM y ; SYMB= '.'
    AKAIKEHISTOGRAM y ; SYMB= '.'
    CALCULATE y= ABS( y)
    HISTOGRAM y ; SYMB= '-'
    AKAIKEHISTOGRAM y ; SYMB= '-'