
BOXPLOT procedure

Draws box-and-whisker diagrams or schematic plots (P.W. Lane & S.D. Langton).

Options

GRAPHICS = string token What type of graphics to use (highresolution, lineprinter); default high
TITLE = text Title for diagram; default *
AXISTITLE = text Title for axis representing data values; default *
WINDOW = scalar Window in which to draw a high-resolution plot; default 4
SCREEN = string token Whether to clear screen before a high-resolution plot (clear, keep); default clea
ORIENTATION = string token Orientation of plots (horizontal, vertical, across, down); default vert
YORIENTATION = string token Direction of the y-axis for horizontal plots (reverse, normal); default reve
METHOD = string token Type of representation of data in a high-resolution plot (boxandwhisker, schematic); default boxa
BOXTITLE = text Title for axis representing different variates or groups; default *
BOXWIDTH = string token Whether to relate box width to size of sample in high-resolution plot (fixed, variable); default fixe
WHISKER = number Linestyle for whiskers (0…10); default 1
BAR% = scalar Size of bar at the end of the whiskers, as a percentage of the box-width; default 0 (i.e. no bar)
WIDTH% = scalar Width of the boxes, expressed as a percentage of the default width; default 100
SEM = string token Add bar showing a nonparametric standard error of the median (yes, no); default no
BOXORDER = string token Sort order for boxes when there are several DATA variates and GROUPS (groups, variates); default vari
REFERENCELINE = scalar Specifies the position of a reference line to be drawn parallel to the box axis; default * i.e. none

Parameters

DATA = variates Data to be summarized; no default
GROUPS = factor Factor to divide values of a single variate into groups; default *
BOXLABELS = texts Labels for individual boxes; default *, i.e. identifiers of variates or labels or levels of factor
UNITLABELS = texts Labels for extreme points in schematic plot; default is to use unit labels
BOXPOSITIONS = variates Positions of the boxes on the appropriate axis; default *, i.e. the boxes are equally spaced

Description

BOXPLOT draws pictures to display the distribution of one or more sets of data. In the simplest case, with the DATA parameter set to a single variate, BOXPLOT will draw a box-and-whisker diagram, as defined by Tukey (1977). The box spans the interquartile range of the values in the variate, so that the middle 50% of the data lie within the box, with a line indicating the median. Whiskers extend beyond the ends of the box as far as the minimum and maximum values. If several variates are supplied, a box is drawn for each of them using the same scale. Alternatively, if a single variate is supplied by the DATA parameter, a factor with the same number of values as the variate may be provided by the GROUPS parameter, and a box will be drawn for each level of the factor. If you specify several DATA variates and GROUPS factors, the BOXORDER option controls whether the boxes are arranged as groups within variates (BOXORDER=variates, the default) or variates within groups (BOXORDER=groups).
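The two BOXORDER settings differ only in which index varies fastest. The Python sketch below is purely illustrative and is not GenStat code: it lists the (variate, group) pairs in the order the boxes would be drawn. The function name box_order and the example identifiers are hypothetical.

```python
def box_order(variates, groups, boxorder="variates"):
    """List (variate, group) pairs in drawing order for a BOXORDER setting."""
    if boxorder == "variates":
        # Groups within variates: all boxes for one variate are kept together.
        return [(v, g) for v in variates for g in groups]
    if boxorder == "groups":
        # Variates within groups: all boxes for one group are kept together.
        return [(v, g) for g in groups for v in variates]
    raise ValueError("boxorder must be 'variates' or 'groups'")


# Hypothetical variates and group labels, for illustration only.
print(box_order(["Height", "Width"], ["A", "B"]))
print(box_order(["Height", "Width"], ["A", "B"], boxorder="groups"))
```

With variates Height and Width and groups A and B, the first call gives Height:A, Height:B, Width:A, Width:B, and the second gives Height:A, Width:A, Height:B, Width:B.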

The GRAPHICS option indicates whether high-resolution or line-printer plots are required. In either graphics mode, the TITLE, AXISTITLE and BOXTITLE options specify, respectively, the title displayed at the top of the plot, the title along the axis representing the data values, and the title along the axis representing the separate boxes when there are several variates or groups. For high-resolution plots, the WINDOW and SCREEN options control the placement of the picture in the graphical frame.

The ORIENTATION option controls the orientation of the boxes, with the following settings:

    vertical: plots the boxes vertically, i.e. down the screen (default),
    horizontal: plots the boxes horizontally, i.e. across the screen,
    down: synonym of vertical, and
    across: synonym of horizontal.

It is not possible to produce line-printer plots with more than 14 boxes. If the page size is small, as in interactive mode, vertical line-printer plots may be very cramped: the PAGE option of the OUTPUT directive can be used to increase the depth of the graphs.

The YORIENTATION option controls the orientation of the y-axis when the boxes are plotted horizontally. By default this is reversed, so that the first box is at the top of the screen.

Schematic plots can be drawn (high-resolution only) by setting option METHOD=schematic. These diagrams (also defined by Tukey 1977) are modifications of box-and-whisker diagrams which display individual outlying points as well as the box. The inner “fences” lie at a distance of 1.5 times the interquartile range beyond the quartiles, and the whiskers extend only to the most extreme data values within these fences; if no values lie beyond a fence, the whisker extends to the minimum or maximum value as usual. Individual outliers are plotted with a cross by default, with labels specified by the UNITLABELS parameter. The default for UNITLABELS is to use the unit labels of the DATA variate; the labels can be suppressed by setting parameter UNITLABELS=*. “Far” outliers, beyond the outer “fences” which are at a distance of three times the interquartile range beyond the quartiles, are plotted with a different pen.
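As a rough illustration of the fence rules, the Python sketch below classifies values into the whisker range, outliers and far outliers. It is not the procedure's implementation. The quartiles come from Python's statistics.quantiles with method="inclusive", which is an assumption and may not match the percentiles GenStat computes; values lying exactly on a fence are treated as inside it (also an assumption). The function name schematic_summary is hypothetical.

```python
import statistics

def schematic_summary(values):
    """Return (whisker ends, outliers, far outliers) using Tukey-style fences."""
    # Assumed quartile definition; GenStat's TABULATE may use another.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # inner fences
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr  # outer fences
    # Whiskers reach the most extreme values on or inside the inner fences.
    inside = [x for x in values if inner_lo <= x <= inner_hi]
    whiskers = (min(inside), max(inside))
    # Outliers: beyond an inner fence but not beyond the matching outer fence.
    outliers = [x for x in values
                if outer_lo <= x < inner_lo or inner_hi < x <= outer_hi]
    # Far outliers: beyond an outer fence (drawn with a different pen).
    far = [x for x in values if x < outer_lo or x > outer_hi]
    return whiskers, outliers, far


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 17, 100]
print(schematic_summary(data))  # ((1, 9), [17], [100])
```

Here the quartiles are 3.5 and 8.5, so the inner fences are at -4 and 16 and the outer fences at -11.5 and 23.5: 17 is an ordinary outlier and 100 a far outlier.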

The SEM option adds a central bar to each boxplot, giving a nonparametric estimate of the standard error of the median. This is calculated as the distance between the quartiles, multiplied by 1.5, and divided by the square root of the number of values in the DATA variate.
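The SEM formula translates directly into code. This Python sketch follows the description above (1.5 times the interquartile range, divided by the square root of the number of values). The function name median_sem is hypothetical, and the quartile definition (statistics.quantiles, inclusive method) is an assumption that may give slightly different quartiles from GenStat's.

```python
import math
import statistics

def median_sem(values):
    """Nonparametric SE of the median: 1.5 * IQR / sqrt(n)."""
    # Assumed quartile definition; may differ slightly from GenStat's.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return 1.5 * (q3 - q1) / math.sqrt(len(values))


print(median_sem(list(range(1, 17))))  # 2.8125
```

For the values 1 to 16 the quartiles are 4.75 and 12.25, so the result is 1.5 × 7.5 / 4 = 2.8125.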

By default, all boxes have equal width. High-resolution diagrams can be modified to indicate the number of values being represented by each box. The option BOXWIDTH=variable will scale the box widths by the square root of the number of values represented.
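The square-root scaling can be sketched in a few lines of Python. Normalizing so that the box for the largest group has the full width is an assumption made for illustration; the description above states only that widths are scaled by the square root of the number of values. The function name relative_box_widths is hypothetical.

```python
import math

def relative_box_widths(counts):
    """Box widths proportional to sqrt(n), scaled so the largest box is 1.0."""
    # Normalizing to the largest box is an assumption for illustration.
    widest = math.sqrt(max(counts))
    return [math.sqrt(n) / widest for n in counts]


print(relative_box_widths([100, 25, 4]))  # [1.0, 0.5, 0.2]
```

Under the same assumption, the three regions in the example below (50, 61 and 15 volcanoes) would get relative widths of roughly 0.91, 1.0 and 0.50.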

The style of the whiskers can be controlled by setting the WHISKER option to a graphical linestyle in the range 0 to 10. These styles are device dependent, but 0 and 1 always give a solid line (the default) and 2 usually gives a dashed line. The BAR% option allows you to add bars at the end of the whiskers. For example, the setting 100 gives a bar as wide as the box, and 25 would give one a quarter the width. The default is 0, giving no bars. The WIDTH% option specifies the width of the boxes, as a percentage of the default width (default 100).
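WIDTH% and BAR% are both plain percentages. The Python sketch below shows the arithmetic, assuming that WIDTH% rescales the default box width and that BAR% is then taken as a percentage of the resulting box width; that ordering, the function name box_and_bar_widths and the abstract default_width argument are all assumptions, since actual widths depend on the plot.

```python
def box_and_bar_widths(default_width, width_pct=100.0, bar_pct=0.0):
    """Apply WIDTH% to the default box width, then BAR% to the resulting box."""
    box = default_width * width_pct / 100.0
    # Assumption: BAR% is relative to the box width after WIDTH% is applied.
    bar = box * bar_pct / 100.0
    return box, bar


print(box_and_bar_widths(2.0))                             # (2.0, 0.0)
print(box_and_bar_widths(2.0, width_pct=50, bar_pct=25))   # (1.0, 0.25)
```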

The REFERENCELINE option allows you to specify the position of a reference line to be drawn parallel to the box axis in a high-resolution plot. If this is not set, no line is drawn.

Six pens are used to draw the high-resolution displays, apart from the axes: pen 1 for the boxes and median line (default colour black), pen 2 for far outliers (red crosses), pen 3 for outliers (green crosses), pen 4 for the whiskers (set to match the colour of pen 1), pen 5 for the standard error of median bar, and pen 6 for the reference line. You can customize the pictures by setting some aspects of these pens with the PEN directive before calling the procedure: in particular, the colours, symbols and line-thicknesses.

The BOXLABELS parameter allows you to specify labels that will identify each box.

The UNITLABELS parameter allows you to specify labels that will be used to identify outlying observations in schematic plots (but this is not available if you gave a list of variates in the DATA parameter).

The BOXPOSITIONS parameter defines the positions of the boxes on the appropriate axis. If this is unset, the positions are defined with an equal spacing.

Options: GRAPHICS, TITLE, AXISTITLE, WINDOW, ORIENTATION, YORIENTATION, METHOD, SCREEN, BOXTITLE, BOXWIDTH, WHISKER, BAR%, WIDTH%, SEM, BOXORDER, REFERENCELINE.
Parameters: DATA, GROUPS, BOXLABELS, UNITLABELS, BOXPOSITIONS.

Method

The medians and extremes are calculated by functions MEDIAN, MINIMUM and MAXIMUM, whereas the quartiles are calculated using the PERCENT option of TABULATE.
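To check the numbers behind a box outside GenStat, the Python sketch below computes the same five summary values (minimum, lower quartile, median, upper quartile, maximum) for the variate Other from the example. It uses Python's statistics.quantiles with the inclusive method in place of the PERCENT option of TABULATE; percentile definitions vary between packages, so the quartiles may not agree exactly with GenStat's. The function name five_number_summary is hypothetical.

```python
import statistics

def five_number_summary(values):
    """(minimum, lower quartile, median, upper quartile, maximum)."""
    # Assumed quartile definition standing in for TABULATE's PERCENT option.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, statistics.median(values), q3, max(values)


# Heights for the variate Other, copied from the example.
other = [134, 125, 114, 111, 100, 90, 80, 75, 49, 21, 21, 30, 60, 17, 19]
print(five_number_summary(other))  # (17, 25.5, 75, 105.5, 134)
```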

Action with RESTRICT

Restrictions on the supplied variates are taken into account. The grouping factor and texts holding boxlabels or unitlabels, if specified, should not be restricted.

Reference

Tukey, J.W. (1977). Exploratory Data Analysis. Addison-Wesley.

See also

Procedures: DOTHISTOGRAM, RUGPLOT, STEM, DMOSAIC, DXDENSITY.
Commands for: Graphics.

Example

CAPTION  'BOXPLOT example',\
         !t('1) The three variates America, Asia_Oc, Other contain the',\
         'heights of volcanoes in three regions of the world.',\
         'Simple box-and-whisker plots are drawn to compare the',\
         'distribution of heights in each group.'); STYLE=meta,plain
VARIATE  America,Asia_Oc,Other; VALUES=!(199,197,193,185,177,172,157,156,140,\
         140,130,126,124,124,113,102,100,102,94,93,89,86,83,83,83,82,77,73,\
         70,62,58,51,51,42,40,34,36,67,67,66,60,57,57,53,49,43,43,40,35,35),\
         !(156,137,125,122,120,112,109,103,100,100,96,95,95,90,83,81,81,81,\
         77,75,75,73,71,71,67,66,66,64,62,60,60,60,59,58,57,56,56,55,54,54,\
         52,52,52,51,50,49,49,48,45,44,44,37,36,36,26,26,24,19,11,10,41),\
         !(134,125,114,111,100,90,80,75,49,21,21,30,60,17,19)
BOXPLOT  [TITLE='Box-and-whisker diagram'] America,Asia_Oc,Other
CAPTION  !t('2) The three sets of heights are combined into a single',\
         'variate and a factor is set up to specify which values in the',\
         'combined variate came from which region. A schematic plot is',\
         'drawn to compare the heights in each region, with box widths',\
         'scaled to indicate the number of volcanoes in each region.')
VARIATE  [VALUES=#America,#Asia_Oc,#Other] All
FACTOR   [LABELS=!t(America,'Asia/Oceania',Elsewhere);\
         VALUES=50(1),61(2),15(3)] Region
BOXPLOT  [TITLE='Schematic plot'; METHOD=schematic; BOXWIDTH=variable]\
         All; GROUPS=Region
CAPTION  !t('These examples are designed to produce high-resolution',\
         'diagrams. Lineprinter plots are obtained by setting option',\
         'GRAPHICS=line')
Updated on February 7, 2023
