Draws box-and-whisker diagrams or schematic plots (P.W. Lane & S.D. Langton).
Options
GRAPHICS = string token | What type of graphics to use (highresolution, lineprinter); default high |
---|---|
TITLE = text | Title for diagram; default * |
AXISTITLE = text | Title for axis representing data values; default * |
WINDOW = scalar | Window in which to draw a high-resolution plot; default 4 |
ORIENTATION = string token | Orientation of plots (horizontal, vertical, across, down); default vert |
YORIENTATION = string token | Direction of the y-axis for horizontal plots (reverse, normal); default reve |
METHOD = string token | Type of representation of data in a high-resolution plot (boxandwhisker, schematic); default boxa |
SCREEN = string token | Whether to clear screen before a high-resolution plot (clear, keep); default clea |
BOXTITLE = text | Title for axis representing different variates or groups; default * |
BOXWIDTH = string token | Whether to relate box width to size of sample in high-resolution plot (fixed, variable); default fixe |
WHISKER = number | Linestyle for whiskers (0…10); default 1 |
BAR% = scalar | Size of bar at the end of the whiskers, as a percentage of the box-width; default 0 (i.e. no bar) |
WIDTH% = scalar | Width of the boxes, expressed as a percentage of the default width; default 100 |
SEM = string token | Add bar showing a nonparametric standard error of the median (yes, no); default no |
BOXORDER = string token | Sort order for boxes when there are several DATA variates and GROUPS (groups, variates); default vari |
REFERENCELINE = scalar | Specifies the position of a reference line to be drawn parallel to the box axis; default * i.e. none |
Parameters
DATA = variates | Data to be summarized; no default |
---|---|
GROUPS = factor | Factor to divide values of a single variate into groups; default * |
BOXLABELS = texts | Labels for individual boxes; default *, i.e. identifiers of variates or labels or levels of factor |
UNITLABELS = texts | Labels for extreme points in schematic plot; default is to use unit labels |
BOXPOSITIONS = variates | Positions of the boxes on the appropriate axis; default defines positions in an equal spacing |
Description
BOXPLOT draws pictures to display the distribution of one or more sets of data. In the simplest case, with the DATA parameter set to a single variate, BOXPLOT will draw a box-and-whisker diagram, as defined by Tukey (1977). The box spans the interquartile range of the values in the variate, so that the middle 50% of the data lie within the box, with a line indicating the median. Whiskers extend beyond the ends of the box as far as the minimum and maximum values. If several variates are supplied, a box is drawn for each of them using the same scale. Alternatively, if a single variate is supplied by the DATA parameter, a factor with the same number of values as the variate may be provided by the GROUPS parameter, and a box will be drawn for each level of the factor. If you specify several DATA variates and GROUPS factors, the BOXORDER option controls whether the boxes are arranged as groups within variates (BOXORDER=variates, the default) or variates within groups (BOXORDER=groups).
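As a minimal sketch of the last case, the call below asks for the boxes to be arranged as variates within groups. The variates Before and After and the factor Treatment are hypothetical, with made-up values chosen only for illustration.

```genstat
" Hypothetical data: two measurements on each of eight units "
VARIATE [VALUES=12,15,11,14,19,22,18,24] Before
VARIATE [VALUES=14,16,13,15,25,27,21,29] After
FACTOR  [LABELS=!t(Control,Treated); VALUES=4(1),4(2)] Treatment
" Variates within groups, instead of the default groups within variates "
BOXPLOT [TITLE='Before and after'; BOXORDER=groups]\
        DATA=Before,After; GROUPS=Treatment
```

Omitting the BOXORDER setting should give the default arrangement, with the groups shown side by side within each variate.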
The GRAPHICS option indicates whether high-resolution or line-printer plots are required. The TITLE, AXISTITLE and BOXTITLE options can be set to specify the titles displayed at the top of the plot, along the axis representing the data values, and along the axis representing separate boxes when there are several variates or groups, for either graphics mode. For high-resolution plots, the WINDOW and SCREEN options control the placement of the picture in the graphical frame.
The ORIENTATION option controls the orientation of the boxes, with the following settings:
vertical | plots the boxes vertically i.e. down the screen (default), |
---|---|
horizontal | plots the boxes horizontally i.e. across the screen, |
down | synonym of vertical, and |
across | synonym of horizontal. |
It is not possible to produce line-printer plots with more than 14 boxes. If the page size is small, as in interactive mode, vertical line-printer plots may be very cramped: the PAGE option of the OUTPUT directive can be used to increase the depth of the graphs.
The YORIENTATION option controls the orientation of the y-axis when the boxes are plotted horizontally. By default this is reversed, so that the first box is at the top of the screen.
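The two orientation options can be combined as in the sketch below. The variates Yield1 and Yield2 and their values are hypothetical. Going by the description above, YORIENTATION=normal should place the first box at the bottom rather than the top.

```genstat
" Hypothetical data for two varieties "
VARIATE [VALUES=23,31,27,45,38,29,52,34] Yield1
VARIATE [VALUES=41,36,48,55,39,44,61,47] Yield2
" Horizontal boxes with the y-axis in its normal (unreversed) direction "
BOXPLOT [ORIENTATION=horizontal; YORIENTATION=normal;\
        AXISTITLE='Yield'; BOXTITLE='Variety'] Yield1,Yield2
```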
Schematic plots can be drawn (high-resolution only) by setting option METHOD=schematic. These diagrams (also defined by Tukey 1977) are modifications of box-and-whisker diagrams which display individual outlying points as well as the box. The whiskers extend only to the most extreme data values within the inner “fences”, which are at a distance of 1.5 times the interquartile range beyond the quartiles, or the maximum value if that is smaller. Individual outliers are plotted with a cross by default, with labels specified by the UNITLABELS parameter. The default for UNITLABELS is to use the unit labels of the DATA variate. The labels can be suppressed by setting parameter UNITLABELS=*. “Far” outliers, beyond the outer “fences” which are at a distance of three times the interquartile range beyond the quartiles, are plotted with a different pen.
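A minimal sketch of a schematic plot with labelled outliers follows. The variate Score and the text Name are hypothetical; the value 60 is deliberately far from the rest so that it should fall outside the fences.

```genstat
" Hypothetical data with one unusually large value "
VARIATE [VALUES=10,12,13,14,15,16,17,18,19,60] Score
TEXT    [VALUES=A,B,C,D,E,F,G,H,I,J] Name
" Outlying points are labelled from Name "
BOXPLOT [METHOD=schematic; TITLE='Schematic plot'] Score; UNITLABELS=Name
" The same plot with the labels suppressed "
BOXPLOT [METHOD=schematic; TITLE='Schematic plot, unlabelled']\
        Score; UNITLABELS=*
```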
The SEM option adds a central bar to each boxplot, giving a nonparametric estimate of the standard error of the median. This is calculated as the distance between the quartiles, multiplied by 1.5, and divided by the square root of the number of values in the DATA variate.
By default, all boxes have equal width. High-resolution diagrams can be modified to indicate the number of values being represented by each box. The option BOXWIDTH=variable will scale the box widths by the square root of the number of values represented.
The style of the whiskers can be controlled by setting the WHISKER option to a graphical linestyle in the range 0 to 10. These styles are device dependent, but 0 and 1 always give a solid line (the default) and 2 usually gives a dashed line. The BAR% option allows you to add bars at the end of the whiskers. For example, the setting 100 gives a bar as wide as the box, and 25 would give one a quarter the width. The default is 0, giving no bars. The WIDTH% option specifies the width of the boxes, as a percentage of the default width (default 100).
The REFERENCELINE option allows you to specify the position of a reference line to be drawn parallel to the box axis in a high-resolution plot. If this is not set, no line is drawn.
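The appearance options described above can be set together in one call. The sketch below uses two hypothetical variates of unequal length, so that BOXWIDTH=variable has a visible effect; the particular settings are arbitrary choices for illustration.

```genstat
" Hypothetical samples of ten and five values "
VARIATE [VALUES=5,7,8,9,11,12,14,15,18,21] Large
VARIATE [VALUES=6,9,10,13,17] Small
" Widths scaled by sample size, whiskers in linestyle 2 with end bars half "
" the box width, boxes at 80% of the default width, a bar for the standard "
" error of the median, and a reference line at 10 "
BOXPLOT [BOXWIDTH=variable; WHISKER=2; BAR%=50; WIDTH%=80; SEM=yes;\
        REFERENCELINE=10] Large,Small
```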
Six pens are used to draw the high-resolution displays, apart from the axes: pen 1 for the boxes and median line (default colour black), pen 2 for far outliers (red crosses), pen 3 for outliers (green crosses), pen 4 for the whiskers (set to match the colour of pen 1), pen 5 for the standard error of median bar, and pen 6 for the reference line. You can customize the pictures by setting some aspects of these pens with the PEN directive before calling the procedure: in particular, the colours, symbols and line-thicknesses.
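As a hedged sketch of such customization, the statements below thicken the lines drawn with pen 1 and recolour the two outlier pens before drawing a schematic plot. The data are hypothetical, and giving colours by name is an assumption about the release in use; releases that expect colour numbers would need the equivalent numeric settings.

```genstat
" Thicker lines for the boxes and median line "
PEN NUMBER=1; THICKNESS=2
" Assumed colour names for far outliers (pen 2) and outliers (pen 3) "
PEN NUMBER=2,3; COLOUR='blue','black'
" Hypothetical data with one unusually large value "
VARIATE [VALUES=10,12,13,14,15,16,17,18,19,60] Score
BOXPLOT [METHOD=schematic] Score
```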
The BOXLABELS parameter allows you to specify labels that will identify each box.
The UNITLABELS parameter allows you to specify labels that will be used to identify outlying observations in schematic plots (but this is not available if you gave a list of variates in the DATA parameter).
The BOXPOSITIONS parameter defines the positions of the boxes on the appropriate axis. If this is unset, the positions are defined with an equal spacing.
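The sketch below supplies both labels and positions for the boxes of a grouped variate. All identifiers and values are hypothetical, and it assumes that BOXLABELS and BOXPOSITIONS each take one value per box, in the order of the factor levels.

```genstat
" Hypothetical response measured at three unequally spaced doses "
VARIATE [VALUES=8,9,11,12,14,15,17,19,22,25,27,30] Growth
FACTOR  [LEVELS=3; VALUES=4(1),4(2),4(3)] Dose
" Assumed: one label and one position for each box "
TEXT    [VALUES='0 mg','10 mg','40 mg'] DoseName
VARIATE [VALUES=0,10,40] DosePos
BOXPLOT [BOXTITLE='Dose'; AXISTITLE='Growth'] Growth; GROUPS=Dose;\
        BOXLABELS=DoseName; BOXPOSITIONS=DosePos
```

Placing the boxes at the dose values rather than at equal spacing lets the box axis reflect the actual gaps between doses.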
Options: GRAPHICS, TITLE, AXISTITLE, WINDOW, ORIENTATION, YORIENTATION, METHOD, SCREEN, BOXTITLE, BOXWIDTH, WHISKER, BAR%, WIDTH%, SEM, BOXORDER, REFERENCELINE.
Parameters: DATA, GROUPS, BOXLABELS, UNITLABELS, BOXPOSITIONS.
Method
The medians and extremes are calculated by functions MEDIAN, MINIMUM and MAXIMUM, whereas the quartiles are calculated using the PERCENT option of TABULATE.
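The median and extremes behind a simple box-and-whisker diagram can be inspected directly with the three functions named above. This is a minimal sketch on hypothetical data; the scalar names are illustrative, and the quartile step with TABULATE is left out because its exact settings are not given here.

```genstat
" Hypothetical data "
VARIATE [VALUES=10,12,13,14,15,16,17,18,19,60] Score
SCALAR Middle,Lowest,Highest
" Median line of the box, and the ends of the whiskers in a simple plot "
CALCULATE Middle = MEDIAN(Score)
CALCULATE Lowest = MINIMUM(Score)
CALCULATE Highest = MAXIMUM(Score)
PRINT Middle,Lowest,Highest
```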
Action with RESTRICT
Restrictions on the supplied variates are taken into account. The grouping factor and texts holding boxlabels or unitlabels, if specified, should not be restricted.
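A minimal sketch of plotting a restricted variate follows, using hypothetical data. The restriction is placed on the DATA variate only, and is removed again afterwards.

```genstat
" Hypothetical data with one unusually large value "
VARIATE [VALUES=10,12,13,14,15,16,17,18,19,60] Score
" Restrict the variate to values below 50, so the plot leaves out 60 "
RESTRICT Score; CONDITION=Score < 50
BOXPLOT [TITLE='Restricted data'] Score
" Remove the restriction "
RESTRICT Score
```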
Reference
Tukey, J.W. (1977). Exploratory Data Analysis. Addison-Wesley.
See also
Procedures: DOTHISTOGRAM, RUGPLOT, STEM, DMOSAIC, DXDENSITY.
Commands for: Graphics.
Example
CAPTION 'BOXPLOT example',\
        !t('1) The three variates America, Asia_Oc, Other contain the',\
        'heights of volcanoes in three regions of the world.',\
        'Simple box-and-whisker plots are drawn to compare the',\
        'distribution of heights in each group.'); STYLE=meta,plain
VARIATE America,Asia_Oc,Other; VALUES=!(199,197,193,185,177,172,157,156,140,\
        140,130,126,124,124,113,102,100,102,94,93,89,86,83,83,83,82,77,73,\
        70,62,58,51,51,42,40,34,36,67,67,66,60,57,57,53,49,43,43,40,35,35),\
        !(156,137,125,122,120,112,109,103,100,100,96,95,95,90,83,81,81,81,\
        77,75,75,73,71,71,67,66,66,64,62,60,60,60,59,58,57,56,56,55,54,54,\
        52,52,52,51,50,49,49,48,45,44,44,37,36,36,26,26,24,19,11,10,41),\
        !(134,125,114,111,100,90,80,75,49,21,21,30,60,17,19)
BOXPLOT [TITLE='Box-and-whisker diagram'] America,Asia_Oc,Other
CAPTION !t('2) The three sets of heights are combined into a single',\
        'variate and a factor is set up to specify which values in the',\
        'combined variate came from which region. A schematic plot is',\
        'drawn to compare the heights in each region, with boxwidths',\
        'scaled to indicate the number of volcanoes in each region.')
VARIATE [VALUES=#America,#Asia_Oc,#Other] All
FACTOR [LABELS=!t(America,'Asia/Oceania',Elsewhere);\
        VALUES=50(1),61(2),15(3)] Region
BOXPLOT [TITLE='Schematic plot'; METHOD=schematic; BOXWIDTH=variable]\
        All; GROUP=Region
CAPTION !t('These examples are designed to produce high-resolution',\
        'diagrams. Lineprinter plots are obtained by setting option',\
        'GRAPHICS=line')