
DMOSAIC procedure

Produces a mosaic plot to display a table of counts (D.B. Baird).

Options

LINECOLOUR = text or scalar Colour to use for the outlines of the boxes; default 'black'
EMPTYCOLOUR = text or scalar Colour to use for the outlines of the empty boxes; default 'purple'
THICKNESS = scalar Line thickness for the outlines of the boxes; default 1
LABELSIZE = scalar Label size for the axis labels; default 1
GAP = scalar Relative size of the gaps between boxes; default 1
MINSIZE = scalar Minimum row/column dimension for a box; default 0.002

Parameters

DATA = tables or pointers Data to be plotted
ROWFACTORS = pointers Factors to be displayed down the window; if COLFACTORS is not specified, the default is to display the factors in the second half of the classification set of the table, otherwise it is the classifying factors not included in COLFACTORS
COLFACTORS = pointers Factors to be displayed across the window; if ROWFACTORS is not specified, the default is to display the factors in the first half of the classification set of the table, otherwise it is the classifying factors not included in ROWFACTORS
TITLE = texts Title for the plot; default * i.e. none
COLOURS = variate or text The colours to shade the boxes; by default the colours are taken from the pens 2 onwards, with a final colour of white
LABELWIDTH = scalars or variates Maximum length of the labels to display for each factor; default * uses the full text of the factor labels
WINDOW = scalar Window number for the graph; default 3
SCREEN = string token Whether to clear the screen before plotting or to continue plotting on the existing screen (clear, keep); default clear

Description

DMOSAIC produces a mosaic plot (Friendly 1994) of a table of counts. The DATA parameter supplies the data to plot, as either a table or a pointer to a set of factors that are then tabulated to create a table of counts. The display takes the form of a set of boxes arranged in rows and columns. The size of each box reflects the proportion of the observations that fall into the corresponding cell of the table. The boxes are coloured by the levels of the final factor, to represent the changing proportions of this factor within the other factors.
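
The two forms of DATA can be sketched as follows. This is a minimal illustration, assuming the Titanic data file used in the example at the end of this page, which is taken to supply the unit-level factors Class, Age, Sex and Survived; the two DMOSAIC calls should then display the same counts.

```genstat
" Load the unit-level factors Class, Age, Sex and Survived (assumed contents) "
SPLOAD   [PRINT=*] '%DATA%/Titanic.gsh'

" Form 1: tabulate first, then plot the resulting table of counts "
TABULATE [CLASSIFICATION=Class,Age,Sex,Survived; COUNTS=tcounts]
DMOSAIC  tcounts

" Form 2: supply a pointer to the factors and let DMOSAIC tabulate them "
DMOSAIC  !p(Class,Age,Sex,Survived)
```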

The ROWFACTORS and COLFACTORS parameters can supply pointers containing the factors to be displayed down and across the window, respectively. If both are defined, then together they must contain all the factors in the table. If just one is defined, the other is formed from the remaining factors in the table, in the order given by the CLASSIFICATION of the table or by the DATA pointer. If neither is specified, the first half of the factors (rounded down in the case of an odd number) are assigned to columns and the remainder to rows, in the order defined by the CLASSIFICATION of the table or by the DATA pointer. Changing which factors are displayed across and down the screen, or their ordering, can give a very different view of the data. The choice of the last factor displayed across the screen is especially important, because this factor is used to colour the boxes, as explained below.
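
As a hedged sketch of these rules, the calls below plot one four-way table with three different layouts. The table name tcounts and the factor names are taken from the example at the end of this page; the particular splits of the factors are illustrative choices only.

```genstat
SPLOAD   [PRINT=*] '%DATA%/Titanic.gsh'
TABULATE [CLASSIFICATION=Class,Age,Sex,Survived; COUNTS=tcounts]

" Neither parameter set: Class and Age go across, Sex and Survived go down "
DMOSAIC  tcounts

" Both set: together they must contain all four classifying factors "
DMOSAIC  tcounts; COLFACTORS=!p(Class,Survived); ROWFACTORS=!p(Age,Sex)

" Only COLFACTORS set: the rows are formed from the remaining factors, Class and Age "
DMOSAIC  tcounts; COLFACTORS=!p(Sex,Survived)
```

Each call clears the screen by default, so the three plots appear one after another rather than side by side.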

The width of each box is determined from the relative proportion of the observations that fall into the corresponding column-factor combinations. Within each column, the heights of the boxes are proportional to the number of observations in the corresponding cells of the table.

There are gaps between the rows and columns. These are largest between the levels of the first row or column factor, smaller between the levels of the second row or column factor, and continue to shrink until the smallest gaps are between the levels of the final row or column factor. The sizes of the gaps can all be made larger or smaller by the GAP option. Setting GAP=0 gives no gaps between boxes whereas, for example, GAP=2 doubles the sizes of the gaps.
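
The effect of GAP can be compared with a sketch such as the following, again assuming the Titanic table from the example at the end of this page. The titles are added here only to tell the plots apart.

```genstat
SPLOAD   [PRINT=*] '%DATA%/Titanic.gsh'
TABULATE [CLASSIFICATION=Class,Age,Sex,Survived; COUNTS=tcounts]

" No gaps between the boxes "
DMOSAIC  [GAP=0] tcounts; TITLE='Titanic counts with GAP=0'

" Gaps twice their default size "
DMOSAIC  [GAP=2] tcounts; TITLE='Titanic counts with GAP=2'
```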

The outline of each box is drawn with a pen whose colour and thickness are specified by the LINECOLOUR and THICKNESS options, with defaults of 'black' and 1 respectively. The MINSIZE option puts a lower limit on the dimensions of the boxes in both the row and column directions. The default of 0.002 prevents boxes with few or no counts from being lost to the eye. Empty boxes (i.e. those with no observations) have their outlines drawn in the colour specified by the EMPTYCOLOUR option (default 'purple') and have no fill. The other boxes are coloured according to the levels of the final column factor. The colours for each of its levels can be specified by the COLOURS parameter, as either a text with colour names (e.g. !t('red', 'blue')) or a variate containing RGB colours. By default, the colours for all but the last level are taken from the colours assigned to pens 2 onwards, with white assigned to the last level.
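
A possible combination of these settings is sketched below. It assumes the Titanic table from the example at the end of this page and that Survived has two levels, so that COLOURS needs two colour names; the option values themselves are arbitrary illustrations.

```genstat
SPLOAD   [PRINT=*] '%DATA%/Titanic.gsh'
TABULATE [CLASSIFICATION=Class,Age,Sex,Survived; COUNTS=tcounts]

" Thicker black outlines, empty boxes outlined in red, larger minimum box size. "
" Survived is made the final column factor, so COLOURS gives one colour per level of Survived. "
DMOSAIC  [LINECOLOUR='black'; THICKNESS=2; EMPTYCOLOUR='red'; MINSIZE=0.005] \
         tcounts; COLFACTORS=!p(Class,Survived); COLOURS=!t('red','blue')
```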

The labels of the last column factor are displayed in the lower x-margin, and the rest in the upper x-margin. The labels of the last row factor are displayed in the lower y-margin, and the rest in the upper y-margin. If there are many factors or long labels, it can be difficult to fit all the labels on the axes. If so, you can use the LABELSIZE option to change the size of the labels, and the LABELWIDTH parameter to truncate the labels at a maximum width. LABELWIDTH can be set to a variate defining the maximum width for each factor (with factors in the order defined by the COLFACTORS and then the ROWFACTORS parameters), or a scalar to apply the same maximum width to all the factors.
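
The label controls might be used as in the sketch below, which assumes the detergent data from the example at the end of this page. The widths are illustrative and are assumed here to be measured in characters.

```genstat
SPLOAD   [PRINT=*] '%DATA%/Detergent.gsh'
TABULATE [CLASSIFICATION=AUser,Temperature,WaterSoftness,Preference; COUNTS=prefer]

" Smaller labels, with the same maximum width for every factor "
DMOSAIC  [LABELSIZE=0.8] prefer; LABELWIDTH=4

" One maximum width per factor, ordered as COLFACTORS and then ROWFACTORS "
DMOSAIC  [LABELSIZE=0.8] prefer; COLFACTORS=!p(AUser,Temperature); \
         ROWFACTORS=!p(WaterSoftness,Preference); LABELWIDTH=!(3,4,4,1)
```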

The WINDOW parameter defines the window to be used for the plot (default 3), and the SCREEN parameter controls whether or not the screen is cleared before plotting (default clear). To display multiple plots on the same screen, you should set SCREEN=keep for the second and subsequent plots. You can use the FRAME directive or the FFRAME procedure to specify the numbers and locations of the windows if the Genstat defaults are unsuitable. The TITLE parameter can be used to specify a title for the plot; by default there is none.
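
Two plots can be placed on one screen along the following lines. This is a sketch only: the window bounds passed to FRAME are illustrative values that may need adjusting to leave room for the labels, and the Titanic table is assumed from the example at the end of this page.

```genstat
SPLOAD   [PRINT=*] '%DATA%/Titanic.gsh'
TABULATE [CLASSIFICATION=Class,Age,Sex,Survived; COUNTS=tcounts]

" Define two side-by-side windows (bounds are illustrative) "
FRAME    WINDOW=1,2; YLOWER=0,0; YUPPER=1,1; XLOWER=0,0.5; XUPPER=0.5,1

" First plot clears the screen; the second is added to the existing screen "
DMOSAIC  tcounts; COLFACTORS=!p(Class,Survived); WINDOW=1; SCREEN=clear; \
         TITLE='Class and survival across'
DMOSAIC  tcounts; COLFACTORS=!p(Sex,Survived); WINDOW=2; SCREEN=keep; \
         TITLE='Sex and survival across'
```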

Options: LINECOLOUR, EMPTYCOLOUR, THICKNESS, LABELSIZE, GAP, MINSIZE.
Parameters: DATA, ROWFACTORS, COLFACTORS, TITLE, COLOURS, LABELWIDTH, WINDOW, SCREEN.

Method

The proportions of observations falling into the row classes are calculated and these form the x dimensions of the boxes. Within each row combination, the proportions of observations falling into the column classes are calculated and these form the box heights. The gaps are added to the box positions and minimum dimensions enforced. Any empty boxes are drawn with the empty pen. The labels are applied to the four sides of the plot, and label positions are adjusted to avoid overlap by a subsidiary procedure _DLABELSPACE.

Action with RESTRICT

DMOSAIC will obey restrictions on the factors in a DATA pointer.
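
A restricted plot might be obtained as sketched below. The sketch assumes the Titanic factors from the example that follows and that 'Female' is one of the labels of Sex; the restriction is applied to all the factors and removed afterwards.

```genstat
SPLOAD   [PRINT=*] '%DATA%/Titanic.gsh'

" Restrict the factors to the units for which Sex is Female (label assumed) "
RESTRICT Class,Age,Sex,Survived; CONDITION=Sex.IN.'Female'

" Only the restricted units are tabulated and plotted "
DMOSAIC  !p(Class,Age,Survived); TITLE='Female passengers and crew only'

" Remove the restriction again "
RESTRICT Class,Age,Sex,Survived
```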

Reference

Friendly, M. (1994). Mosaic displays for multi-way contingency tables. Journal of the American Statistical Association, 89, 190–200.

See also

Directives: BARCHART, DHISTOGRAM.
Procedures: BOXPLOT, DTABLE.
Commands for: Graphics.

Example

CAPTION  'DMOSAIC Example',\
         'Survival of souls on the Titanic by class, age and sex';\
         STYLE=major,minor
SPLOAD   [PRINT=*] '%DATA%/Titanic.gsh'; ISAVE=pData

"Class    - passenger class: Crew, First, Second, Third
 Age      - Adult or Child
 Sex      - Male or Female
 Survival - No or Yes"

TABULATE [CLASS=Class,Age,Sex,Survived; COUNTS=tcounts; PRINT=counts]
DMOSAIC  tcounts; TITLE='Survival of Titanic sinking by class, age and sex'

CAPTION  'Results from a detergent preference test'; STYLE=minor
SPLOAD   [PRINT=*] '%DATA%/Detergent.gsh'

"AUser         - No or Yes - whether they had used product A previously
 Temperature   - Low or High - temperature used during the test
 WaterSoftness - Soft, Medium and Hard
 Preference    - A or B indicates preference for product A or B"

TABULATE [CLASS=AUser,Temperature,WaterSoftness,Preference; \
         COUNTS=prefer; PRINT=counts]
DMOSAIC  prefer; TITLE='Consumer preference for detergent A vs B'

Updated on March 24, 2023