
DXYDENSITY procedure

Produces density plots for large data sets (D. B. Baird).

Options

PLOT = string tokens How to plot the density (pointplot, shadeplot, contourplot, histogram, surface); default poin
NGROUPS = scalar Number of sections into which to divide each axis (4-400); default 50
METHOD = string token Method to use to smooth the density (thinplate, radialspline, tensorspline, kernel); default * i.e. none
DF = scalar Degrees of freedom for smoothing methods (2-50); default 12
BANDWIDTH = scalar Bandwidth for kernel smoothing (0-1); default 0.2
MEANFIT = string tokens What smooth regression fits to the means to plot (yx, xy); default * i.e. none
NCONTOURS = scalar Number of contours in the contour plot; default 9
SYMBOL = string token Symbol to use in a point plot (circle, square); default circ
COLOURS = text, variate or scalar Colour to use to draw the symbols, shades, contours or surface; default !t(red, blue, black)
XTRANSFORM = string token Transformed scale for the x-axis (identity, log, log10, logit, probit, cloglog, square, exp, exp10, ilogit, iprobit, icloglog, root); default iden
YTRANSFORM = string token Transformed scale for the y-axis (identity, log, log10, logit, probit, cloglog, square, exp, exp10, ilogit, iprobit, icloglog, root); default iden
ZTRANSFORM = string token Transformed scale for the z-axis (identity, percentile, root); default iden
WINDOW = scalar Window number for the graphs; default 3
SCREEN = string token Whether to clear the screen before plotting or to continue plotting on the old screen (clear, keep, resize); default clea

Parameters

Y = variate or factor Y-coordinates of the data
X = variate or factor X-coordinates of the data
TITLE = text Title for graph; default uses the names of the data and type of plot

Description

Procedure DXYDENSITY produces a density plot of two variables, using high-resolution graphics. A density plot provides a better visual representation of the 2-dimensional spread of points than a scatter plot if there are a large number of points or many points overlap each other, and is quicker to plot. A density plot displays the number of points in small regions of the x-y plane, using various methods to plot the density.

The x and y axes are divided into equally spaced sections, to give a grid of rectangular cells covering the x-y plane. The density is calculated as the number of points that falls into each cell. The number of sections is specified by the NGROUPS option, as a scalar if the same number is required in each direction, or as a variate with two values to specify different numbers for the y-axis (first value) and the x-axis (second value). Having a large number of cells preserves more detail, but increases the time required to create and plot the graph.
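As a rough sketch of the two forms of NGROUPS described above, the calls below first use a single scalar and then a two-value variate. The variates x and y are artificial data invented purely for illustration; their names and sizes are assumptions, not part of the procedure.

```genstat
" Artificial data for illustration only "
CALC [SEED=531] x = GRNORMAL(20000; 0; 1)
CALC y = 0.5*x + GRNORMAL(20000; 0; 1)
" 40 sections on each axis "
DXYDENSITY [PLOT=shadeplot; NGROUPS=40] Y=y; X=x
" 30 sections on the y-axis (first value), 60 on the x-axis (second value) "
DXYDENSITY [PLOT=shadeplot; NGROUPS=!(30,60)] Y=y; X=x
```

The second call gives a finer grid along the x-axis only, which may be useful when the data are much more spread out in one direction than the other.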

The x- or y-axes can be transformed before forming the sections and calculating the density, by using the XTRANSFORM or YTRANSFORM options. The settings are the same as those of the TRANSFORM option of the XAXIS and YAXIS directives.
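For instance, a positively skewed x-variable could be sectioned on a log scale. The following is a minimal sketch, assuming artificial data in which x is log-normally distributed (and so strictly positive); the variates x and y are hypothetical.

```genstat
" Artificial, strictly positive x-values for illustration only "
CALC [SEED=531] x = EXP(GRNORMAL(20000; 0; 1))
CALC y = LOG(x) + GRNORMAL(20000; 0; 1)
" Sections are formed on the log10 scale of x "
DXYDENSITY [PLOT=pointplot; XTRANSFORM=log10] Y=y; X=x
```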

The PLOT option controls how the density is plotted, with settings:

    pointplot point plot, using the symbol size to indicate the number of points in each cell;
    shadeplot shade plot, using intensity of colour to indicate the number of points in each cell;
    contourplot contour plot, with contours showing the density;
    surface surface plot, with density as height;
    histogram 3-dimensional histogram of the density.

By default PLOT=pointplot.

The density can be smoothed by using the METHOD option, with settings:

    thinplate a 2-dimensional thin plate spline is fitted to the counts using the THINPLATE procedure;
    radialspline a 2-dimensional radial spline is fitted to the counts using the RADIALSPLINE procedure;
    tensorspline a 2-dimensional tensor spline is fitted to the counts using the TENSORSPLINE procedure;
    kernel a 2-dimensional kernel smoother is fitted to the counts.

By default no smoothing is done.

The DF option specifies the number of degrees of freedom for the splines (default 12); smaller values make the surface smoother, and larger values allow it to be rougher. The BANDWIDTH option specifies the bandwidth for kernel smoothing (default 0.2); larger values make the surface smoother, and smaller values allow it to be rougher.
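To see the effect of DF, the same counts can be smoothed with a small and a larger number of degrees of freedom. This is only a sketch: the variates x and y are artificial, and the choice of NGROUPS=25 is an assumption made here to keep the spline fit small.

```genstat
" Artificial data for illustration only "
CALC [SEED=531] x = GRNORMAL(20000; 0; 1)
CALC y = 0.5*x + GRNORMAL(20000; 0; 1)
" Fewer degrees of freedom: smoother surface "
DXYDENSITY [PLOT=contourplot; NGROUPS=25; METHOD=thinplate; DF=6] Y=y; X=x
" More degrees of freedom: rougher surface that follows the counts more closely "
DXYDENSITY [PLOT=contourplot; NGROUPS=25; METHOD=thinplate; DF=20] Y=y; X=x
```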

The shape of each point in a point plot is specified by the SYMBOL option, as either a circle (default) or square. The COLOURS option specifies the colours that are used, in a scalar or a text or variate with up to three values. For a point plot, the first value specifies the colour for the points, and the second and third values define the colours for any lines fitted by the MEANFIT option. For a histogram, the first value of COLOURS defines the colour of the bars. For shade, contour and surface plots, if COLOURS has two or more values, the first is used for high densities, the second is used for low densities, and intermediate densities are plotted in the corresponding intermediate colour; if COLOURS has only one value, the low densities are plotted in white. If COLOURS has three values, the third is used for the contours of contour and surface plots.
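The one-value and three-value forms of COLOURS might be used as follows. This is a sketch with artificial variates x and y; the colour names are taken from the default setting of the option.

```genstat
" Artificial data for illustration only "
CALC [SEED=531] x = GRNORMAL(20000; 0; 1)
CALC y = 0.5*x + GRNORMAL(20000; 0; 1)
" One colour: shades run from white (low density) to blue (high density) "
DXYDENSITY [PLOT=shadeplot; COLOURS='blue'] Y=y; X=x
" Three colours: red for high, blue for low, black for the contour lines "
DXYDENSITY [PLOT=contourplot; NCONTOURS=12; COLOURS=!t(red,blue,black)]\
           Y=y; X=x
```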

The scaling of densities is controlled by the ZTRANSFORM option with settings:

    identity no scaling (default),
    root takes the square root of the densities, giving more emphasis to low counts,
    percentile takes a rank transform and plots these, so that percentiles are equally spaced.

The MEANFIT option allows you to add a smoothing spline regression of y on x or of x on y to a point plot. The available settings are

    yx for a regression of y on x, and
    xy for a regression of x on y.

The DF option again specifies the number of degrees of freedom for the spline (default 12). By default neither regression is fitted.

The Y and X parameters specify the y- and x-coordinates of the data values, in either variates or factors. Their identifiers are used for the titles of the axes at the lower and left-hand edges of the graphics frame (i.e. page). You can also use the TITLE parameter to supply an overall title for the plot.

The WINDOW option specifies the number of the window to use for the plot, and the SCREEN option controls whether the screen is cleared first, as usual (see e.g. DGRAPH).

Options: PLOT, NGROUPS, XTRANSFORM, YTRANSFORM, ZTRANSFORM, METHOD, MEANFIT, DF, BANDWIDTH, NCONTOURS, COLOURS, SYMBOL, WINDOW, SCREEN.

Parameters: Y, X, TITLE.

Action with RESTRICT

If any of the variates or factors are restricted, only the units not excluded by the restriction will be plotted.

See also

Directives: DCONTOUR, DGRAPH, DSHADE, D3GRAPH.

Commands for: Graphics.

Example

CAPTION      'DXYDENSITY example','Density plots of microarray data';\
             STYLE=meta,plain
ENQUIRE      CHANNEL=-1; EXIST=check; NAME=\
             '%GENDIR%/Data/Microarrays/Data13-6-9.gwb'
IF check
  SPLOAD     [PRINT=*] '%GENDIR%/Data/Microarrays/Data13-6-9.gwb'; ISAVE=data
  RESTRICT   data[]; Intensity > 1
ELSE
  CAPTION 'Microarray data not installed, using artificial data'
  SCALAR N; VALUE=60000
  CALC [SEED=531] Intensity = (X = 10*!(1...N)/N) + GRNORMAL(N;2;0.1)
  CALC logRatio  = BOUND(LOG(1+X-(X/8)**2) + GRNORMAL(N;-1.6;0.2/X**1.5);-3;3)
ENDIF
DXYDENSITY [PLOT=POINT; MEANFIT=xy,yx] Y=logRatio; X=Intensity
DXYDENSITY [PLOT=SHADE; NGROUPS=60; ZTRANSFORM=percentile;\
           COLOUR=!t(red,blue)] Y=logRatio; X=Intensity
DXYDENSITY [PLOT=histogram; ZTRANSFORM=root; NGROUPS=30; COLOUR='lightblue']\
           Y=logRatio; X=Intensity
DXYDENSITY [PLOT=surface; METHOD=kernel; BANDWIDTH=0.1]\
           Y=logRatio; X=Intensity
Updated on February 17, 2022