Produces density plots for large data sets (D. B. Baird).

### Options

| Option | Description |
|---|---|
| `PLOT` = string tokens | How to plot the density (`pointplot`, `shadeplot`, `contourplot`, `histogram`, `surface`); default `poin` |
| `NGROUPS` = scalar or variate | Number of sections into which to divide each axis (4-400); default 50 |
| `METHOD` = string token | Method to use to smooth the density (`thinplate`, `radialspline`, `tensorspline`, `kernel`); default `*` i.e. none |
| `DF` = scalar | Degrees of freedom for the splines used by `METHOD` and `MEANFIT` (2-50); default 12 |
| `BANDWIDTH` = scalar | Bandwidth for kernel smoothing (0-1); default 0.2 |
| `MEANFIT` = string tokens | Which smooth regression fits to the means to plot (`yx`, `xy`); default `*` i.e. none |
| `NCONTOURS` = scalar | Number of contours in the contour plot; default 9 |
| `SYMBOL` = string token | Symbol to use in a point plot (`circle`, `square`); default `circ` |
| `COLOURS` = text, variate or scalar | Colours to use to draw the symbols, shades, contours or surface; default `!t(red, blue, black)` |
| `XTRANSFORM` = string token | Transformed scale for the x-axis (`identity`, `log`, `log10`, `logit`, `probit`, `cloglog`, `square`, `exp`, `exp10`, `ilogit`, `iprobit`, `icloglog`, `root`); default `iden` |
| `YTRANSFORM` = string token | Transformed scale for the y-axis (`identity`, `log`, `log10`, `logit`, `probit`, `cloglog`, `square`, `exp`, `exp10`, `ilogit`, `iprobit`, `icloglog`, `root`); default `iden` |
| `ZTRANSFORM` = string token | Transformed scale for the z-axis (`identity`, `percentile`, `root`); default `iden` |
| `WINDOW` = scalar | Window number for the graphs; default 3 |
| `SCREEN` = string token | Whether to clear the screen before plotting or to continue plotting on the old screen (`clear`, `keep`, `resize`); default `clea` |

### Parameters

| Parameter | Description |
|---|---|
| `Y` = variate or factor | Y-coordinates of the data |
| `X` = variate or factor | X-coordinates of the data |
| `TITLE` = text | Title for the graph; default uses the names of the data and the type of plot |

### Description

Procedure `DXYDENSITY` produces a density plot of two variables, using high-resolution graphics. A scatter plot becomes hard to read when there are many points or when many points overlap. In that case a density plot represents the two-dimensional spread of the points better, and it is also quicker to draw. A density plot displays the number of points in small regions of the x-y plane, and the procedure offers several ways of plotting that density.

The x- and y-axes are divided into equally spaced sections, giving a grid of rectangular cells that covers the x-y plane. The density is the number of points that fall into each cell. The number of sections is specified by the `NGROUPS` option. Use a scalar if the same number is required in each direction. Use a variate with two values to specify different numbers for the y-axis (first value) and the x-axis (second value). A large number of cells preserves more detail, but increases the time needed to create and plot the graph.
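The Python sketch below illustrates the grid counting described above. It divides each axis into equally spaced sections and counts the points in each cell. It is not the procedure's own code. The function name `density_grid` is hypothetical, and the sketch assumes that the sections span exactly the observed range of each variable.

```python
def density_grid(x, y, ngroups=50):
    """Count points in each cell of an equally spaced grid (illustrative sketch).

    `ngroups` is either one int (same number of sections on both axes) or a
    (ny, nx) pair, following the y-then-x order used by NGROUPS.
    Assumption: the grid spans the observed minimum and maximum of each axis.
    """
    if isinstance(ngroups, int):
        ny = nx = ngroups
    else:
        ny, nx = ngroups

    def section(v, lo, hi, n):
        # Map a value to a section index 0..n-1; the maximum goes in the last section.
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * n), n - 1)

    xlo, xhi = min(x), max(x)
    ylo, yhi = min(y), max(y)
    counts = [[0] * nx for _ in range(ny)]  # counts[y section][x section]
    for xi, yi in zip(x, y):
        counts[section(yi, ylo, yhi, ny)][section(xi, xlo, xhi, nx)] += 1
    return counts


x = [0.0, 0.1, 0.9, 1.0]
y = [0.0, 0.2, 0.8, 1.0]
print(density_grid(x, y, 2))       # two sections on each axis
print(density_grid(x, y, (1, 2)))  # one y section, two x sections
```

The sketch shows that the total count is fixed. Raising `ngroups` only spreads the same points over more, smaller cells, which is why finer grids keep more detail at the cost of more work.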

The x- or y-axis can be transformed before the sections are formed and the density is calculated, by using the `XTRANSFORM` or `YTRANSFORM` option. The settings are the same as those of the `TRANSFORM` option of the `XAXIS` and `YAXIS` directives.
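The sketch below shows what transforming an axis before sectioning means. It applies the conventional mathematical definitions of the transform settings to the raw values. These definitions are an assumption, not taken from the `XAXIS`/`YAXIS` documentation. In particular, the logit, probit and cloglog forms are written here for proportions in (0, 1), and GenStat may work on a different scale, such as percentages. The names `TRANSFORMS` and `transform_axis` are hypothetical.

```python
import math
from statistics import NormalDist

# Assumed conventional definitions; GenStat's own scaling may differ (e.g. percentages).
TRANSFORMS = {
    "identity": lambda v: v,
    "log": math.log,
    "log10": math.log10,
    "logit": lambda p: math.log(p / (1 - p)),
    "probit": lambda p: NormalDist().inv_cdf(p),
    "cloglog": lambda p: math.log(-math.log(1 - p)),
    "square": lambda v: v * v,
    "exp": math.exp,
    "exp10": lambda v: 10 ** v,
    "ilogit": lambda v: 1 / (1 + math.exp(-v)),
    "iprobit": lambda v: NormalDist().cdf(v),
    "icloglog": lambda v: 1 - math.exp(-math.exp(v)),
    "root": math.sqrt,
}


def transform_axis(values, setting="identity"):
    """Apply the named transform to every value before the axis is divided into sections."""
    f = TRANSFORMS[setting]
    return [f(v) for v in values]


print(transform_axis([1, 10, 100], "log10"))
```

After transformation, the equally spaced sections are laid out on the new scale. For example, with `log10`, each decade of the original data receives the same share of the grid.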

The `PLOT` option controls how the density is plotted, with settings:

| Setting | Description |
|---|---|
| `pointplot` | point plot, using the symbol size to indicate the number of points in each cell; |
| `shadeplot` | shade plot, using the intensity of colour to indicate the number of points in each cell; |
| `contourplot` | contour plot, with contours showing the density; |
| `surface` | surface plot, with density as height; |
| `histogram` | 3-dimensional histogram of the density. |

By default `PLOT=pointplot`.

The density can be smoothed by using the `METHOD` option, with settings:

| Setting | Description |
|---|---|
| `thinplate` | a 2-dimensional thin plate spline is fitted to the counts using the `THINPLATE` procedure; |
| `radialspline` | a 2-dimensional radial spline is fitted to the counts using the `RADIALSPLINE` procedure; |
| `tensorspline` | a 2-dimensional tensor spline is fitted to the counts using the `TENSORSPLINE` procedure; |
| `kernel` | a 2-dimensional kernel smoother is fitted to the counts. |

By default no smoothing is done.

The `DF` option specifies the number of degrees of freedom for the splines (default 12). Smaller values make the surface smoother, and larger values allow it to be rougher. The `BANDWIDTH` option specifies the bandwidth for kernel smoothing. Larger values make the surface smoother, and smaller values allow it to be rougher.
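The sketch below illustrates how a bandwidth controls the smoothness of a kernel smoother applied to the grid of counts. It is a plain Gaussian-kernel weighted average. The assumption that `bandwidth` is the kernel's standard deviation as a fraction (0-1) of each axis's number of sections is made only for illustration. The sketch does not claim that this is how `DXYDENSITY` interprets `BANDWIDTH`. The function name `kernel_smooth` is hypothetical.

```python
import math


def kernel_smooth(counts, bandwidth=0.2):
    """Smooth a grid of counts with a Gaussian kernel (illustrative sketch).

    Assumption: `bandwidth` is the kernel standard deviation expressed as a
    fraction of the number of sections along each axis.
    """
    ny, nx = len(counts), len(counts[0])
    sy = max(bandwidth * ny, 1e-9)
    sx = max(bandwidth * nx, 1e-9)
    smoothed = [[0.0] * nx for _ in range(ny)]
    for i in range(ny):
        for j in range(nx):
            total = weight_sum = 0.0
            for r in range(ny):
                for c in range(nx):
                    # Gaussian weight that decays with distance from cell (i, j).
                    w = math.exp(-0.5 * (((i - r) / sy) ** 2 + ((j - c) / sx) ** 2))
                    total += w * counts[r][c]
                    weight_sum += w
            smoothed[i][j] = total / weight_sum  # weighted mean of nearby counts
    return smoothed


spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(kernel_smooth(spike, 0.1)[1][1])  # narrow kernel: the spike is mostly kept
print(kernel_smooth(spike, 0.5)[1][1])  # wide kernel: the spike is spread out
```

This mirrors the direction stated above. A larger bandwidth averages over more neighbouring cells, so isolated peaks are flattened and the surface becomes smoother.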

The shape of each point in a point plot is specified by the `SYMBOL` option, as either a circle (default) or a square. The `COLOURS` option specifies the colours that are used, as a scalar, or as a text or variate with up to three values. For a point plot, the first value specifies the colour of the points, and the second and third values define the colours of any lines fitted by the `MEANFIT` option. For a histogram, the first value of `COLOURS` defines the colour of the bars. For shade, contour and surface plots, if `COLOURS` has two or more values, the first is used for high densities, the second for low densities, and intermediate densities are plotted in the corresponding intermediate colours. If `COLOURS` has only one value, low densities are plotted in white. If `COLOURS` has three values, the third is used for the contours of contour and surface plots.
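The sketch below shows how the colour of a shade-plot cell can be blended from the high-density and low-density colours described above, with white standing in when only one colour is given. Colours are RGB tuples. Two assumptions are made: the blending is linear in RGB space, and it is scaled between the smallest and largest densities. The text does not state how `DXYDENSITY` forms its intermediate colours. The function name `density_colour` is hypothetical.

```python
def density_colour(density, dmin, dmax, colours):
    """Blend a cell colour from the COLOURS list (illustrative sketch).

    colours[0] is used for high densities and colours[1] for low densities;
    with only one colour, white is used for low densities. Any third colour
    (used for contours) is ignored here.
    Assumption: linear interpolation in RGB space between dmin and dmax.
    """
    high = colours[0]
    low = colours[1] if len(colours) >= 2 else (255, 255, 255)
    t = 0.0 if dmax == dmin else (density - dmin) / (dmax - dmin)  # 0 = low, 1 = high
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))


red, blue = (255, 0, 0), (0, 0, 255)
print(density_colour(10, 0, 10, [red, blue]))  # highest density: red
print(density_colour(0, 0, 10, [red]))         # lowest density, single colour: white
```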

The scaling of densities is controlled by the `ZTRANSFORM` option, with settings:

| Setting | Description |
|---|---|
| `identity` | no scaling (default); |
| `root` | takes the square root of the densities, giving more emphasis to low counts; |
| `percentile` | takes a rank transform and plots the ranks, so that percentiles are equally spaced. |
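The sketch below shows the effect of the `root` and `percentile` settings on a list of cell densities. The percentile version ranks the densities, gives tied values their mean rank, and expresses each rank as a percentage of the number of cells. That particular scaling, and the treatment of ties, are assumptions made for illustration. The function name `ztransform` is hypothetical.

```python
import math


def ztransform(counts, setting="identity"):
    """Rescale a flat list of cell densities before plotting (illustrative sketch)."""
    if setting == "identity":
        return list(counts)
    if setting == "root":
        # Square roots compress large counts, so low counts stand out more.
        return [math.sqrt(c) for c in counts]
    if setting == "percentile":
        # Rank transform: tied values share their mean (1-based) rank.
        order = sorted(range(len(counts)), key=lambda i: counts[i])
        ranks = [0.0] * len(counts)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and counts[order[j + 1]] == counts[order[i]]:
                j += 1
            mean_rank = (i + j) / 2 + 1
            for k in range(i, j + 1):
                ranks[order[k]] = mean_rank
            i = j + 1
        n = len(counts)
        # Assumption: ranks are expressed as percentages of the number of cells.
        return [100 * r / n for r in ranks]
    raise ValueError(f"unknown setting: {setting}")


print(ztransform([0, 1, 4, 9], "root"))
print(ztransform([5, 1, 9, 1], "percentile"))
```

After the percentile transform, the gaps between successive plotted values are equal however skewed the raw counts are. This is what makes the shading levels equally spaced in percentile terms.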

The `MEANFIT` option allows you to add a smoothing spline regression of y on x, or of x on y, to a point plot. The available settings are:

| Setting | Description |
|---|---|
| `yx` | for a regression of y on x; |
| `xy` | for a regression of x on y. |

The `DF` option again specifies the number of degrees of freedom for the spline (default 12). By default, neither regression is fitted.
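The sketch below shows why the `yx` and `xy` settings generally draw two different curves. As a deliberate simplification, it swaps the smoothing spline for a straight-line least-squares fit, so it is not what `MEANFIT` actually fits. The function name `least_squares` and the data are hypothetical.

```python
def least_squares(u, v):
    """Fit v = a + b*u by ordinary least squares; returns (a, b)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((ui - mu) ** 2 for ui in u)
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    b = suv / suu
    return mv - b * mu, b


x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
a_yx, b_yx = least_squares(x, y)  # analogue of MEANFIT=yx: y on x
a_xy, b_xy = least_squares(y, x)  # analogue of MEANFIT=xy: x on y

# Drawn on the same x-y axes, the x-on-y line has slope 1/b_xy.
print(b_yx, 1 / b_xy)  # the two lines differ unless the points are perfectly correlated
```

The y-on-x fit minimises vertical deviations and the x-on-y fit minimises horizontal ones. Plotting both therefore shows how far the point cloud departs from a single tight relationship.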

The `Y` and `X` parameters specify the y- and x-coordinates of the data values, in either variates or factors. Their identifiers are used for the titles of the axes at the lower and left-hand edges of the graphics frame (i.e. page). You can also use the `TITLE` parameter to supply an overall title for the plot.

The `WINDOW` option specifies the number of the window to use for the plot, and the `SCREEN` option controls whether the screen is cleared first, as usual (see e.g. `DGRAPH`).

Options: `PLOT`, `NGROUPS`, `XTRANSFORM`, `YTRANSFORM`, `ZTRANSFORM`, `METHOD`, `MEANFIT`, `DF`, `BANDWIDTH`, `NCONTOURS`, `COLOURS`, `SYMBOL`, `WINDOW`, `SCREEN`.

Parameters: `Y`, `X`, `TITLE`.

### Action with `RESTRICT`

If any of the variates or factors are restricted, only the units not excluded by the restriction will be plotted.

### See also

Directives: `DCONTOUR`, `DGRAPH`, `DSHADE`, `D3GRAPH`.

Commands for: Graphics.

### Example

```
CAPTION 'DXYDENSITY example','Density plots of microarray data';\
  STYLE=meta,plain
ENQUIRE CHANNEL=-1; EXIST=check; NAME=\
  '%GENDIR%/Data/Microarrays/Data13-6-9.gwb'
IF check
  SPLOAD [PRINT=*] '%GENDIR%/Data/Microarrays/Data13-6-9.gwb'; ISAVE=data
  RESTRICT data[]; Intensity > 1
ELSE
  CAPTION 'Microarray data not installed, using artificial data'
  SCALAR N; VALUE=60000
  CALC [SEED=531] Intensity = (X = 10*!(1...N)/N) + GRNORMAL(N;2;0.1)
  CALC logRatio = BOUND(LOG(1+X-(X/8)**2) + GRNORMAL(N;-1.6;0.2/X**1.5);-3;3)
ENDIF
DXYDENSITY [PLOT=POINT; MEANFIT=xy,yx] Y=logRatio; X=Intensity
DXYDENSITY [PLOT=SHADE; NGROUPS=60; ZTRANSFORM=percentile;\
  COLOUR=!t(red,blue)] Y=logRatio; X=Intensity
DXYDENSITY [PLOT=histogram; ZTRANSFORM=root; NGROUPS=30; COLOUR='lightblue']\
  Y=logRatio; X=Intensity
DXYDENSITY [PLOT=surface; METHOD=kernel; BANDWIDTH=0.1]\
  Y=logRatio; X=Intensity
```