Calculates quantiles of the values in a variate (P.W. Lane).
Options
PRINT = string token | What to print (quantiles); default quan |
---|---|
METHOD = string token | Type of quantile to form (population, sample); default samp |
PROPORTION = variate or scalar | Proportions at which to calculate quantiles; default !(0,0.25,0.5,0.75,1) |
Parameters
DATA = variates | Values whose quantiles are required; this parameter must be specified |
---|---|
QUANTILES = variates or scalars | Identifiers of structures to store results, if required |
Description
Quantiles are statistics that characterize a distribution. The DATA parameter supplies a sample of numbers {x_i, i=1…n} from which the quantiles are to be calculated, and the METHOD option specifies the type of quantile to form.
By default QUANTILE calculates quantiles of the sample itself. For a proportion p in the range [0,1], the corresponding quantile q of the sample {x_i} has the following properties:
1) at least the proportion p of {x_i} are less than or equal to q;
2) at least the proportion (1-p) of {x_i} are greater than or equal to q;
3) if both q = x_i and q = x_(i+1) (adjacent values in the sorted sample) satisfy 1) and 2), then take q = (x_i + x_(i+1))/2.
Thus the sample quantile for proportion 0.5 is the median, for 0.0 it is the minimum, and for 1.0 it is the maximum of the sample.
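The three properties can be checked directly by brute force. The following Python sketch is an illustration only, not the Genstat implementation: the function name sample_quantile_by_definition is hypothetical, and it assumes that the proportions are exactly representable so that the count comparisons are not affected by floating-point rounding.

```python
def sample_quantile_by_definition(data, p):
    """Brute-force sample quantile following properties 1) to 3).

    Hypothetical helper for illustration; not part of Genstat.
    """
    n = len(data)
    candidates = []
    for q in data:
        n_le = sum(1 for x in data if x <= q)  # values less than or equal to q
        n_ge = sum(1 for x in data if x >= q)  # values greater than or equal to q
        # Properties 1) and 2), written as counts rather than proportions.
        if n_le >= p * n and n_ge >= (1 - p) * n:
            candidates.append(q)
    # Property 3): when two sample values qualify, take their average.
    return (min(candidates) + max(candidates)) / 2


sample = [3, 1, 4, 2]
five_number = [sample_quantile_by_definition(sample, p)
               for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
# five_number is [1.0, 1.5, 2.5, 3.5, 4.0]
```

For p = 0.5 both 2 and 3 satisfy properties 1) and 2), so the quantile is their average, 2.5, which is the usual median of an even-sized sample.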
Alternatively, you can set METHOD=population to estimate quantiles of the underlying population from which the data have been sampled. (This type of quantile is the one used most often elsewhere in Genstat.) The quantile is now an estimate of the value x such that a proportion p of the population has values less than or equal to x.
By default, QUANTILE produces the five quantiles called the “five-number summary” of a sample, corresponding to the proportions 0.0, 0.25, 0.5, 0.75, 1.0. The option PROPORTION can be set to a scalar or variate to request other single quantiles or sets of quantiles. By default, QUANTILE prints the statistics, but this can be suppressed by setting option PRINT=*. The quantiles can be stored in a variate using the parameter QUANTILES.
Options: PRINT, METHOD, PROPORTION.
Parameters: DATA, QUANTILES.
Method
With METHOD=sample, QUANTILE calculates the quantiles itself, using the SORT and CALCULATE directives. First, the values are sorted into ascending order. Then, for each proportion, the two values that are candidates for the quantile are found by counting from either end of the sorted list to leave the required number of values from that point in the list to the end. Each quantile is the average of the two values found.
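The sort-and-count approach described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure's actual Genstat code: the name sample_quantile_sorted and the index arithmetic are hypothetical, and p*n is assumed to be computed without floating-point rounding error.

```python
import math


def sample_quantile_sorted(data, p):
    """Sample quantile via sorting and counting from either end.

    Hypothetical helper for illustration; not part of Genstat.
    """
    values = sorted(data)
    n = len(values)
    # Counting up from the bottom: the first sorted value (1-based index)
    # with at least p*n values less than or equal to it.
    k_lo = max(math.ceil(p * n), 1)
    # Counting down from the top: the last sorted value (1-based index)
    # with at least (1-p)*n values greater than or equal to it.
    k_hi = n + 1 - max(math.ceil((1 - p) * n), 1)
    # The quantile is the average of the two candidates, which coincide
    # whenever only one sorted value qualifies.
    return (values[k_lo - 1] + values[k_hi - 1]) / 2


median_even = sample_quantile_sorted([3, 1, 4, 2], 0.5)      # 2.5
median_odd = sample_quantile_sorted([5, 1, 4, 2, 3], 0.5)    # 3.0
```

The max(..., 1) guards keep the indices inside the list at p = 0 and p = 1, so those proportions return the minimum and maximum of the sample.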
The alternative setting, METHOD=population, uses the Genstat QUANTILES function. QUANTILES assumes that the sorted data values are evenly distributed along the range of proportions, but with the lowest data value located at proportion 1/(2n), and the highest one located at proportion 1-1/(2n), where n is the size of the sample. (This recognises that the sample is unlikely to contain the minimum and maximum values in the population.) If the required proportion p coincides with one of these sample proportions, QUANTILES estimates the quantile as the corresponding data value. If not, QUANTILES finds the nearest sample point with a proportion below p, and the nearest one with a proportion above p. It then interpolates linearly between these two points, i.e. it takes a weighted average of their data values, with weights determined by the absolute differences between their proportions and p, so that the nearer point receives the larger weight. However, if p lies outside (i.e. above or below) the range of the sample proportions, QUANTILES does a linear extrapolation using the two nearest sample points.
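The interpolation rule described above can be sketched in Python. This is an illustration of the description only, not the code of the QUANTILES function: the name population_quantile is hypothetical, sorted value number i (counting from 1) is taken to sit at proportion (i-0.5)/n, and the handling of a single-value sample is an assumption made so that the sketch is complete.

```python
import math


def population_quantile(data, p):
    """Population-style quantile by linear interpolation or extrapolation.

    Hypothetical helper for illustration; not part of Genstat.
    """
    values = sorted(data)
    n = len(values)
    if n == 1:
        # Assumption: with one value there is nothing to interpolate.
        return float(values[0])
    # Sorted value i (1-based) sits at proportion (i - 0.5) / n, so the
    # proportion p corresponds to the fractional position h in the list.
    h = p * n + 0.5
    # Lower neighbour, clamped to 1..n-1 so that a proportion outside
    # [1/(2n), 1-1/(2n)] uses the two nearest end points.
    lo = min(max(math.floor(h), 1), n - 1)
    # frac lies in [0, 1] for interpolation; outside it for extrapolation.
    frac = h - lo
    return values[lo - 1] + frac * (values[lo] - values[lo - 1])


inside = population_quantile([1, 2, 3, 4], 0.5)    # 2.5, interpolated
below = population_quantile([1, 2, 3, 4], 0.0)     # 0.5, extrapolated
above = population_quantile([1, 2, 3, 4], 1.0)     # 4.5, extrapolated
```

With four values the sample proportions are 0.125, 0.375, 0.625 and 0.875, so p = 0.375 returns the second sorted value exactly, while p = 0 and p = 1 fall outside that range and are extrapolated beyond the observed minimum and maximum.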
Action with RESTRICT
If the DATA variate is restricted, the quantiles are formed only using the units that are not restricted out. The PROPORTION and QUANTILES variates must not be restricted.
See also
Directive: TABULATE.
Procedure: RQLINEAR.
Function: QUANTILES.
Commands for: Calculations and manipulation.
Example
CAPTION 'QUANTILE example',\
        !t('Generate some Normal random numbers, and print the',\
        'five-number summary (min, lower 25%, median, upper 25%, max).')\
        ; STYLE=meta,plain
CALCULATE Normal = NED(URAND(37752; 500))
QUANTILE Normal
PRINT !T('Form the 10,20...90 percent quantiles,',\
      'and compare with the theoretical values.'); JUSTIFICATION=left
VARIATE [VALUES=0.1,0.2...0.9] Proportn
& [VALUES=-1.282,-0.8416,-0.5244,-0.2533,0,\
   0.2533,0.5244,0.8416,1.282] Theory
QUANTILE [PRINT=*; PROPORTION=Proportn] Normal; QUANTILE=Sample
PRINT [RLPRINT=*] Proportn,Theory,Sample; DECIMALS=4