RQSMOOTH procedure

Fits and plots quantile regressions for loess or spline models (D.B. Baird).

Options

`PRINT` = string tokens	What to print (`model`, `summary`, `fittedvalues`); default `mode`, `summ`
`PLOT` = string tokens	What to plot (`rhistogram`, `fittedvalues`); default `fitt`
`METHOD` = string token	Smoothing method (`loess`, `spline`); default `spli`
`DF` = scalar	Degrees of freedom for the spline (3-40); default 4
`KNOTS` = variate	Knot points for smoothing splines; default `*` uses equally spaced percentiles of the `X` variate
`KERNEL` = string token	Kernel to use for loess (`normal`, `epanechnikov`, `quadratic`, `triweight`, `tukeybiweight`, `quartic`, `linear`, `uniform`); default `norm`
`LMETHOD` = string token	Span method for loess (`constant`, `adaptive`); default `adap`
`BANDWIDTH` = scalar	Bandwidth for loess smoothing, between 0 and 1; default 0.4
`ORDER` = scalar	Order of the local polynomial (1 or 2); default 1
`NGRIDPOINTS` = scalar	Number of points on smooth curve; default 100
`NBOOT` = scalar	Number of times to bootstrap data to estimate confidence limits; default 0 i.e. no bootstrapping
`SEED` = scalar	Seed for bootstrap randomization; default 0
`CIPROBABILITY` = scalar	Probability level for confidence interval; default 0.95
`TITLE` = text	Title for plots; default `*` generates titles from the structure names
`ARRANGEMENT` = string token	Whether to plot the fitted regressions for the groups defined by the `GROUPS` parameter separately in a trellis plot or together in a single plot (`single`, `trellis`); default `sing`

Parameters

`Y` = variates	Response variate
`X` = variates	Explanatory variate
`PRQUANTILES` = scalars or variates	Proportions at which to calculate quantiles; default 0.5
`GROUPS` = factors	Groups for which independent curves are fitted
`GRID` = variates	Grid of equidistant points at which the smooth is calculated
`OUTGROUPS` = factors	Groups for the fitted smoothed values saved by the `SMOOTH` parameter
`SMOOTH` = variates or pointers	Fitted smooth estimated at the `NGRIDPOINTS` points given in `GRID`
`SLOPE` = variates or pointers	Fitted slope from model for the same points as `SMOOTH`
`RESIDUALS` = variates or pointers	Residuals from regression for each quantile
`FITTEDVALUES` = variates or pointers	Fitted values from regression for each quantile
`LOWSMOOTH` = variates or pointers	Lower confidence limit of smooth for each quantile
`UPPSMOOTH` = variates or pointers	Upper confidence limit of smooth for each quantile
`SESMOOTH` = variates or pointers	Bootstrap standard errors of the smooth for each quantile

Description

RQSMOOTH calculates and plots a smooth quantile regression for a dependent variate y and an explanatory variate x, specified by the Y and X parameters, respectively. You can also specify groups by supplying a factor with the GROUPS parameter; the model is then fitted independently within each group. The type of smooth model, either loess or spline, is specified by the METHOD option. The quantiles (between 0 and 1) for which the model is to be fitted are specified by the PRQUANTILES parameter, as a scalar if there is only one, or as a variate if there are several. The default value of PRQUANTILES is 0.5, i.e. the median.
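Each quantile regression fitted for a PRQUANTILES value tau minimises the asymmetric "check" (pinball) loss of the residuals rather than their sum of squares. The short Python sketch below (not Genstat code; the function names are hypothetical) shows that loss and checks that, for a constant-only model, tau = 0.5 recovers the sample median.

```python
def check_loss(residual: float, tau: float) -> float:
    """Check loss rho_tau(u) = u * (tau - I(u < 0)) for one residual u."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(y: list[float], tau: float) -> float:
    """Observed value c that minimises sum(check_loss(y_i - c, tau)).

    A minimiser can always be found among the observations, so scanning
    them is enough for this constant-only toy model.
    """
    return min(y, key=lambda c: sum(check_loss(v - c, tau) for v in y))


if __name__ == "__main__":
    y = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0]
    print(best_constant(y, 0.5))   # prints 3.0, the sample median
    print(best_constant(y, 0.75))  # prints 5.0, an upper quantile
```

RQSMOOTH replaces the constant with a spline basis or a local polynomial, but the quantity being minimised for each tau is of this form.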

For a spline model, the number of degrees of freedom can be specified using the DF option. This must be at least 3 and at most 40. The knot points for the spline basis curves can be set using the KNOTS option; this must contain DF values and no missing values. If KNOTS is not set, the default knot points are DF equally spaced percentiles of the X variate.
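The rules for DF and KNOTS can be sketched in Python as below. This is an illustration, not Genstat code: the function names are hypothetical, and reading "DF equally spaced percentiles" as the DF interior cut points that split X into DF + 1 equal-probability groups is an assumption; the percentile definition RQSMOOTH uses may differ.

```python
import math
import statistics


def default_knots(x: list[float], df: int = 4) -> list[float]:
    """Assumed default knots: df cut points at equally spaced percentiles of x."""
    if not 3 <= df <= 40:
        raise ValueError("DF must be between 3 and 40")
    # statistics.quantiles(n=df + 1) returns df cut points dividing x
    # into df + 1 groups of equal probability.
    return statistics.quantiles(x, n=df + 1)


def check_knots(knots: list[float], df: int) -> None:
    """Mirror the documented rule: KNOTS needs DF values and no missing values."""
    if len(knots) != df:
        raise ValueError(f"KNOTS must have {df} values, got {len(knots)}")
    if any(math.isnan(k) for k in knots):
        raise ValueError("KNOTS must not contain missing values")
```

For example, `default_knots(list(range(1, 11)), df=4)` gives `[2.2, 4.4, 6.6, 8.8]` under this reading.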

For a loess model, the bandwidth is set by the BANDWIDTH option and must lie between 0 and 1; the default is 0.4. With a large bandwidth the curve is smoother but less responsive, so the bias can be higher where the curve is changing rapidly. With a small bandwidth the curve is more responsive, but the confidence limits around it are wider. The choice of bandwidth thus controls the trade-off between bias and variance.

The loess model uses a moving window centred on the point to be predicted. The width of this window is controlled by the bandwidth and the LMETHOD option. Setting LMETHOD=constant gives a constant window width of BANDWIDTH * RANGE(X). Alternatively, setting LMETHOD=adaptive uses a varying window width, chosen so that the window always contains the proportion BANDWIDTH of the total number of points. The window is thus narrower where the points are denser.

A local polynomial is fitted to the points in the window. Its order is set by the ORDER option, as either 1 (linear) or 2 (quadratic). In the polynomial regression, the points are weighted according to their distance from the point that is to be predicted. The weighting function W(d) is selected using the KERNEL option, with settings:

`uniform`	W(d) = 1
`linear`	W(d) = 1 – `ABS`(d)
`quadratic`	W(d) = 1 – d²
`quartic`	W(d) = (1 – d²)²
`triweight`	W(d) = (1 – d²)³
`normal`	W(d) = `PRNORMAL`(d)
`epanechnikov`	synonym of `quadratic`
`tukeybiweight`	synonym of `quartic`

where d is the distance within the window from the predicted point, scaled to take the values -1 and +1 at the lower and upper window edges.
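The kernel table and the two LMETHOD window rules can be combined into a small Python sketch of how each point's weight might be computed for the local fit at a prediction point x0. It is not Genstat code and the function names are hypothetical. Three details are assumptions: PRNORMAL is taken as the standard normal density, points outside the window get zero weight, and the adaptive half-width is the distance to the k-th nearest observation with k = ceil(BANDWIDTH * n).

```python
import math


def kernel_weight(d: float, kernel: str = "normal") -> float:
    """W(d) for a scaled distance d, with d = -1 and +1 at the window edges."""
    if abs(d) > 1.0:
        return 0.0  # assumption: points outside the window get zero weight
    kernel = {"epanechnikov": "quadratic", "tukeybiweight": "quartic"}.get(kernel, kernel)
    if kernel == "uniform":
        return 1.0
    if kernel == "linear":
        return 1.0 - abs(d)
    if kernel == "quadratic":
        return 1.0 - d * d
    if kernel == "quartic":
        return (1.0 - d * d) ** 2
    if kernel == "triweight":
        return (1.0 - d * d) ** 3
    if kernel == "normal":
        # assumption: PRNORMAL(d) read as the standard normal density
        return math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
    raise ValueError(f"unknown kernel {kernel!r}")


def half_width(x: list[float], x0: float, bandwidth: float, lmethod: str) -> float:
    """Half-width of the window centred on the prediction point x0.

    constant: full width BANDWIDTH * RANGE(X), so half of that either side.
    adaptive (assumed rule): distance from x0 to its k-th nearest point,
    k = ceil(BANDWIDTH * n), so the window narrows where points are dense.
    """
    if not 0.0 < bandwidth <= 1.0:
        raise ValueError("BANDWIDTH must lie between 0 and 1")
    if lmethod == "constant":
        return 0.5 * bandwidth * (max(x) - min(x))
    if lmethod == "adaptive":
        k = max(1, math.ceil(bandwidth * len(x)))
        return sorted(abs(xi - x0) for xi in x)[k - 1]
    raise ValueError(f"unknown LMETHOD {lmethod!r}")


def local_weights(x, x0, bandwidth=0.4, lmethod="adaptive", kernel="normal"):
    """Kernel weight of every observation for the local fit at x0."""
    h = half_width(x, x0, bandwidth, lmethod)
    if h <= 0.0:
        raise ValueError("window has zero width; increase BANDWIDTH")
    return [kernel_weight((xi - x0) / h, kernel) for xi in x]
```

In a full implementation these weights would feed a weighted local linear or quadratic quantile fit at each grid point; that fitting step is not shown.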

Output is controlled by the PRINT option with settings:

`model`	details of the model that is being fitted;
`summary`	a summary of the fit; and
`fittedvalues`	the residuals and fitted values from the model.

The PLOT option controls what plots are displayed, with settings

`rhistogram`	histograms of residuals; and
`fittedvalues`	observed and fitted values plotted against the explanatory variate X.

The ARRANGEMENT option controls whether the models for each group are displayed in a trellis plot or in a single plot with all groups together.

Bootstrapping can be used to estimate standard errors and confidence limits for the fitted values. The NBOOT option specifies the number of bootstrap samples to take; the default of zero means that no bootstrapping is done. The CIPROBABILITY option sets the probability level of the confidence limits (default 0.95). The SEED option defines the seed for the random numbers used to select the bootstrap samples. The default of zero continues the existing sequence of random numbers if any have already been used in the current Genstat job; if none have been used, Genstat picks a seed at random.
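The roles of NBOOT, CIPROBABILITY and SEED can be illustrated with a generic bootstrap in Python (not Genstat code; names are hypothetical). The text does not say how RQSMOOTH forms its limits, so resampling units with replacement and taking simple percentile limits are assumptions of this sketch. The sample median stands in for a fitted value at one grid point, and `seed=None` lets Python choose a seed, loosely like SEED=0 when no random numbers have yet been used.

```python
import random
import statistics


def bootstrap_limits(data, statistic, nboot=200, ciprobability=0.95, seed=None):
    """Bootstrap standard error and percentile limits for statistic(data).

    Assumptions: units are resampled with replacement, and the limits are
    the empirical (1 - p)/2 and (1 + p)/2 percentiles of the estimates.
    """
    if nboot < 2:
        raise ValueError("need at least two bootstrap samples")
    rng = random.Random(seed)  # a fixed seed makes the samples reproducible
    n = len(data)
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)]) for _ in range(nboot)
    )
    se = statistics.stdev(estimates)
    alpha = (1.0 - ciprobability) / 2.0
    lower = estimates[int(alpha * (nboot - 1))]
    upper = estimates[int((1.0 - alpha) * (nboot - 1))]
    return se, lower, upper
```

For instance, `bootstrap_limits(y, statistics.median, nboot=500, seed=12345)` returns the bootstrap standard error and 95% limits for the median of `y`, and repeating the call with the same seed reproduces them exactly.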

The results from the model fit can be saved using various parameters. Each is saved in a variate if only one quantile has been specified, or in a pointer to a set of variates (one for each quantile) if there are several. The fitted curve(s) can be saved by the SMOOTH parameter, and the slope of the fitted curve by the SLOPE parameter. The NGRIDPOINTS option controls how many points are estimated on each curve. The GRID parameter can save the positions of the points, which are spaced equally between the minimum and maximum values of X. The LOWSMOOTH and UPPSMOOTH parameters save the lower and upper bootstrap confidence limits of the estimated curve, and SESMOOTH saves its bootstrap standard errors. If a GROUPS factor has been specified, the estimated values for the curves have NLEVELS(GROUPS) * NGRIDPOINTS points, with the values for group 1 given first, followed by those for group 2, and so on. The OUTGROUPS parameter can save a factor to identify the groups within these variates.
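The layout of the saved curves for grouped data can be sketched in Python (not Genstat code; names are hypothetical). Two details are assumptions: every group reuses one grid spanning the full range of X, and groups are ordered by sorted level.

```python
def equally_spaced_grid(x: list[float], ngridpoints: int = 100) -> list[float]:
    """NGRIDPOINTS points spaced equally from min(X) to max(X), like GRID."""
    if ngridpoints < 2:
        raise ValueError("need at least two grid points")
    lo, hi = min(x), max(x)
    return [lo + (hi - lo) * i / (ngridpoints - 1) for i in range(ngridpoints)]


def stacked_layout(x, groups, ngridpoints=100):
    """Grid, stacked grid positions and OUTGROUPS-style labels.

    The stacked lists have NLEVELS(GROUPS) * NGRIDPOINTS entries: all points
    for the first group, then all points for the second, and so on.
    """
    levels = sorted(set(groups))  # assumption: levels in sorted order
    grid = equally_spaced_grid(x, ngridpoints)
    positions = [g for _ in levels for g in grid]
    outgroups = [lev for lev in levels for _ in grid]
    return grid, positions, outgroups
```

With two groups and `ngridpoints=5`, the stacked lists have 10 entries, and entries 0-4 belong to the first group.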

Options: PRINT, PLOT, METHOD, KERNEL, LMETHOD, BANDWIDTH, ORDER, DF, KNOTS, NGRIDPOINTS, NBOOT, SEED, CIPROBABILITY, TITLE, ARRANGEMENT.

Parameters: Y, X, PRQUANTILES, GROUPS, GRID, OUTGROUPS, SMOOTH, SLOPE, RESIDUALS, FITTEDVALUES, LOWSMOOTH, UPPSMOOTH, SESMOOTH.

Method

The FRQUANTILES directive is used to fit the quantile regression, either to a design matrix generated from the spline basis or as a locally weighted regression about each point on the smooth. For further details of the underlying methodology, see Koenker & D’Orey (1987) or Koenker (2005).

Action with `RESTRICT`

Restrictions on the Y and X variates and on the GROUPS factor are combined, and only units that are unrestricted in all of these structures are used in the regression.
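The rule for combining restrictions amounts to a logical AND across structures, as in this Python sketch (not Genstat code). Each boolean list is a hypothetical stand-in for the RESTRICT status of Y, X or GROUPS, with `True` meaning the unit is included.

```python
def usable_units(*masks: list[bool]) -> list[int]:
    """Indices of units that are unrestricted (True) in every structure."""
    if not masks:
        raise ValueError("need at least one mask")
    n = len(masks[0])
    if any(len(m) != n for m in masks):
        raise ValueError("all structures must have the same number of units")
    return [i for i in range(n) if all(m[i] for m in masks)]
```

For example, if Y excludes unit 3, X excludes unit 2 and GROUPS excludes unit 4, only units 1 and 5 remain (indices 0 and 4 here).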

References

Koenker, R. (2005). Quantile Regression. Cambridge University Press, New York.

Koenker, R.W. & D’Orey, V. (1987). Algorithm AS 229: computing regression quantiles. Applied Statistics, 36, 383-393.

Example

CAPTION  'RQSMOOTH example'; STYLE=meta
SPLOAD   '%GENDIR%/Examples/MelbourneTemp.gsh'
RQSMOOTH [PRINT=model,summary; PLOT=fitted; METHOD=Spline;\
         DF=6; NGRID=100; NBOOT=0] Y=MaxTemp; X=PrevMax;\
         PRQUANTILES=!(0.05,0.1,0.25,0.5,0.75,0.9,0.95)

Updated on June 18, 2019
