RQSMOOTH procedure

Fits and plots quantile regressions for loess or spline models (D.B. Baird).

Options

PRINT = string tokens What to print (model, summary, fittedvalues); default mode, summ
PLOT = string tokens What to plot (rhistogram, fittedvalues); default fitt
METHOD = string token Smoothing method (loess, spline); default spli
DF = scalar Spline Degrees of Freedom (3-40); default 4
KNOTS = variate Knot points for smoothing splines; default * uses equally spaced percentiles of the X variate
KERNEL = string token What Kernel to use for Loess (normal, epanechnikov, quadratic, triweight, tukeybiweight, quartic, linear, uniform); default norm
LMETHOD = string token Span method for Loess (constant, adaptive); default adap
BANDWIDTH = scalar Bandwidth for smoothing between 0 and 1; default 0.4
ORDER = scalar Order of local polynomial; default 1
NGRIDPOINTS = scalar Number of points on smooth curve; default 100
NBOOT = scalar Number of times to bootstrap data to estimate confidence limits; default 0 i.e. no bootstrapping
SEED = scalar Seed for bootstrap randomization; default 0
CIPROBABILITY = scalar Probability level for confidence interval; default 0.95
TITLE = text Title for plots; default * generates titles from the structure names
ARRANGEMENT = string token Whether to plot fitted regressions by the GROUPS parameter in a trellis plot (single, trellis); default sing

Parameters

Y = variates Response variate
X = variates Explanatory variate
PRQUANTILES = scalars or variates Proportions at which to calculate quantiles; default 0.5
GROUPS = factors Groups for which independent curves are fitted
GRID = variates Grid of equidistant points at which the smooth is calculated
OUTGROUPS = factors Groups for the fitted smoothed values saved by the SMOOTH parameter
SMOOTH = variates or pointers Fitted smooth estimated at the NGRIDPOINTS points given in GRID
SLOPE = variates or pointers Fitted slope from model for the same points as SMOOTH
RESIDUALS = variates or pointers Residuals from regression for each quantile
FITTEDVALUES = variates or pointers Fitted values from regression for each quantile
LOWSMOOTH = variates or pointers Lower confidence limit of smooth for each quantile
UPPSMOOTH = variates or pointers Upper confidence limit of smooth for each quantile
SESMOOTH = variates or pointers Standard error of coefficients for each quantile

Description

RQSMOOTH calculates and plots a smooth quantile regression for a response variate y and an explanatory variate x, specified by the Y and X parameters, respectively. You can also specify groups by supplying a factor using the GROUPS parameter; the model is then fitted independently within each group. The type of smooth model, either loess or spline, is specified by the METHOD option. The quantiles (between 0 and 1) for which the model is to be fitted are specified by the PRQUANTILES parameter, as a scalar if there is only one, or as a variate if there are several. The default value for PRQUANTILES is 0.5, i.e. the median.

For a spline model, the number of degrees of freedom can be specified using the DF option. This must be greater than or equal to 3 and less than or equal to 40; the default is 4. The knot points for the spline basis curves can be set using the KNOTS option. This must supply a variate with DF values and no missing values. If KNOTS is not provided, the default knot points are DF equally spaced percentiles of the X variate.
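
As a minimal sketch of the spline settings described above, the call below fits a median curve with 4 degrees of freedom and explicit knot points. It assumes the MelbourneTemp data from the Example section; the four knot values are purely illustrative and are not taken from the data.

    "Sketch only: the knot values are hypothetical, chosen for illustration"
    SPLOAD   '%GENDIR%/Examples/MelbourneTemp.gsh'
    RQSMOOTH [PRINT=model,summary; PLOT=fittedvalues; METHOD=spline;\
             DF=4; KNOTS=!(12,18,24,30)] Y=MaxTemp; X=PrevMax;\
             PRQUANTILES=0.5

KNOTS holds exactly DF values here, as the option requires. Omitting KNOTS would leave the procedure to place the knots at percentiles of PrevMax.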

For a loess model the bandwidth is set by the BANDWIDTH option, and must lie between 0 and 1; the default is 0.4. With large bandwidths the curve will be smoother but less responsive, which allows higher bias where the underlying relationship is changing rapidly. With smaller bandwidths the curve will be more responsive, but the confidence limits around it will be wider. So the choice of bandwidth controls the trade-off between variance and bias.

The loess model uses a moving window centred around the point to be predicted. The width of this window is controlled by the BANDWIDTH and LMETHOD options. Setting LMETHOD=constant gives a constant window width of BANDWIDTH * RANGE(X). Alternatively, setting LMETHOD=adaptive uses a varying window width, defined so that it always contains the proportion of the total points given by BANDWIDTH. The window will thus be narrower where the points are denser. A local polynomial is fitted to the points in the window. The order is defined by the ORDER option as either 1 (linear) or 2 (quadratic). In the polynomial regression, the points are weighted according to their distance from the point that is to be predicted. The weighting function W(d) is selected using the KERNEL option, with settings:

    uniform W(d) = 1
    linear W(d) = 1 - ABS(d)
    quadratic W(d) = 1 - d**2
    quartic W(d) = (1 - d**2)**2
    triweight W(d) = (1 - d**2)**3
    normal W(d) = PRNORMAL(d)
    epanechnikov synonym of quadratic
    tukeybiweight synonym of quartic

where d is the distance within the window from the predicted point, scaled to take the values -1 and +1 at the lower and upper window edges.
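
The kernel formulae and window settings above can be explored with a short sketch like the following. It tabulates the weights at a few scaled distances d, computes the window width that LMETHOD=constant would give for BANDWIDTH=0.4, and then fits a loess model. It assumes the MelbourneTemp data from the Example section. The identifiers d, Wlinear, Wquadratic, Wquartic, Wtriweight, Wnormal and Width are illustrative names, and evaluating the Normal density with mean 0 and variance 1 is an assumption, as the scaling used inside the procedure is not stated here.

    "Kernel weights at scaled distances between -1 and +1"
    VARIATE   [VALUES=-1,-0.5,0,0.5,1] d
    CALCULATE Wlinear = 1 - ABS(d)
    CALCULATE Wquadratic = 1 - d**2
    CALCULATE Wquartic = (1 - d**2)**2
    CALCULATE Wtriweight = (1 - d**2)**3
    "Assumption: standard Normal density (mean 0, variance 1)"
    CALCULATE Wnormal = PRNORMAL(d; 0; 1)
    PRINT     d,Wlinear,Wquadratic,Wquartic,Wtriweight,Wnormal

    "Constant window width, BANDWIDTH * RANGE(X), for BANDWIDTH=0.4"
    SPLOAD    '%GENDIR%/Examples/MelbourneTemp.gsh'
    CALCULATE Width = 0.4 * (MAXIMUM(PrevMax) - MINIMUM(PrevMax))
    PRINT     Width

    "Loess fit: constant window, local quadratic, triweight kernel"
    RQSMOOTH  [PRINT=model,summary; PLOT=fittedvalues; METHOD=loess;\
              LMETHOD=constant; BANDWIDTH=0.4; ORDER=2; KERNEL=triweight]\
              Y=MaxTemp; X=PrevMax; PRQUANTILES=!(0.1,0.5,0.9)

All of the kernels except the Normal fall to zero at the window edges (d = -1 and d = +1), so points near the edge of the window contribute little to the local fit.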

Output is controlled by the PRINT option with settings:

    model the details of model that is being fitted;
    summary a summary of the fit; and
    fittedvalues the residuals and fitted values from the model.

The PLOT option controls what plots are displayed, with settings:

    rhistogram histograms of residuals; and
    fittedvalues observed and fitted values plotted against the explanatory variate specified by the XPLOT option (if XPLOT is not set, the first explanatory variate is used).

The ARRANGEMENT option controls whether the models for each group are displayed in a trellis plot or in a single plot with all groups together.
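
A sketch of a grouped fit displayed as a trellis plot might look like the call below. It assumes the MelbourneTemp data from the Example section, and the factor Season is hypothetical: it stands for any grouping factor with one value per unit, and is not known to exist in that data set.

    "Sketch only: Season is a hypothetical grouping factor"
    RQSMOOTH [PRINT=summary; PLOT=fittedvalues; METHOD=spline; DF=4;\
             ARRANGEMENT=trellis] Y=MaxTemp; X=PrevMax; GROUPS=Season;\
             PRQUANTILES=!(0.1,0.5,0.9)

With ARRANGEMENT=single (the default), the curves for all the groups would instead be drawn together in one plot.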

Bootstrapping can be used to estimate standard errors and confidence limits for the fitted values. The NBOOT option specifies the number of bootstrap samples that are taken; the default is zero, which indicates that no bootstrapping is to be done. The CIPROBABILITY option sets the size of the confidence limits. The SEED option defines the seed for the random numbers that are used to select the bootstrap samples. The default of zero continues the existing sequence of random numbers if any have already been used in the current Genstat job. If none have been used, Genstat picks a seed at random.
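
For instance, a bootstrap run along the lines described above could be set up as follows. This is a sketch that assumes the MelbourneTemp data from the Example section. The number of bootstrap samples, the seed and the saved identifiers (Grid, Lower, Upper, Se) are illustrative choices rather than recommendations.

    "Sketch only: NBOOT, SEED and the saved names are illustrative"
    SPLOAD   '%GENDIR%/Examples/MelbourneTemp.gsh'
    RQSMOOTH [PRINT=summary; PLOT=fittedvalues; METHOD=spline; DF=6;\
             NBOOT=200; SEED=12345; CIPROBABILITY=0.9]\
             Y=MaxTemp; X=PrevMax; PRQUANTILES=0.5; GRID=Grid;\
             LOWSMOOTH=Lower; UPPSMOOTH=Upper; SESMOOTH=Se
    PRINT    Grid,Lower,Upper,Se

Setting a non-zero SEED makes the bootstrap samples reproducible from run to run. Only one quantile is requested, so each saved structure is a single variate rather than a pointer.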

The results from the model fit can be saved using various parameters. They will be saved in a variate if only one quantile has been defined, or in a pointer to a set of variates (one for each quantile) if there are several. The fitted curve(s) can be saved by the SMOOTH parameter, and the slope of the fitted curve by the SLOPE parameter. The NGRIDPOINTS option controls how many points are estimated on each curve. The GRID parameter can save the positions of the points, which are spaced equally between the minimum and maximum values of X. The LOWSMOOTH and UPPSMOOTH parameters save the lower and upper bootstrap confidence limits of the estimated curve, and the SESMOOTH parameter saves its standard errors. If a GROUPS factor has been specified, the estimated values for the curves have NLEVELS(GROUPS) * NGRIDPOINTS points, with the values for group 1 being given first, followed by those for group 2, and so on. The OUTGROUPS parameter can save a factor to identify the groups within the variates.
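
As a sketch of how the results might be saved, the call below fits a single quantile so that each result is stored in a variate rather than a pointer. It assumes the MelbourneTemp data from the Example section, and the identifiers Grid, Smooth, Slope, Res and Fit are illustrative names.

    "Sketch only: the saved names are illustrative"
    SPLOAD   '%GENDIR%/Examples/MelbourneTemp.gsh'
    RQSMOOTH [PRINT=*; PLOT=*; METHOD=spline; DF=6; NGRIDPOINTS=50]\
             Y=MaxTemp; X=PrevMax; PRQUANTILES=0.9; GRID=Grid;\
             SMOOTH=Smooth; SLOPE=Slope; RESIDUALS=Res; FITTEDVALUES=Fit
    PRINT    Grid,Smooth,Slope

Grid, Smooth and Slope each have NGRIDPOINTS values, so they can be printed in parallel. Res and Fit relate to the original units, so they are not printed alongside the grid values.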

Options: PRINT, PLOT, METHOD, KERNEL, LMETHOD, BANDWIDTH, ORDER, DF, KNOTS, NGRIDPOINTS, NBOOT, SEED, CIPROBABILITY, TITLE, ARRANGEMENT.

Parameters: Y, X, PRQUANTILES, GROUPS, GRID, OUTGROUPS, SMOOTH, SLOPE, RESIDUALS, FITTEDVALUES, LOWSMOOTH, UPPSMOOTH, SESMOOTH.

Method

The FRQUANTILES directive is used to fit the quantile regression for a design matrix generated for the spline basis or a locally weighted regression about the points in the smooth. For further details of the underlying methodology, see Koenker & D’Orey (1987) or Koenker (2005).

Action with RESTRICT

Restrictions in the Y and X variate and GROUPS factor are combined, and only those units which are unrestricted in all structures are used in the regression.

References

Koenker, R. (2005). Quantile Regression. Cambridge University Press, New York.

Koenker, R.W. & D’Orey, V. (1987). Algorithm AS229 computing regression quantiles. Applied Statistics, 36, 383-393.

See also

Directive: FRQUANTILES.

Procedures: RQLINEAR, RQNONLINEAR.

Commands for: Regression analysis.

Example

CAPTION  'RQSMOOTH example'; STYLE=meta
SPLOAD   '%GENDIR%/Examples/MelbourneTemp.gsh'
RQSMOOTH [PRINT=model,summary; PLOT=fitted; METHOD=Spline;\
         DF=6; NGRID=100; NBOOT=0] Y=MaxTemp; X=PrevMax;\
         PRQUANTILES=!(0.05,0.1,0.25,0.5,0.75,0.9,0.95)
Updated on June 18, 2019