Fits and plots quantile regressions for loess or spline models (D.B. Baird).
Options
PRINT = string tokens | What to print (model, summary, fittedvalues); default mode, summ |
---|---|
PLOT = string tokens | What to plot (rhistogram, fittedvalues); default fitt |
METHOD = string token | Smoothing method (loess, spline); default spli |
DF = scalar | Spline degrees of freedom (3-40); default 4 |
KNOTS = variate | Knot points for smoothing splines; default * uses equally spaced percentiles of the X variate |
KERNEL = string token | What kernel to use for loess (normal, epanechnikov, quadratic, triweight, tukeybiweight, quartic, linear, uniform); default norm |
LMETHOD = string token | Span method for loess (constant, adaptive); default adap |
BANDWIDTH = scalar | Bandwidth for smoothing between 0 and 1; default 0.4 |
ORDER = scalar | Order of local polynomial; default 1 |
NGRIDPOINTS = scalar | Number of points on smooth curve; default 100 |
NBOOT = scalar | Number of times to bootstrap data to estimate confidence limits; default 0 i.e. no bootstrapping |
SEED = scalar | Seed for bootstrap randomization; default 0 |
CIPROBABILITY = scalar | Probability level for confidence interval; default 0.95 |
TITLE = text | Title for plots; default * generates titles from the structure names |
ARRANGEMENT = string token | Whether to plot fitted regressions by the GROUPS parameter in a trellis plot (single, trellis); default sing |
Parameters
Y = variates | Response variate |
---|---|
X = variates | Explanatory variate |
PRQUANTILES = scalars or variates | Proportions at which to calculate quantiles; default 0.5 |
GROUPS = factors | Groups for which independent curves are fitted |
GRID = variates | Grid of equidistant points at which the smooth is calculated |
OUTGROUPS = factors | Groups for the fitted smoothed values saved by the SMOOTH parameter |
SMOOTH = variates or pointers | Fitted smooth estimated at the NGRIDPOINTS points given in GRID |
SLOPE = variates or pointers | Fitted slope from model for the same points as SMOOTH |
RESIDUALS = variates or pointers | Residuals from regression for each quantile |
FITTEDVALUES = variates or pointers | Fitted values from regression for each quantile |
LOWSMOOTH = variates or pointers | Lower confidence limit of smooth for each quantile |
UPPSMOOTH = variates or pointers | Upper confidence limit of smooth for each quantile |
SESMOOTH = variates or pointers | Standard error of coefficients for each quantile |
Description
RQSMOOTH calculates and plots a smooth quantile regression for a dependent variate y and an explanatory variate x, specified by the Y and X parameters, respectively. You can also specify groups, by supplying a factor using the GROUPS parameter; the model is then fitted independently within each group. The type of smooth model, either loess or spline, is specified by the METHOD option. The quantiles (between 0 and 1) for which the model is to be fitted are specified by the PRQUANTILES parameter, as a scalar if there is only one, or as a variate if there are several. The default value for PRQUANTILES is 0.5, i.e. the median.
For a spline model, the number of degrees of freedom can be specified using the DF option. This must be greater than or equal to 3 and less than or equal to 40; the default is 4. The knot points for the spline basis curves can be set using the KNOTS option. This must have DF points and no missing values. If KNOTS is not provided, the default knot points are DF equally spaced percentiles of the X variate.
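As a minimal sketch of supplying explicit knots, the commands below fit a median spline with DF=5 and therefore pass a KNOTS variate with exactly five non-missing values. The sketch assumes that the MelbourneTemp data from the Example section (variates MaxTemp and PrevMax) have already been loaded; the identifier knotpts and the knot positions themselves are hypothetical, chosen only for illustration.

```genstat
"Hypothetical knot positions: DF=5, so KNOTS must contain 5 values, none missing"
VARIATE [VALUES=12,16,20,26,34] knotpts
RQSMOOTH [PRINT=model,summary; METHOD=spline; DF=5; KNOTS=knotpts]\
         Y=MaxTemp; X=PrevMax; PRQUANTILES=0.5
```

Omitting KNOTS in the same command would instead place the knots at equally spaced percentiles of PrevMax.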
For a loess model the bandwidth is set by the BANDWIDTH option, and must lie between 0 and 1; the default is 0.4. With large bandwidths the curve will be smoother but less responsive, which allows greater bias where the curve is changing rapidly. With smaller bandwidths the curve will be more responsive, but the confidence limits around it will be wider. So the choice of bandwidth controls the trade-off between variance and bias.
The loess model uses a moving window centred around the point to be predicted. The width of this window is controlled by the BANDWIDTH and LMETHOD options. Setting LMETHOD=constant gives a constant window width of BANDWIDTH * RANGE(X). Alternatively, setting LMETHOD=adaptive (the default) uses a varying window width, defined so that the window always contains the proportion of the total number of points given by BANDWIDTH. The window will thus be narrower where the points are denser. A local polynomial is fitted to the points in the window. Its order is defined by the ORDER option as either 1 (linear, the default) or 2 (quadratic). The points in the polynomial regression are weighted according to their distance from the point that is to be predicted. The weighting function W(d) is selected using the KERNEL option, with settings:
uniform | W(d) = 1 |
---|---|
linear | W(d) = 1 - ABS(d) |
quadratic | W(d) = 1 - d² |
quartic | W(d) = (1 - d²)² |
triweight | W(d) = (1 - d²)³ |
normal | W(d) = PRNORMAL(d) |
epanechnikov | synonym of quadratic |
tukeybiweight | synonym of quartic |
where d is the distance within the window from the predicted point, scaled to take the values -1 and +1 at the lower and upper window edges.
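To illustrate how the loess options combine, the following sketch fits three quantiles using a constant-width window, a local quadratic and the epanechnikov kernel. It assumes that the MelbourneTemp data from the Example section (variates MaxTemp and PrevMax) have already been loaded; the particular option values are arbitrary choices for illustration, not recommendations.

```genstat
"Loess fit: constant window of width 0.3*RANGE(PrevMax), local quadratic,"
"points weighted by the epanechnikov (quadratic) kernel"
RQSMOOTH [PRINT=model,summary; PLOT=fittedvalues; METHOD=loess;\
         KERNEL=epanechnikov; LMETHOD=constant; BANDWIDTH=0.3; ORDER=2]\
         Y=MaxTemp; X=PrevMax; PRQUANTILES=!(0.1,0.5,0.9)
```

Changing LMETHOD to adaptive in this command would make each window contain 30% of the points rather than span 30% of the range of PrevMax.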
Output is controlled by the PRINT option with settings:
model | the details of the model that is being fitted; |
---|---|
summary | a summary of the fit; and |
fittedvalues | the residuals and fitted values from the model. |
The PLOT option controls what plots are displayed, with settings:
rhistogram | histograms of residuals; and |
---|---|
fittedvalues | observed and fitted values plotted against the explanatory variate. |
The ARRANGEMENT option controls whether the models for each group are displayed in a trellis plot or in a single plot with all groups together.
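A grouped fit displayed as a trellis plot might look like the sketch below. It assumes the MaxTemp and PrevMax variates from the Example section; Season is a hypothetical factor that is not part of that data set and would need to be defined first.

```genstat
"Season is a hypothetical grouping factor, used here only for illustration"
RQSMOOTH [PRINT=summary; PLOT=fittedvalues; METHOD=spline; DF=4;\
         ARRANGEMENT=trellis] Y=MaxTemp; X=PrevMax;\
         PRQUANTILES=!(0.25,0.5,0.75); GROUPS=Season
```

With ARRANGEMENT=single (the default) the curves for all the groups would be drawn in one plot instead.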
Bootstrapping can be used to estimate standard errors and confidence limits for the fitted values. The NBOOT option specifies the number of bootstrap samples that are taken; the default is zero, which indicates that no bootstrapping is to be done. The CIPROBABILITY option sets the probability level of the confidence limits; the default is 0.95. The SEED option defines the seed for the random numbers that are used to select the bootstrap samples. The default of zero continues the existing sequence of random numbers if any have already been used in the current Genstat job. If none have been used, Genstat picks a seed at random.
The results from the model fit can be saved by various parameters. They will be saved in a variate if only one quantile has been defined, or in a pointer to a set of variates (one for each quantile) if there are several. The fitted curve(s) can be saved by the SMOOTH parameter, and the slope of the fitted curve by the SLOPE parameter. The NGRIDPOINTS option controls how many points are estimated on each curve. The GRID parameter can save the positions of the points, which will be spaced equally between the minimum and maximum values of X. The UPPSMOOTH and LOWSMOOTH parameters save the upper and lower bootstrap confidence limits of the estimated curve, and the SESMOOTH parameter saves its bootstrap standard errors. If a GROUPS factor has been specified, the estimated values for the curves have NLEVELS(GROUPS) * NGRIDPOINTS points, with the values for group 1 being given first, followed by those for group 2, and so on. The OUTGROUPS parameter can save a factor to identify the groups within the variates.
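The bootstrap options and saving parameters can be combined as in the sketch below, which fits a median spline and saves the curve, its slope, and the bootstrap limits and standard errors at 50 grid points. It assumes the MaxTemp and PrevMax variates from the Example section; the identifiers xgrid, qfit, qslope, qlower, qupper and qse, the seed and the number of bootstrap samples are all hypothetical choices made for illustration.

```genstat
"Median spline with 200 bootstrap samples and 90% confidence limits."
"Only one quantile is requested, so each result is saved as a variate."
RQSMOOTH [PRINT=summary; METHOD=spline; DF=6; NGRIDPOINTS=50;\
         NBOOT=200; SEED=472915; CIPROBABILITY=0.9]\
         Y=MaxTemp; X=PrevMax; PRQUANTILES=0.5; GRID=xgrid; SMOOTH=qfit;\
         SLOPE=qslope; LOWSMOOTH=qlower; UPPSMOOTH=qupper; SESMOOTH=qse
"Each saved variate has NGRIDPOINTS values, so they can be printed in parallel"
PRINT xgrid,qfit,qslope,qlower,qupper,qse
```

Setting a non-zero SEED, as here, should make the bootstrap limits reproducible between runs; requesting several quantiles would return pointers to sets of variates rather than single variates.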
Options: PRINT, PLOT, METHOD, KERNEL, LMETHOD, BANDWIDTH, ORDER, DF, KNOTS, NGRIDPOINTS, NBOOT, SEED, CIPROBABILITY, TITLE, ARRANGEMENT.
Parameters: Y, X, PRQUANTILES, GROUPS, GRID, OUTGROUPS, SMOOTH, SLOPE, RESIDUALS, FITTEDVALUES, LOWSMOOTH, UPPSMOOTH, SESMOOTH.
Method
The FRQUANTILES directive is used to fit the quantile regression for a design matrix generated for the spline basis or a locally weighted regression about the points in the smooth. For further details of the underlying methodology, see Koenker & D’Orey (1987) or Koenker (2005).
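For orientation, the criterion that a quantile regression conventionally minimizes (see Koenker 2005) can be written as below. The notation is introduced here for illustration only and is not taken from the procedure itself: τ is the proportion supplied by PRQUANTILES, g is the fitted curve (spline or local polynomial), and ρ_τ is the usual check function.

```latex
\hat{g}_\tau = \arg\min_{g} \sum_{i=1}^{n} \rho_\tau\bigl(y_i - g(x_i)\bigr),
\qquad
\rho_\tau(u) = u\,\bigl(\tau - \mathbf{1}[u < 0]\bigr)
```

With τ = 0.5 the check function reduces to half the absolute residual, so the fit is a least-absolute-deviations (median) regression.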
Action with RESTRICT
Restrictions on the Y and X variates and the GROUPS factor are combined, and only those units that are unrestricted in all structures are used in the regression.
References
Koenker, R. (2005). Quantile Regression. Cambridge University Press, New York.
Koenker, R.W. & D’Orey, V. (1987). Algorithm AS229 computing regression quantiles. Applied Statistics, 36, 383-393.
See also
Directive: FRQUANTILES.
Procedures: RQLINEAR, RQNONLINEAR.
Commands for: Regression analysis.
Example
CAPTION 'RQSMOOTH example'; STYLE=meta
SPLOAD '%GENDIR%/Examples/MelbourneTemp.gsh'
RQSMOOTH [PRINT=model,summary; PLOT=fitted; METHOD=Spline;\
         DF=6; NGRID=100; NBOOT=0] Y=MaxTemp; X=PrevMax;\
         PRQUANTILES=!(0.05,0.1,0.25,0.5,0.75,0.9,0.95)