Fits two-straight-line (broken-stick) models to data (A.W.A. Murray & J.T. Wood).
Options
PRINT = string tokens | What to print (model, summary, estimates, fittedvalues, intercepts); default mode, summ, esti |
---|---|
PLOT = string tokens | What to plot (breakpoint, lines, residuals); default * i.e. nothing |
HORIZONTAL = string token | Forces either the left- or the right-hand line to be horizontal (left, right); default * i.e. neither |
CIPROBABILITY = scalar | Sets the probability level of the confidence interval about the X value at the intersection; default 0.95 |
NGRIDLINES = scalar | Controls the number of points used in the initial search for the intersection of the lines; default 100 |
TERMS = variates | Additional x-variates to include in the model; default none |
METHOD = string token | Optimization method (gaussnewton, newtonraphson, fletcherpowell); default newt |
Parameters
Y = variates | Response variates to be modelled |
---|---|
X = variates | Explanatory variable for each response variate |
TITLE = texts | Title to use on the graphs for each response variate |
FITTEDVALUES = variates | Saves fitted values |
RESIDUALS = variates | Saves standardized residuals |
ESTIMATES = variates | Saves estimates from each model (i.e. intersection coordinates and slopes of the fitted lines) |
SE = variates | Saves standard errors of the estimates |
INTERCEPTS = variates | Saves the intercepts |
LOWER = scalars | Saves the lower bound of the confidence interval about the x-value at the intersection |
UPPER = scalars | Saves the upper bound of the confidence interval about the x-value at the intersection |
PARTIALLIKELIHOOD = pointers | Saves the partial likelihood and grid values for partial likelihood plots |
Description
R2LINES fits a model consisting of two straight line segments (a broken-stick or split-line model) to the data. The HORIZONTAL option can be set to left or right to force either the left- or the right-hand line to be horizontal. A check is made to ensure that the overall best intersection point is used for the two lines. The NGRIDLINES option specifies the number of extra points used between each pair of x-values in the initial search for the best intersection point; the default is 100. The METHOD option specifies the optimization method that is then used to estimate the intersection point. The default is to use the Newton-Raphson method. (See the RCYCLE directive for details.)
The response variate is specified by the Y parameter, and the explanatory variate by the X parameter. You can also use the TERMS option to include additional x-variates in the model.
Information can be saved from the analysis by using the FITTEDVALUES, RESIDUALS, ESTIMATES and SE parameters, in the usual way. The LOWER and UPPER parameters can save the lower and upper values of a confidence interval for the x location of the intersection (or breakpoint) of the lines. The probability for the interval is specified by the CIPROBABILITY option, with default 0.95 (i.e. 95%). The INTERCEPTS parameter can save a variate containing the intercept of the model with the y-axis and the intercepts of the two lines with the x-axis.
Printed output is controlled by the PRINT option. The settings model, summary and fittedvalues operate as in ordinary regression. The estimates setting produces the parameter estimates as usual, and also the confidence interval for the x-value of the intersection of the lines. There is also a setting intercepts, which prints the values at which the model intercepts the x-axis and y-axis.
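As an illustration of the arithmetic behind axis intercepts, the following Python sketch computes them from the intersection coordinates and the two slopes. It is not R2LINES code: the function name, the argument names and the order of the returned values are assumptions made for this example, and the text above does not say how R2LINES itself orders or defines the saved values.

```python
def intercepts(bx, by, slope_left, slope_right):
    """Axis intercepts of two lines that meet at the point (bx, by).

    Hypothetical helper. Returns (y_axis, x_axis_left, x_axis_right).
    An x-axis entry is None when that line is horizontal. Each x-axis
    value is where the extended line crosses y = 0, whether or not that
    point lies on the fitted segment.
    """
    # The left-hand line applies at x = 0 when the breakpoint lies to the right of 0.
    slope_at_zero = slope_left if bx > 0 else slope_right
    y_axis = by - slope_at_zero * bx

    def x_cross(slope):
        return None if slope == 0 else bx - by / slope

    return y_axis, x_cross(slope_left), x_cross(slope_right)


# Lines meeting at (3, 2) with slopes 0.5 (left) and 2 (right).
print(intercepts(3.0, 2.0, 0.5, 2.0))  # (0.5, -1.0, 2.0)
```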
The PLOT option has settings to produce the following plots:
breakpoint | displays a partial likelihood plot, showing the approximate F ratio for the model over a range of positions of the breakpoint between the two lines; |
---|---|
lines | plots the fitted lines; |
residuals | produces the four standard model-checking plots of residuals: a histogram, Normal and half-Normal plots, and a plot of residuals against fitted values. |
The TITLE parameter can supply a title for the plots; the default is to use the identifier of the Y variate. The PARTIALLIKELIHOOD parameter can save the points used for the breakpoint plot, as a pointer storing a variate with the y-coordinates as its first element, and a variate with the x-coordinates as its second element.
Options: PRINT, PLOT, HORIZONTAL, CIPROBABILITY, NGRIDLINES, TERMS, METHOD.
Parameters: Y, X, TITLE, FITTEDVALUES, RESIDUALS, ESTIMATES, SE, INTERCEPTS, LOWER, UPPER, PARTIALLIKELIHOOD.
Method
A model consisting of two straight line segments is fitted by least squares. This is done by defining variables,
Slope_1 = (X - Breakpoint_X) * (X < Breakpoint_X)
Slope_2 = (X - Breakpoint_X) * (X > Breakpoint_X)
where X is the explanatory variable, and Breakpoint_X is the value of the explanatory variable where the two segments join. The response variable is then regressed on Slope_1 and Slope_2. The slopes of the lines are the regression coefficients for Slope_1 and Slope_2. If Breakpoint_X is known, this is an ordinary linear regression. If it is not known, care is needed because the residual mean square, viewed as a function of Breakpoint_X, may have local minima. If one of the straight lines is assumed to be horizontal, then only one slope is fitted and the other is set to zero.
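The regression for a known breakpoint can be sketched in Python as follows. This is an illustration of the idea, not the procedure's own code: the function names are hypothetical, the solver is a plain Gaussian elimination on the normal equations chosen only to keep the example dependency-free, and no check is made for a singular design (for example a breakpoint outside the range of the data).

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_two_lines(x, y, breakpoint_x):
    """Least-squares fit of y on Slope_1 and Slope_2 for a known breakpoint.

    Returns (constant, slope_1, slope_2, rss). The constant is the fitted
    y-value at the breakpoint, because both regressors are zero there.
    """
    s1 = [(xi - breakpoint_x) * (xi < breakpoint_x) for xi in x]
    s2 = [(xi - breakpoint_x) * (xi > breakpoint_x) for xi in x]
    cols = [[1.0] * len(x), s1, s2]
    # Normal equations (X'X) beta = X'y for the three-column design matrix.
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * yi for p, yi in zip(ci, y)) for ci in cols]
    const, b1, b2 = solve(xtx, xty)
    rss = sum((yi - const - b1 * u - b2 * v) ** 2 for yi, u, v in zip(y, s1, s2))
    return const, b1, b2, rss


# Noise-free data from lines with slopes 0.5 and 2 meeting at (3, 2).
x = [0, 1, 2, 3, 4, 5, 6]
y = [0.5, 1.0, 1.5, 2.0, 4.0, 6.0, 8.0]
const, b1, b2, rss = fit_two_lines(x, y, 3)
print(round(const, 6), round(b1, 6), round(b2, 6))
```

For these artificial data the fit should recover a constant of 2 and slopes of 0.5 and 2, with a residual sum of squares that is zero up to rounding error.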
The values of X are sorted into increasing order, and a sequence of trial values for Breakpoint_X is formed, consisting of the original values of X plus NGRIDLINES - 1 equally spaced values between each consecutive pair of X values. The regression of Y on Slope_1 and Slope_2 is fitted for each of these trial values. The one giving the smallest residual sum of squares is then chosen as a starting value for Breakpoint_X, and the model is fitted as a nonlinear model using FITNONLINEAR.
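The construction of the trial values and the choice of a starting value can be sketched in Python as below. The function names are hypothetical, and two details are assumptions of this sketch rather than statements about R2LINES: tied x-values are collapsed to one, and the residual sum of squares is supplied by the caller as a function of the breakpoint.

```python
def trial_breakpoints(x, ngridlines):
    """Sorted distinct x-values plus ngridlines - 1 equally spaced
    values between each consecutive pair."""
    xs = sorted(set(x))  # assumption: ties are collapsed
    trials = []
    for lo, hi in zip(xs, xs[1:]):
        trials.append(lo)
        step = (hi - lo) / ngridlines
        trials.extend(lo + k * step for k in range(1, ngridlines))
    trials.append(xs[-1])
    return trials


def best_start(trials, rss):
    """Trial value with the smallest residual sum of squares.

    rss is any callable that returns the residual sum of squares of the
    two-line regression for a given breakpoint.
    """
    return min(trials, key=rss)


trials = trial_breakpoints([0, 2, 1], 4)
print(trials)  # [0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]

# A stand-in criterion with its minimum at 1.3, used only to show the selection step.
print(best_start(trials, lambda c: (c - 1.3) ** 2))  # 1.25
```

The value returned by best_start is only a starting point; in the procedure the breakpoint is then refined by the nonlinear fit.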
Suppose that at the true value of Breakpoint_X the residual sum of squares is Rt, and that at the fitted value of Breakpoint_X the residual sum of squares is Rf and the residual mean square is Sf. If we assume that the observations are independently and normally distributed with common variance, the distribution of (Rt - Rf)/Sf can be approximated by an F-distribution with degrees of freedom one and number of observations minus four. Hence the set of values for Breakpoint_X for which (Rt - Rf)/Sf is less than the 95th percentile of the F-distribution defines a 95% confidence region. It is possible for this region to consist of more than one distinct interval. The confidence interval will contain the minimum and maximum values of Breakpoint_X in the region. The calculated variance ratios and the trial values of Breakpoint_X are returned in PARTIALLIKELIHOOD.
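A minimal Python sketch of this calculation is given below, assuming the residual sums of squares at the trial values are already available. The function name is hypothetical. The sketch uses the smallest residual sum of squares on the grid as a stand-in for Rf, takes Sf as Rf divided by the number of observations minus four, and expects the caller to supply the F percentile (for example from tables), since the Python standard library has no F-distribution quantile function.

```python
def breakpoint_interval(trials, rss_values, n_obs, f_crit):
    """Approximate confidence interval for the breakpoint from an RSS profile.

    trials      trial values of the breakpoint
    rss_values  residual sum of squares at each trial value
    n_obs       number of observations
    f_crit      chosen percentile of F(1, n_obs - 4), supplied by the caller
    Returns (lower, upper, ratios).
    """
    rss_f = min(rss_values)    # assumption: grid minimum stands in for Rf
    s_f = rss_f / (n_obs - 4)  # residual mean square Sf (must be non-zero)
    ratios = [(r - rss_f) / s_f for r in rss_values]
    inside = [t for t, v in zip(trials, ratios) if v < f_crit]
    # The region may be split into several pieces; report its extremes.
    return min(inside), max(inside), ratios


# Made-up profile for 14 observations; 4.96 is the tabulated 95th percentile of F(1, 10).
lower, upper, ratios = breakpoint_interval(
    [1, 2, 3, 4, 5], [30.0, 12.0, 10.0, 11.0, 40.0], 14, 4.96
)
print(lower, upper)  # 2 4
print(ratios)        # [20.0, 2.0, 0.0, 1.0, 30.0]
```

The ratios are the quantities plotted against the trial values in a profile of this kind, and the interval is read off where they fall below the chosen percentile.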
Action with RESTRICT
Restrictions on X and Y are obeyed.
See also
Directives: FITCURVE, FITNONLINEAR.
Commands for: Regression analysis.
Example
CAPTION 'R2LINES example'; STYLE=meta
VARIATE X,Y; VALUES=\
  !(-3.12,-1.74,4.36,7.27,7.90,9.05,11.01,18.51,18.96,\
  24.38,27.42,33.58,38.61,42.79,44.86,48.21,61.60,75.25),\
  !(0.14,0.69,0.43,1.00,0.81,0.70,0.19,1.06,0.57,\
  3.16,1.75,12.54,1.81,5.46,7.86,10.39,22.43,39.35)
R2LINES [PRINT=model,summary,estimates,fittedvalues,intercepts;\
  PLOT=breakpoint,lines,residuals] Y; X
& [HORIZONTAL=left] Y; X