R2LINES procedure

Fits two-straight-line (broken-stick) models to data (A.W.A. Murray & J.T. Wood).

Options

`PRINT` = string tokens	What to print (`model`, `summary`, `estimates`, `fittedvalues`, `intercepts`); default `mode`, `summ`, `esti`
`PLOT` = string tokens	What to plot (`breakpoint`, `lines`, `residuals`); default `*` i.e. nothing
`HORIZONTAL` = string token	Forces either the left- or the right-hand line to be horizontal (`left`, `right`); default `*` i.e. neither
`CIPROBABILITY` = scalar	Sets the probability level of the confidence interval about the `X` value at the intersection; default 0.95
`NGRIDLINES` = scalar	Controls the number of points used in the initial search for the intersection of the lines; default 100
`TERMS` = variates	Additional x-variates to include in the model; default none
`METHOD` = string token	Optimization method (`gaussnewton`, `newtonraphson`, `fletcherpowell`); default `newt`

Parameters

`Y` = variates	Response variates to be modelled
`X` = variates	Explanatory variable for each response variate
`TITLE` = texts	Title to use on the graphs for each response variate
`FITTEDVALUES` = variates	Saves fitted values
`RESIDUALS` = variates	Saves standardized residuals
`ESTIMATES` = variates	Saves estimates from each model (i.e. intersection coordinates and slopes of the fitted lines)
`SE` = variates	Saves standard errors of the estimates
`INTERCEPTS` = variates	Saves the intercepts
`LOWER` = scalars	Saves the lower bound of the confidence interval about the x-value at the intersection
`UPPER` = scalars	Saves the upper bound of the confidence interval about the x-value at the intersection
`PARTIALLIKELIHOOD` = pointers	Saves the partial likelihood and grid values for partial likelihood plots

Description

R2LINES fits a model consisting of two straight-line segments (a broken-stick or split-line model) to the data. The HORIZONTAL option can be set to left or right to force either the left- or the right-hand line to be horizontal. A check is made to ensure that the overall best intersection point is used for the two lines. The NGRIDLINES option specifies the number of extra points used between each pair of x values in the initial search for the best intersection point (default 100). The METHOD option specifies the optimization method that is then used to estimate the intersection point; the default is the Newton-Raphson method (see the RCYCLE directive for details).

The response variate is specified by the Y parameter, and the explanatory variate by the X parameter. You can also use the TERMS option to include additional x-variates in the model.

Information can be saved from the analysis by using the FITTEDVALUES, RESIDUALS, ESTIMATES and SE parameters, in the usual way. The LOWER and UPPER parameters can save the lower and upper bounds of a confidence interval for the x location of the intersection (or breakpoint) of the lines; the probability for the interval is specified by the CIPROBABILITY option, with default 0.95 (i.e. 95%). The INTERCEPTS parameter can save a variate containing the intercept of the model with the y-axis and the intercepts of the two lines with the x-axis.
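The intercepts follow from simple arithmetic once each line is written through the intersection point. The minimal Python sketch below assumes a hypothetical function `axis_intercepts` and a hypothetical y-coordinate `by` for the intersection; the order of the returned values is illustrative, not the layout of the variate that R2LINES saves. It also assumes that the y-axis intercept is taken from whichever segment covers x = 0.

```python
def axis_intercepts(bx, by, slope_left, slope_right):
    """Axis intercepts of a two-line model joined at (bx, by).

    Each line is y = by + slope * (x - bx).  Returns
    (y_intercept, x_cross_left, x_cross_right); an x crossing is None
    for a horizontal line.
    """
    # Assumption: the y-axis intercept uses the segment that covers x = 0.
    slope_at_zero = slope_left if 0 < bx else slope_right
    y_intercept = by - slope_at_zero * bx

    def x_cross(slope):
        # Solve by + slope * (x - bx) = 0 for x on the extended line.
        return None if slope == 0 else bx - by / slope

    return y_intercept, x_cross(slope_left), x_cross(slope_right)
```

For example, lines joined at (10, 2) with slopes 0.5 and 2 give a y-axis intercept of -3 and x-axis crossings at 6 and 9.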

Printed output is controlled by the PRINT option. The settings model, summary and fittedvalues operate as in ordinary regression. The estimates setting produces the parameter estimates as usual, and also the confidence interval for the x-value of the intersection of the lines. There is also a setting intercepts, which prints the values at which the model intercepts the x-axis and y-axis.

The PLOT option has settings to produce the following plots:

`breakpoint`	displays a partial likelihood plot, showing the approximate F ratio for the model for a range of positions of the breakpoint between the two lines;
`lines`	plots the fitted lines;
`residuals`	produces the four standard model-checking plots of residuals: a histogram, Normal and half-Normal plots, and a plot of residuals against fitted values.

The TITLE parameter can supply a title for the plots; the default is to use the identifier of the Y variate. The PARTIALLIKELIHOOD parameter can save the points used for the breakpoint plot, as a pointer storing a variate with the y-coordinates as its first element, and a variate with the x-coordinates as its second element.

Options: PRINT, PLOT, HORIZONTAL, CIPROBABILITY, NGRIDLINES, TERMS, METHOD.
Parameters: Y, X, TITLE, FITTEDVALUES, RESIDUALS, ESTIMATES, SE, INTERCEPTS, LOWER, UPPER, PARTIALLIKELIHOOD.

Method

A model consisting of two straight-line segments is fitted by least squares. This is done by defining the variables

Slope_1 = (X - Breakpoint_X) * (X < Breakpoint_X)

Slope_2 = (X - Breakpoint_X) * (X > Breakpoint_X)

where X is the explanatory variable, and Breakpoint_X is the value of the explanatory variable where the two segments join. The response variable is then regressed on Slope_1 and Slope_2, and the slopes of the two lines are the regression coefficients of Slope_1 and Slope_2. If Breakpoint_X is known, this is an ordinary linear regression. If it is not known, however, care is needed, because the residual mean square, viewed as a function of Breakpoint_X, may have local minima. If one of the lines is assumed to be horizontal, only one slope is fitted and the other is set to zero.
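To make the construction concrete, here is a minimal pure-Python sketch of the fit for a known Breakpoint_X. The function names (`design_row`, `solve`, `fit_known_breakpoint`), the `horizontal` argument mirroring the HORIZONTAL option, and the inclusion of a constant term are assumptions for illustration. The normal-equations solve is a simple stand-in for a proper regression routine, not how R2LINES itself computes the fit.

```python
def design_row(xi, bx, horizontal=None):
    """Constant, Slope_1 and Slope_2 for one observation."""
    row = [1.0]
    if horizontal != "left":               # left-hand slope is free
        row.append((xi - bx) * (xi < bx))  # Slope_1
    if horizontal != "right":              # right-hand slope is free
        row.append((xi - bx) * (xi > bx))  # Slope_2
    return row


def solve(a, b):
    """Solve the small square system a @ coef = b by Gaussian elimination."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def fit_known_breakpoint(x, y, bx, horizontal=None):
    """Least-squares fit of y on the broken-stick regressors for a fixed bx.

    Returns (coefficients, residual sum of squares).  The constant is the
    fitted y-value at bx, because both slope regressors are zero there.
    """
    rows = [design_row(xi, bx, horizontal) for xi in x]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return coef, rss
```

With data lying exactly on two lines joined at x = 5, `fit_known_breakpoint(x, y, 5)` recovers the join height and the two slopes with a residual sum of squares of essentially zero. The normal equations are adequate for three well-separated columns; a QR-based solve would be more robust in general.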

The values of X are sorted into increasing order, and a sequence of trial values for Breakpoint_X is formed, consisting of the original values of X plus NGRIDLINES-1 equally spaced values between each consecutive pair of X values. The regression of Y on Slope_1 and Slope_2 is fitted for each of these trial values. The one giving the smallest residual sum of squares is then chosen as a starting value for Breakpoint_X, and the model is fitted as a nonlinear model using FITNONLINEAR.
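The grid search for a starting value can be sketched as follows. `trial_breakpoints` and `starting_breakpoint` are hypothetical names, and `rss_at` is any callable returning the residual sum of squares of the regression on Slope_1 and Slope_2 at a given trial value. As simplifying assumptions, this sketch drops duplicate X values and skips the two end values of the grid, where one of the slope variables would be identically zero. It stops at the starting value and does not reproduce the FITNONLINEAR refinement.

```python
def trial_breakpoints(x, ngridlines=100):
    """Sorted distinct x values plus ngridlines - 1 equally spaced values in each gap."""
    xs = sorted(set(x))
    trials = []
    for lo, hi in zip(xs, xs[1:]):
        step = (hi - lo) / ngridlines
        # lo itself, then the ngridlines - 1 interior points of (lo, hi)
        trials.extend(lo + k * step for k in range(ngridlines))
    trials.append(xs[-1])
    return trials


def starting_breakpoint(x, rss_at, ngridlines=100):
    """Trial value with the smallest residual sum of squares."""
    # Skip the two end values, where one slope regressor would be all zero.
    candidates = trial_breakpoints(x, ngridlines)[1:-1]
    return min(candidates, key=rss_at)
```

With m distinct X values the grid holds (m - 1) * NGRIDLINES + 1 trial values, so the cost of the initial search grows linearly with NGRIDLINES.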

Suppose that the residual sum of squares is Rt at the true value of Breakpoint_X, and Rf at the fitted value, where the residual mean square is Sf. If the observations are assumed to be independent and Normally distributed with common variance, the distribution of (Rt - Rf)/Sf can be approximated by an F distribution with 1 and n - 4 degrees of freedom, where n is the number of observations. Hence the set of values of Breakpoint_X for which (Rt - Rf)/Sf is less than the 95th percentile of this F distribution defines an approximate 95% confidence region (95% is the default; other levels are set by CIPROBABILITY). This region may consist of more than one distinct interval, so the confidence interval runs from the minimum to the maximum value of Breakpoint_X in the region. The calculated variance ratios and the trial values of Breakpoint_X are returned in PARTIALLIKELIHOOD.
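The confidence-region calculation can be illustrated with a short sketch that works from a profile of residual sums of squares over the trial values. It is an approximation under stated assumptions: the smallest trial residual sum of squares stands in for Rf (R2LINES uses the FITNONLINEAR fit), and the F percentile is passed in as `f_crit` rather than computed, for example from `scipy.stats.f.ppf(0.95, 1, n - 4)`. The name `breakpoint_interval` is hypothetical; its `ratios` output plays roughly the role of the variance ratios saved by PARTIALLIKELIHOOD.

```python
def breakpoint_interval(trials, rss, n_obs, f_crit):
    """Approximate confidence interval for Breakpoint_X from an RSS profile.

    trials : trial values of Breakpoint_X
    rss    : residual sum of squares at each trial value
    n_obs  : number of observations
    f_crit : upper percentile of F(1, n_obs - 4), e.g. from
             scipy.stats.f.ppf(0.95, 1, n_obs - 4)
    """
    rf = min(rss)              # assumption: best trial RSS stands in for Rf
    sf = rf / (n_obs - 4)      # residual mean square on n - 4 df
    ratios = [(rt - rf) / sf for rt in rss]
    inside = [b for b, r in zip(trials, ratios) if r < f_crit]
    # The region may be several disjoint pieces; report its overall extent.
    return min(inside), max(inside), ratios
```

Because only the extreme members of the region are reported, a profile with two separate dips below `f_crit` yields one wide interval spanning both; the breakpoint plot is the place to see whether that has happened.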

Action with `RESTRICT`

Restrictions on X and Y are obeyed.

Example

CAPTION 'R2LINES example'; STYLE=meta
VARIATE X,Y; VALUES=\
        !(-3.12,-1.74,4.36,7.27,7.90,9.05,11.01,18.51,18.96,\
          24.38,27.42,33.58,38.61,42.79,44.86,48.21,61.60,75.25),\
        !(0.14,0.69,0.43,1.00,0.81,0.70,0.19,1.06,0.57,\
          3.16,1.75,12.54,1.81,5.46,7.86,10.39,22.43,39.35)
R2LINES [PRINT=model,summary,estimates,fittedvalues,intercepts;\
        PLOT=breakpoint,lines,residuals] Y; X
&       [HORIZONTAL=left] Y; X

Updated on January 12, 2022
