Fits two-straight-line (broken-stick) models to data (A.W.A. Murray & J.T. Wood).

### Options

`PRINT` = string tokens | What to print (`model`, `summary`, `estimates`, `fittedvalues`, `intercepts`); default `mode`, `summ`, `esti` |
---|---|
`PLOT` = string tokens | What to plot (`breakpoint`, `lines`, `residuals`); default `*` i.e. nothing |
`HORIZONTAL` = string token | Forces either the left- or the right-hand line to be horizontal (`left`, `right`); default `*` i.e. neither |
`CIPROBABILITY` = scalar | Sets the probability level of the confidence interval about the `X` value at the intersection; default 0.95 |
`NGRIDLINES` = scalar | Controls the number of points used in the initial search for the intersection of the lines; default 100 |
`TERMS` = variates | Additional x-variates to include in the model; default none |
`METHOD` = string token | Optimization method (`gaussnewton`, `newtonraphson`, `fletcherpowell`); default `newt` |

### Parameters

`Y` = variates | Response variates to be modelled |
---|---|
`X` = variates | Explanatory variable for each response variate |
`TITLE` = texts | Title to use on the graphs for each response variate |
`FITTEDVALUES` = variates | Saves fitted values |
`RESIDUALS` = variates | Saves standardized residuals |
`ESTIMATES` = variates | Saves estimates from each model (i.e. intersection coordinates and slopes of the fitted lines) |
`SE` = variates | Saves standard errors of the estimates |
`INTERCEPTS` = variates | Saves the intercepts |
`LOWER` = scalars | Saves the lower bound of the confidence interval about the x-value at the intersection |
`UPPER` = scalars | Saves the upper bound of the confidence interval about the x-value at the intersection |
`PARTIALLIKELIHOOD` = pointers | Saves the partial likelihood and grid values for partial likelihood plots |

### Description

`R2LINES` fits a model consisting of two straight line segments (a broken-stick or split-line model) to the data. The `HORIZONTAL` option can be set to `left` or `right` to force either the left- or the right-hand line to be horizontal. A check is made to ensure that the overall best intersection point is used for the two lines. The `NGRIDLINES` option specifies the number of extra points used between each pair of x-values in the initial search for the best intersection point; default 100. The `METHOD` option specifies the optimization method that is then used to estimate the intersection point. The default is to use the Newton-Raphson method. (See the `RCYCLE` directive for details.)

The response variate is specified by the `Y` parameter, and the explanatory variate by the `X` parameter. You can also use the `TERMS` option to include additional x-variates in the model.

Information can be saved from the analysis by using the `FITTEDVALUES`, `RESIDUALS`, `ESTIMATES` and `SE` parameters, in the usual way. The `LOWER` and `UPPER` parameters can save the lower and upper values of a confidence interval for the x location of the intersection (or breakpoint) of the lines. The probability for the interval is specified by the `CIPROBABILITY` option, with default 0.95 (i.e. 95%). The `INTERCEPTS` parameter can save a variate containing the intercepts of the fitted model with the y-axis and the x-axis.

Printed output is controlled by the `PRINT` option. The settings `model`, `summary` and `fittedvalues` operate as in ordinary regression. The `estimates` setting produces the parameter estimates as usual, and also the confidence interval for the x-value of the intersection of the lines. There is also a setting `intercepts`, which prints the values at which the model intercepts the x-axis and y-axis.

The `PLOT` option has settings to produce the following plots:

`breakpoint` | displays a partial likelihood plot, showing the approximate F ratio for the model for a range of positions of the breakpoint between the two lines; |
---|---|
`lines` | plots the fitted lines; |
`residuals` | produces the four standard model-checking plots of residuals: a histogram, Normal and half-Normal plots, and a plot of residuals against fitted values. |

The `TITLE` parameter can supply a title for the plots; the default is to use the identifier of the `Y` variate. The `PARTIALLIKELIHOOD` parameter can save the points used for the breakpoint plot, as a pointer storing a variate with the y-coordinates as its first element, and a variate with the x-coordinates as its second element.

Options: `PRINT`, `PLOT`, `HORIZONTAL`, `CIPROBABILITY`, `NGRIDLINES`, `TERMS`, `METHOD`.

Parameters: `Y`, `X`, `TITLE`, `FITTEDVALUES`, `RESIDUALS`, `ESTIMATES`, `SE`, `INTERCEPTS`, `LOWER`, `UPPER`, `PARTIALLIKELIHOOD`.

### Method

A model consisting of two straight line segments is fitted by least squares. This is done by defining the variables

`Slope_1 = (X - Breakpoint_X) * (X < Breakpoint_X)`

`Slope_2 = (X - Breakpoint_X) * (X > Breakpoint_X)`

where `X` is the explanatory variable, and `Breakpoint_X` is the value of the explanatory variable where the two segments join. The response variable is then regressed on `Slope_1` and `Slope_2`. The slopes of the lines are the regression coefficients for `Slope_1` and `Slope_2`. If `Breakpoint_X` is known, there is no problem. However, if it is not known, care is needed because the residual mean square may have local minima. If one of the straight lines is assumed to be horizontal, then only one slope is fitted and the other is set to zero.
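As an illustration of the construction above, the fit for a known breakpoint can be sketched in Python. This is not Genstat code and not the procedure's own implementation; the function names `slope_variables`, `solve` and `fit_known_breakpoint` are hypothetical. The sketch assumes a constant term in the regression, which then equals the fitted response at the breakpoint because both slope variables are zero there.

```python
def slope_variables(x, breakpoint_x):
    """Build Slope_1 and Slope_2 as defined above for a given breakpoint."""
    slope_1 = [(xi - breakpoint_x) * (xi < breakpoint_x) for xi in x]
    slope_2 = [(xi - breakpoint_x) * (xi > breakpoint_x) for xi in x]
    return slope_1, slope_2


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def fit_known_breakpoint(x, y, breakpoint_x):
    """Least-squares fit of y on a constant, Slope_1 and Slope_2.

    Returns (constant, left slope, right slope, residual sum of squares).
    Assumes there are observations on both sides of the breakpoint.
    """
    s1, s2 = slope_variables(x, breakpoint_x)
    columns = [[1.0] * len(x), s1, s2]
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    const, b1, b2 = solve(xtx, xty)
    fitted = [const + b1 * u + b2 * v for u, v in zip(s1, s2)]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return const, b1, b2, rss
```

Forcing one line to be horizontal corresponds, in this sketch, to dropping the matching slope column from `columns` so that its coefficient is fixed at zero.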

The values of `X` are sorted into increasing order, and a sequence of trial values for `Breakpoint_X` is formed, consisting of the original values of `X` plus `NGRIDLINES`-1 equally spaced values between each consecutive pair of `X` values. The regression of `Y` on `Slope_1` and `Slope_2` is fitted for each of these trial values. The one giving the smallest residual sum of squares is then chosen as a starting value for `Breakpoint_X`, and the model is fitted as a nonlinear model using `FITNONLINEAR`.
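The initial search described above can be sketched in Python as follows. This is an illustration only, not the procedure's code: the names `trial_breakpoints` and `best_starting_value` are hypothetical, duplicate x-values are assumed to be collapsed before the grid is built, and `rss_at` stands for any function that returns the residual sum of squares of the two-line regression at a trial breakpoint.

```python
def trial_breakpoints(x, ngridlines=100):
    """Return the sorted x-values plus ngridlines-1 equally spaced
    values between each consecutive pair of distinct x-values."""
    xs = sorted(set(x))  # assumption: ties in x contribute a single grid point
    trials = []
    for lo, hi in zip(xs, xs[1:]):
        trials.append(lo)
        step = (hi - lo) / ngridlines
        trials.extend(lo + k * step for k in range(1, ngridlines))
    trials.append(xs[-1])
    return trials


def best_starting_value(trials, rss_at):
    """Pick the trial breakpoint with the smallest residual sum of squares."""
    return min(trials, key=rss_at)
```

Because the residual sum of squares can have several local minima, scanning the whole grid first gives the subsequent nonlinear fit a starting value near the overall best breakpoint rather than a merely local one.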

Suppose that at the true value of `Breakpoint_X` the residual sum of squares is `Rt`, and that at the fitted value of `Breakpoint_X` the residual sum of squares is `Rf` and the residual mean square is `Sf`. If we assume that the observations are independently and normally distributed with common variance, the distribution of (`Rt` - `Rf`)/`Sf` can be approximated by an F-distribution with one and (number of observations minus four) degrees of freedom. Hence the set of values of `Breakpoint_X` for which (`Rt` - `Rf`)/`Sf`, with `Rt` evaluated at that value, is less than the 95th percentile of the F-distribution defines a 95% confidence region. It is possible for this region to consist of more than one distinct interval. The confidence interval runs from the minimum to the maximum value of `Breakpoint_X` in the region. The calculated variance ratios and the trial values of `Breakpoint_X` are returned in `PARTIALLIKELIHOOD`.
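The construction of the interval can be written compactly as follows. The symbols are introduced here for illustration and are not part of the procedure: R(b) is the residual sum of squares with the breakpoint held at the trial value b, V(b) is the variance ratio, C is the confidence region, n is the number of observations, and p is the probability level (0.95 in the description above).

$$
V(b) = \frac{R(b) - R_f}{S_f}, \qquad
C = \{\, b : V(b) < F_{p}(1,\ n-4) \,\}, \qquad
\text{lower bound} = \min C, \quad \text{upper bound} = \max C .
$$

Here F_p(1, n-4) denotes the 100p-th percentile of the F-distribution with 1 and n-4 degrees of freedom. If C is made up of several separate intervals, the reported interval still runs from its smallest to its largest value, so it may include breakpoint values that lie outside C.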

### Action with `RESTRICT`

Restrictions on `X` and `Y` are obeyed.

### See also

Directives: `FITCURVE`, `FITNONLINEAR`.

Commands for: Regression analysis.

### Example

    CAPTION 'R2LINES example'; STYLE=meta
    VARIATE X,Y; VALUES=\
     !(-3.12,-1.74,4.36,7.27,7.90,9.05,11.01,18.51,18.96,\
     24.38,27.42,33.58,38.61,42.79,44.86,48.21,61.60,75.25),\
     !(0.14,0.69,0.43,1.00,0.81,0.70,0.19,1.06,0.57,\
     3.16,1.75,12.54,1.81,5.46,7.86,10.39,22.43,39.35)
    R2LINES [PRINT=model,summary,estimates,fittedvalues,intercepts;\
     PLOT=breakpoint,lines,residuals] Y; X
    & [HORIZONTAL=left] Y; X