# RCATENELSON procedure

Performs a Cate-Nelson graphical analysis of bivariate data (V.M. Cave).

### Options

- `PRINT` = string tokens: Controls printed output (`summary`, `quadrants`, `errorquadrants`); default `summ`, `quad`
- `PLOT` = string tokens: What graphs to plot (`catenelson`, `criticalvalues`); default `cate`
- `DIRECTION` = string token: Direction of the association between the y and x values (`ascending`, `descending`); default `asce`, i.e. a positive trend
- `YCRITICAL` = scalar: Pre-specified critical value of y; default `*`, i.e. the critical value of y is estimated
- `XCRITICAL` = scalar: Pre-specified critical value of x; default `*`, i.e. the critical value of x is estimated
- `TITLE` = text: Title for the Cate-Nelson plot; if unset, the title is generated automatically
- `YTITLE` = text: Y-axis title for the Cate-Nelson plot; if unset, the title is generated automatically
- `XTITLE` = text: X-axis title for the Cate-Nelson plot; if unset, the title is generated automatically
- `WINDOW` = scalar: Window to use for the graphs; default 3
- `SAVE` = regression save structure: Specifies the save structure of the regression model holding the y-values, distribution, link function and weights; default `*`, i.e. that from the last regression fitted

### Parameters

- `X` = variates: Supplies the x-values for each analysis
- `RESULTS` = pointers: Saves the critical value of x, the critical value of y and the quadrant allocations for each `X` variate

### Description

The `RCATENELSON` procedure performs a graphical analysis of bivariate data (x,y) as defined by Cate & Nelson (1971). It also extends their analysis to y-variates with non-Normal distributions.

Before using `RCATENELSON`, you need to give a `MODEL` statement defining the y-variate. The `MODEL` statement can also define the distribution of the y-variate, a link function and weights. (Note, however, that `RCATENELSON` does not accommodate multinomial distributions, user-defined distributions and link functions, or generalized least squares.) The variate containing the x-values is supplied using the `X` parameter.

The objective of the Cate-Nelson graphical analysis is to divide the data into two groups, based on the x-values, so that there is maximum statistical homogeneity within each group. The procedure finds the value of x that, in terms of predictive ability, best divides the data into two groups. This critical value of x is determined by iteratively dividing the data into two groups at each candidate critical x-value and selecting the one that minimizes the residual sum of squares, or the deviance for distributions other than the Normal. Alternatively, a pre-specified critical value of x may be supplied, as a scalar, using the `XCRITICAL` option.
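The criterion for the critical x-value can be illustrated with a short Python sketch. Python is used here only for illustration; this is not the procedure's Genstat code. The sketch covers the Normal case only. Splitting the data at a cut-off and fitting a separate mean to each group gives a residual sum of squares equal to the pooled within-group sum of squares. The helper names `two_group_rss` and `critical_x` are hypothetical. Assigning observations that lie exactly on the cut-off to the left group is an assumption.

```python
from statistics import fmean


def two_group_rss(x, y, cutoff):
    """Residual sum of squares after splitting the data at `cutoff` (Normal case).

    Fitting a two-level group factor is equivalent to fitting each group's mean,
    so the RSS is the sum of squared deviations within each group.
    Assumption: observations with x equal to the cut-off go to the left group.
    """
    left = [yi for xi, yi in zip(x, y) if xi <= cutoff]
    right = [yi for xi, yi in zip(x, y) if xi > cutoff]
    rss = 0.0
    for group in (left, right):
        if group:
            mean = fmean(group)
            rss += sum((v - mean) ** 2 for v in group)
    return rss


def critical_x(x, y, candidates, tol=1e-9):
    """Return every candidate attaining the minimum RSS (ties are all kept),
    together with the RSS for each candidate."""
    rss = {c: two_group_rss(x, y, c) for c in candidates}
    best = min(rss.values())
    return [c for c, r in rss.items() if r <= best + tol], rss
```

For example, `critical_x([1, 2, 3, 4, 5, 6], [1.0, 1.2, 0.9, 5.0, 5.1, 4.9], [2.5, 3.5, 4.5])` picks 3.5, the cut-off that separates the low-y and high-y groups. Keeping all tied minima mirrors the way the procedure reports results for each possibility when several candidates qualify.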

After determining the critical value of x, the procedure finds the critical value of y. (For the Binomial distribution, y is defined as the proportion of successes.) The critical values of x and y split the scatter plot of y on x into four quadrants. Two of these contain data that follow the predictive model, and two (known as the error quadrants) contain data that do not follow the model. The critical value of y is also determined iteratively, but here it is the value that minimizes the number of observations falling into the error quadrants, i.e. those that do not conform with the predictive model. Alternatively, a pre-specified critical value of y may be supplied, as a scalar, using the `YCRITICAL` option.

The `DIRECTION` option specifies whether the association between the y and x values is `ascending` (i.e. following a positive trend; the default) or `descending` (i.e. following a negative trend). This determines the error quadrants. For an `ascending` trend (i.e. where y increases with increasing x), observations in the top left (I) and bottom right (III) quadrants do not conform with the predictive model. Therefore, for data with an ascending trend, the critical y-value minimizes the number of observations that fall into Quadrants I and III. Conversely, for a `descending` trend (i.e. where y decreases with increasing x), the error quadrants are the top right (II) and bottom left (IV).
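The rule for choosing the critical y-value can be sketched in Python for illustration. This is not the procedure's own code. With an `ascending` trend, a point is an error when it is both above the y cut-off and left of the x cut-off (I), or both below and right (III). With a `descending` trend, the error quadrants are II and IV. The function names are hypothetical. The tie rule is also an assumption: points exactly on a cut-off count as left or lower.

```python
def error_count(x, y, x_crit, y_crit, direction="ascending"):
    """Count observations in the error quadrants.

    Quadrants: I top left, II top right, III bottom right, IV bottom left.
    ascending  -> errors are I and III (above-and-left, below-and-right)
    descending -> errors are II and IV (above-and-right, below-and-left)
    Assumption: points on a cut-off are treated as left / lower.
    """
    if direction not in ("ascending", "descending"):
        raise ValueError("direction must be 'ascending' or 'descending'")
    count = 0
    for xi, yi in zip(x, y):
        top = yi > y_crit
        left = xi <= x_crit
        if direction == "ascending":
            is_error = top == left  # I (top and left) or III (bottom and right)
        else:
            is_error = top != left  # II (top and right) or IV (bottom and left)
        count += int(is_error)
    return count


def critical_y(x, y, x_crit, candidates, direction="ascending"):
    """Return every candidate y cut-off minimizing the error count (ties kept)."""
    counts = {c: error_count(x, y, x_crit, c, direction) for c in candidates}
    best = min(counts.values())
    return [c for c, k in counts.items() if k == best], counts
```

With the x cut-off fixed at 3.5 for the data `x = [1, 2, 3, 4, 5, 6]`, `y = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]`, a y cut-off of 3.1 leaves no observations in Quadrants I or III. Under `descending`, the same cut-offs would place all six points in error quadrants.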

When more than one candidate critical x-value (or more than one candidate critical y-value) attains the minimum, results are generated for each possibility.

Printed output is controlled by the `PRINT` option, with the following settings.

- `summary` prints a summary of the analysis, including the critical x-value, the critical y-value, the error rate (i.e. the percentage of observations falling into the two error quadrants) and the count and percentage of observations in each quadrant.
- `quadrants` prints the allocation of data to each quadrant.
- `errorquadrants` prints the data falling into the error quadrants.
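The quantities in the `summary` output can be sketched in Python. This is an illustration only and does not reproduce the procedure's printed layout. The sketch covers the quadrant allocation, the counts and percentages per quadrant, and the error rate. The quadrant numbering follows the description of `DIRECTION` (I top left, II top right, III bottom right, IV bottom left). Assigning points that lie exactly on a cut-off to the left or lower side is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def quadrant(xi, yi, x_crit, y_crit):
    """Quadrant label: I top left, II top right, III bottom right, IV bottom left.
    Assumption: points on a cut-off are treated as left / lower."""
    top = yi > y_crit
    left = xi <= x_crit
    if top:
        return "I" if left else "II"
    return "IV" if left else "III"


def summary(x, y, x_crit, y_crit, direction="ascending"):
    """Quadrant allocation (in unit order), counts, percentages and error rate (%)."""
    labels = [quadrant(xi, yi, x_crit, y_crit) for xi, yi in zip(x, y)]
    counts = Counter(labels)
    n = len(labels)
    errors = ("I", "III") if direction == "ascending" else ("II", "IV")
    error_rate = 100.0 * sum(counts[q] for q in errors) / n
    percentages = {q: 100.0 * counts[q] / n for q in ("I", "II", "III", "IV")}
    return labels, dict(counts), percentages, error_rate
```

For four points, one in each quadrant, the error rate is 50% under either direction, since exactly two of the four quadrants are error quadrants.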

The `PLOT` option controls the graphical output, with these settings.

- `catenelson` produces a Cate-Nelson plot. This is a scatter plot of y on x, with a horizontal line through the critical value of y and a vertical line through the critical value of x, splitting the data into four quadrants. Observations that fall into the error quadrants are drawn as red crosses, labelled by their unit number. Observations that follow the predictive model are drawn as black hollow circles.
- `criticalvalues` produces a plot of the residual sum of squares (or deviance for non-Normal distributions) against the candidate critical values of x, and a plot of the number of observations falling into the error quadrants against the candidate critical values of y. If `XCRITICAL` is supplied, the plot of the residual sum of squares or deviance is not produced. If `YCRITICAL` is supplied, the plot of the error-quadrant counts is not produced.

By default, the Cate-Nelson plot is produced.

The `TITLE`, `YTITLE` and `XTITLE` options can supply an overall title, a y-axis title and an x-axis title for the Cate-Nelson plot, respectively. If these are not supplied, suitable titles are generated automatically. To omit a title, supply a blank string, e.g.

`XTITLE=' '`

The `WINDOW` option defines the window to use for the plots; default 3.

Results can be saved using the `RESULTS` parameter. If there is only one critical x-value and one critical y-value, the results are stored in a single pointer. If there are several, they are stored in a pointer containing a pointer for each pair of critical x- and y-values. The first element of each of these pointers, indexed by `'Critical x-value'`, is a scalar storing the critical value of x. The second element, indexed by `'Critical y-value'`, is a scalar storing the critical value of y. The third element, indexed by `'Quadrant'`, stores the allocation of the data to the quadrants, ordered by unit number.

Options: `PRINT`, `PLOT`, `DIRECTION`, `YCRITICAL`, `XCRITICAL`, `TITLE`, `YTITLE`, `XTITLE`, `WINDOW`, `SAVE`.
Parameters: `X`, `RESULTS`.

### Method

`RCATENELSON` uses the methods described in Cate & Nelson (1971) and Mangiafico (2013), but extended to accommodate y-variates with non-Normal distributions.

Candidate critical values of x are formed by ordering the unique values in `X` and calculating the midpoint between each adjacent pair. Following Cate & Nelson (1971), the procedure ensures that at least two x-values fall to the left and to the right of each candidate value. The critical value of x minimizes the residual sum of squares (or the deviance for non-Normal distributions), which is obtained using the `MODEL` and `FIT` directives.
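The construction of the candidate x-values can be sketched in Python for illustration. The function name `candidate_x` is hypothetical. The sketch assumes that "at least two x-values" means at least two observations, counting repeated x-values separately, on each side of the midpoint. The procedure's exact rule may differ.

```python
def candidate_x(x, min_each_side=2):
    """Midpoints between adjacent unique x-values, keeping only those with at
    least `min_each_side` observations strictly on each side.

    Assumption: repeated x-values are counted as separate observations."""
    ux = sorted(set(x))
    mids = [(a + b) / 2 for a, b in zip(ux, ux[1:])]
    return [
        c
        for c in mids
        if sum(v < c for v in x) >= min_each_side
        and sum(v > c for v in x) >= min_each_side
    ]
```

For `x = [1, 2, 3, 4, 5, 6]` the end midpoints 1.5 and 5.5 are dropped, leaving 2.5, 3.5 and 4.5. With repeated values such as `[1, 1, 2, 3, 3]`, both midpoints survive, because each side contains at least two observations.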

Candidate critical values of y are formed by ordering the unique values in Y, and calculating the midpoint between each adjacent pair. (For the Binomial distribution, the proportion of successes is used.) The critical value of y minimizes the number of observations in the error quadrants.
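A matching Python sketch of the candidate y-values follows, again for illustration only. The optional `n_binomial` argument is a hypothetical stand-in for the `NBINOMIAL` setting of `MODEL`. When it is given, y is converted to proportions of successes before the midpoints are taken.

```python
def candidate_y(y, n_binomial=None):
    """Midpoints between adjacent unique y-values.

    If `n_binomial` (hypothetical stand-in for NBINOMIAL) is supplied, the
    y-values are treated as success counts and converted to proportions first."""
    if n_binomial is not None:
        y = [successes / trials for successes, trials in zip(y, n_binomial)]
    uy = sorted(set(y))
    return [(a + b) / 2 for a, b in zip(uy, uy[1:])]
```

For example, `candidate_y([3, 1, 2, 2])` gives `[1.5, 2.5]`. For 1, 2 and 3 successes out of 4 trials, the proportions 0.25, 0.5 and 0.75 give candidates 0.375 and 0.625.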

### Action with `RESTRICT`

`RCATENELSON` will work with restricted `X` variates, and restricted `Y`, `NBINOMIAL` and `WEIGHTS` settings of `MODEL`. However, if more than one is restricted, they must be restricted in the same way.

### References

Cate, R.B. & Nelson, L.A. (1971). A simple statistical procedure for partitioning soil test correlation data into two classes. Soil Science Society of America Proceedings, 35, 658–660.

Mangiafico, S.S. (2013). Cate-Nelson analysis for bivariate data using R-project. Journal of Extension, 51, 5TOT1.

Commands for: Graphics, Regression analysis.

### Example

```
CAPTION     'RCATENELSON example',\
            !T('The data are relative yield of cotton (%) and the potassium',\
            'concentration of the soil (ppm).'),\
            !T('From de Freitas et al. (1966). Determination of potassium',\
            'deficient areas for cotton. Potash Review.');\
            STYLE=meta,plain,plain
VARIATE     [VALUES=53.5,64.8,63.0,40.8,79.5,70.3,63.0,64.0,94.0,99.0,66.5,\
            103.0,97.3,85.3,101.3,97.0,96.8,98.0,85.8,92.3,96.8,88.3,\
            106.8,97.5] Yield
VARIATE     [VALUES=26,28,30,31,34,35,40,44,49,56,68,75,77,78,78,102,118,118,\
            131,133,133,152,193,211] K
MODEL       Y=Yield
RCATENELSON [PLOT=catenelson,criticalvalues;\
            YTITLE='Relative yield of cotton (%)';\
            XTITLE='Soil potassium concentration (ppm)'] X=K
```