RCATENELSON procedure

Performs a Cate-Nelson graphical analysis of bivariate data (V.M. Cave).

Options

`PRINT` = string tokens	Controls printed output (`summary`, `quadrants`, `errorquadrants`); default `summ`, `quad`
`PLOT` = string tokens	What graphs to plot (`catenelson`, `criticalvalues`); default `cate`
`DIRECTION` = string token	Direction of the association between the y and x values (`ascending`, `descending`); default `asce` i.e. a positive trend
`YCRITICAL` = scalar	Pre-specified critical value of y; default * i.e. the critical value of y is estimated
`XCRITICAL` = scalar	Pre-specified critical value of x; default * i.e. the critical value of x is estimated
`TITLE` = text	Title for the Cate-Nelson plot; if unset, the title is generated automatically
`YTITLE` = text	Y-axis title for the Cate-Nelson plot; if unset, the title is generated automatically
`XTITLE` = text	X-axis title for the Cate-Nelson plot; if unset, the title is generated automatically
`WINDOW` = scalar	Window to use for the graphs; default 3
`SAVE` = identifier	Specifies the save structure of the regression model holding the y-values, distribution, link function and weights; default * i.e. that from the last regression fitted

Parameters

`X` = variates	Supplies the x-values for each analysis
`RESULTS` = pointers	Saves the critical value of x, the critical value of y and the quadrant allocations for each `X` variate

Description

The RCATENELSON procedure performs a graphical analysis of bivariate data (x,y) as defined by Cate & Nelson (1971). It also extends their analysis to y-variates with non-Normal distributions.

Before using RCATENELSON, you need to give a MODEL statement defining the y-variate. The distribution of the y-variate, a link function and weights can also be defined with the MODEL statement. (Note, however, that RCATENELSON does not accommodate multinomial distributions, user-defined distributions or link functions, or generalized least squares.) The variate containing the x-values is supplied using the X parameter.

The objective of the Cate-Nelson graphical analysis is to divide the data into two groups, based on the x-values, so that there is maximum statistical homogeneity within each group. The procedure finds the value of x that, in terms of predictive ability, best divides the data into two groups. This critical value of x is determined by iteratively dividing the data into two groups at each candidate critical x-value and selecting the one that minimizes the residual sum of squares, or the deviance for distributions other than the Normal. Alternatively, a pre-specified critical value of x may be supplied, as a scalar, using the XCRITICAL option.
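To illustrate the criterion for a non-Normal response, the following Python sketch (not Genstat, and not the procedure's own code) computes the deviance of a two-group split, assuming a Poisson distribution and no weights. It relies on the fact that, when the only explanatory term is the two-level grouping, the fitted value for every observation is its group mean. The function names are hypothetical.

```python
import math


def poisson_group_deviance(counts):
    """Poisson deviance of one group whose fitted value is the group mean."""
    if not counts:
        return 0.0
    mu = sum(counts) / len(counts)
    dev = 0.0
    for y in counts:
        # y*log(y/mu) is taken as 0 when y == 0 (its limiting value).
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        dev += 2.0 * (log_term - (y - mu))
    return dev


def split_deviance(x, y, x_candidate):
    """Total deviance when the data are divided at x_candidate."""
    left = [yi for xi, yi in zip(x, y) if xi < x_candidate]
    right = [yi for xi, yi in zip(x, y) if xi > x_candidate]
    return poisson_group_deviance(left) + poisson_group_deviance(right)
```

A candidate with a smaller `split_deviance` gives two more homogeneous groups, in the same way that a smaller residual sum of squares does for a Normal response.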

After determining the critical value of x, the procedure then finds the critical value of y. (For the Binomial distribution, y is defined as the proportion of successes.) The critical values of x and y split the scatter plot of y on x into four quadrants: two of these contain data that follow the predictive model, and two (known as the error quadrants) contain data that do not follow the model. The critical value of y is also determined iteratively, but here the critical value minimizes the number of observations that fall into the error quadrants, i.e. those that do not conform with the predictive model. Alternatively, a pre-specified critical value of y may be supplied, as a scalar, using the YCRITICAL option.

The DIRECTION option specifies whether the association between the y and x values is ascending (i.e. following a positive trend; the default) or descending (i.e. following a negative trend). This determines the error quadrants. For an ascending trend (i.e. where y increases with increasing x), observations in the top left (I) and in the bottom right (III) quadrants do not conform with the predictive model. Therefore, for data with an ascending trend, the critical y-value minimizes the number of observations that fall into Quadrants I and III. Conversely, for a descending trend (i.e. where y decreases with increasing x), the error quadrants are the top right (II) and bottom left (IV).
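The quadrant numbering and the effect of DIRECTION can be sketched in Python as follows. This is an illustration only, not the procedure's Genstat code: the function names are hypothetical, and the treatment of an observation lying exactly on a critical value (placed in the lower or left-hand quadrant) is an assumption that the text above does not specify.

```python
# Error quadrants for each DIRECTION setting, using the numbering in the text:
# I = top left, II = top right, III = bottom right, IV = bottom left.
ERROR_QUADRANTS = {
    "ascending": {"I", "III"},
    "descending": {"II", "IV"},
}


def quadrant(x, y, x_crit, y_crit):
    """Return the quadrant label of one observation."""
    top = y > y_crit      # assumption: y == y_crit counts as bottom
    right = x > x_crit    # assumption: x == x_crit counts as left
    if top:
        return "II" if right else "I"
    return "III" if right else "IV"


def is_error(x, y, x_crit, y_crit, direction="ascending"):
    """True if the observation falls into an error quadrant."""
    return quadrant(x, y, x_crit, y_crit) in ERROR_QUADRANTS[direction]


# A high x-value with a low y-value contradicts an ascending trend.
print(quadrant(68, 66.5, x_crit=46.5, y_crit=82.4))   # III
print(is_error(68, 66.5, x_crit=46.5, y_crit=82.4))   # True
```

The same observation is not an error under `direction="descending"`, because Quadrant III then conforms with the predictive model.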

When there is more than one candidate critical x-value, or more than one candidate critical y-value, results are generated for each possibility.

Printed output is controlled by the PRINT option, with the following settings.

`summary`	prints a summary of the analysis, including the critical x-value, the critical y-value, the error rate (i.e. the percentage of observations falling into the two error quadrants) and the count and percentage of observations in each quadrant.
`quadrants`	prints the allocation of data to each quadrant.
`errorquadrants`	prints the data falling into the error quadrants.

The PLOT option controls the graphical output, with these settings.

`catenelson`	produces a Cate-Nelson plot: a scatter plot of y on x, with a horizontal line superimposed through the critical value of y and a vertical line superimposed through the critical value of x, splitting the data into four quadrants. Observations that fall into the error quadrants are drawn as red crosses, labelled by their unit number. Observations that follow the predictive model are drawn as black hollow circles.
`criticalvalues`	produces a plot of the residual sum of squares (or deviance for non-Normal distributions) against the candidate critical values of x, and a plot of the number of observations falling into the error quadrants against the candidate critical values of y. If XCRITICAL is supplied, the plot of the residual sum of squares or deviance is not produced. If YCRITICAL is supplied, the plot of the number of observations in the error quadrants is not produced.

By default, the Cate-Nelson plot is produced.

The TITLE, YTITLE and XTITLE options can supply an overall title, a y-axis title and an x-axis title for the Cate-Nelson plot, respectively. If these are not supplied, suitable titles are generated automatically. To omit a title, a blank string can be supplied, e.g.

XTITLE=' '

The WINDOW option defines the window to use for the plots; default 3.

Results can be saved using the RESULTS parameter. They are in a single pointer if there is only one critical x and critical y value. If there are several, they are in a pointer containing a pointer for each pair of critical x and critical y values. The first element of these pointers, indexed by 'Critical x-value', is a scalar storing the critical value of x. The second element, indexed by 'Critical y-value', is a scalar storing the critical value of y. The third element, indexed by 'Quadrant', stores the allocation of data to each quadrant, and is ordered by the unit number.

Options: PRINT, PLOT, DIRECTION, YCRITICAL, XCRITICAL, TITLE, YTITLE, XTITLE, WINDOW, SAVE.
Parameters: X, RESULTS.

Method

RCATENELSON uses the methods described in Cate & Nelson (1971) and Mangiafico (2013), but extended to accommodate y-variates with non-Normal distributions.

Candidate critical values of x are formed by ordering the unique values in X, and calculating the midpoint between each adjacent pair. Following Cate & Nelson (1971), the procedure ensures that at least two x-values fall to the left and to the right of each candidate value. The critical value of x is the candidate that minimizes the residual sum of squares (or the deviance for non-Normal distributions), which is obtained using the MODEL and FIT directives.
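The search for the critical value of x can be sketched in Python for the Normal case. This is an illustration under stated assumptions rather than the procedure's Genstat code: the function names are hypothetical, the "at least two x-values" rule is interpreted here as at least two observations on each side, and the two-group fit is replaced by the equivalent sum of squares about each group mean (unweighted).

```python
import math


def candidate_midpoints(values):
    """Midpoints between adjacent pairs of the ordered unique values."""
    unique = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(unique, unique[1:])]


def rss_about_mean(values):
    """Residual sum of squares of a group about its own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def critical_x(x, y, min_each_side=2):
    """Return (list of best candidates, minimum RSS); ties are all kept."""
    scored = []
    for c in candidate_midpoints(x):
        left = [yi for xi, yi in zip(x, y) if xi < c]
        right = [yi for xi, yi in zip(x, y) if xi > c]
        # Assumption: require at least two observations on each side.
        if len(left) >= min_each_side and len(right) >= min_each_side:
            scored.append((c, rss_about_mean(left) + rss_about_mean(right)))
    if not scored:
        raise ValueError("too few distinct x-values to form a candidate")
    best = min(rss for _, rss in scored)
    return [c for c, rss in scored if math.isclose(rss, best)], best


# Cotton data from the Example section of this page.
K = [26, 28, 30, 31, 34, 35, 40, 44, 49, 56, 68, 75, 77, 78, 78, 102,
     118, 118, 131, 133, 133, 152, 193, 211]
Yield = [53.5, 64.8, 63.0, 40.8, 79.5, 70.3, 63.0, 64.0, 94.0, 99.0, 66.5,
         103.0, 97.3, 85.3, 101.3, 97.0, 96.8, 98.0, 85.8, 92.3, 96.8, 88.3,
         106.8, 97.5]

best_x, min_rss = critical_x(K, Yield)
print(best_x)   # [46.5], the midpoint between 44 and 49 ppm
```

Returning a list rather than a single value mirrors the statement in the Description that results are generated for every tied candidate.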

Candidate critical values of y are formed by ordering the unique values in Y, and calculating the midpoint between each adjacent pair. (For the Binomial distribution, the proportion of successes is used.) The critical value of y minimizes the number of observations in the error quadrants.
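The search for the critical value of y can be sketched in Python in the same spirit. Again this is an illustration, not the procedure's Genstat code: the function name is hypothetical, the critical value of x is taken as already known, and an observation lying exactly on a critical value is assumed to count as left or bottom.

```python
def critical_y(x, y, x_crit, direction="ascending"):
    """Return (list of best candidates, fewest errors); ties are all kept."""
    unique = sorted(set(y))
    candidates = [(a + b) / 2 for a, b in zip(unique, unique[1:])]

    def n_errors(y_crit):
        count = 0
        for xi, yi in zip(x, y):
            right, top = xi > x_crit, yi > y_crit
            # Ascending: errors are top left and bottom right (top != right).
            # Descending: errors are top right and bottom left (top == right).
            mismatch = top != right
            count += mismatch if direction == "ascending" else not mismatch
        return count

    counts = [(c, n_errors(c)) for c in candidates]
    fewest = min(n for _, n in counts)
    return [c for c, n in counts if n == fewest], fewest


# Cotton data from the Example section of this page.
K = [26, 28, 30, 31, 34, 35, 40, 44, 49, 56, 68, 75, 77, 78, 78, 102,
     118, 118, 131, 133, 133, 152, 193, 211]
Yield = [53.5, 64.8, 63.0, 40.8, 79.5, 70.3, 63.0, 64.0, 94.0, 99.0, 66.5,
         103.0, 97.3, 85.3, 101.3, 97.0, 96.8, 98.0, 85.8, 92.3, 96.8, 88.3,
         106.8, 97.5]

best_y, fewest = critical_y(K, Yield, x_crit=46.5)
print([round(c, 1) for c in best_y], fewest)   # [82.4] 1
```

With the critical value of x at 46.5, the only observation left in an error quadrant is the one with a yield of 66.5 at 68 ppm, which lies in the bottom right quadrant (III).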

Action with `RESTRICT`

RCATENELSON will work with restricted X variates, and restricted Y, NBINOMIAL and WEIGHTS settings of MODEL. However, if more than one is restricted, they must be restricted in the same way.

References

Cate, R.B. & Nelson, L.A. (1971). A simple statistical procedure for partitioning soil test correlation data into two classes. Soil Science Society of America Proceedings, 35, 658–660.

Mangiafico, S.S. (2013). Cate-Nelson analysis for bivariate data using R-project. Journal of Extension, 51, 5TOT1.

Example

CAPTION     'RCATENELSON example',\
            !T('The data are relative yield of cotton (%) and the potassium',\
               'concentration of the soil (ppm).'),\
            !T('From de Freitas et al. (1966). Determination of potassium',\
               'deficient areas for cotton. Potash Review.'); \
            STYLE=meta,plain,plain
VARIATE     [VALUES=53.5,64.8,63.0,40.8,79.5,70.3,63.0,64.0,94.0,99.0,66.5,\
                    103.0,97.3,85.3,101.3,97.0,96.8,98.0,85.8,92.3,96.8,88.3,\
                    106.8,97.5] Yield
VARIATE     [VALUES=26,28,30,31,34,35,40,44,49,56,68,75,77,78,78,102,118,118,\
                    131,133,133,152,193,211] K
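"Define the response variate (Normal distribution by default)"
MODEL       Y=Yield
"Estimate both critical values; draw the Cate-Nelson and critical-value plots"
RCATENELSON [PLOT=catenelson,criticalvalues; \
             YTITLE='Relative yield of cotton (%)';\
             XTITLE='Soil potassium concentration (ppm)'] X=K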

Updated on September 11, 2019
