Performs a Cate-Nelson graphical analysis of bivariate data (V.M. Cave).
Controls printed output (
What graphs to plot (
||Direction of the association between the y and x values (
Pre-specified critical value of y; default * i.e. the critical value of y is estimated
Pre-specified critical value of x; default * i.e. the critical value of x is estimated
||Title for the Cate-Nelson plot; if unset, the title is generated automatically|
||Y-axis title for the Cate-Nelson plot; if unset, the title is generated automatically|
||X-axis title for the Cate-Nelson plot; if unset, the title is generated automatically|
||Window to use for the graphs; default 3|
||Specifies the save structure of regression model holding the y-values, distribution, link function and weights; default * i.e. that from last regression fitted|
||Supplies the x-values for each analysis|
||Saves the critical value of x, the critical value of y and the quadrant allocations for each
The RCATENELSON procedure performs a graphical analysis of bivariate data (x,y) as defined by Cate & Nelson (1971). It also extends their analysis to y-variates with non-Normal distributions.
Before using RCATENELSON, you need to give a MODEL statement defining the y-variate. The distribution of the y-variate, a link function and weights can also be defined with the MODEL statement. (Note, however, that multinomial distributions, user-defined distributions and link functions, and generalized least squares are not accommodated by RCATENELSON.) The variate containing the x-values is supplied using the X parameter.
The objective of the Cate-Nelson graphical analysis is to divide the data into two groups, based on the x-values, so that there is maximum statistical homogeneity within each group. The procedure finds the value of x that, in terms of predictive ability, best divides the data into two groups. This critical value of x is determined by iteratively dividing the data into two groups at each candidate critical x-value and selecting the one that minimizes the residual sum of squares, or the deviance for distributions other than the Normal. Alternatively, a pre-specified critical value of x may be supplied, as a scalar, using the
After determining the critical value of x, the procedure then finds the critical value of y. (For the Binomial distribution, y is defined as the proportion of successes.) The critical values of x and y split the scatter plot of y on x into four quadrants: two of these contain data that follow the predictive model, and two (known as the error quadrants) contain data that do not follow the model. The critical value of y is also determined iteratively, but here the critical value minimizes the number of observations that fall into the error quadrants, i.e. those that do not conform with the predictive model. Alternatively, a pre-specified critical value of y may be supplied, as a scalar, using the
The DIRECTION option specifies whether the association between the y and x values is ascending (i.e. following a positive trend; the default) or descending (i.e. following a negative trend). This determines the error quadrants. For an ascending trend (i.e. where y increases with increasing x), observations in the top left (I) and in the bottom right (III) quadrants do not conform with the predictive model. Therefore, for data with an ascending trend, the critical y-value minimizes the number of observations that fall into Quadrants I and III. Conversely, for a descending trend (i.e. where y decreases with increasing x), the error quadrants are the top right (II) and bottom left (IV).
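The quadrant rules above can be sketched in a few lines. This is an illustration in Python rather than Genstat, and it is not the procedure's own code: the function names are hypothetical, and the treatment of observations lying exactly on a critical value (counted here as "below" or "left") is an assumption that the text does not specify.

```python
# Quadrant numbering follows the text: I = top left, II = top right,
# III = bottom right, IV = bottom left.
ERROR_QUADRANTS = {
    "ascending": {"I", "III"},
    "descending": {"II", "IV"},
}


def quadrant(x, y, x_crit, y_crit):
    """Return the quadrant label for one observation."""
    top = y > y_crit      # assumption: a tie on y counts as "bottom"
    right = x > x_crit    # assumption: a tie on x counts as "left"
    if top:
        return "II" if right else "I"
    return "III" if right else "IV"


def count_errors(xs, ys, x_crit, y_crit, direction="ascending"):
    """Count the observations that fall into the two error quadrants."""
    errors = ERROR_QUADRANTS[direction]
    return sum(quadrant(x, y, x_crit, y_crit) in errors
               for x, y in zip(xs, ys))
```

For a given critical x-value, evaluating `count_errors` at each candidate critical y-value and keeping the smallest count mirrors the search for the critical value of y described above.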
When there is more than one candidate critical x-value, or more than one candidate critical y-value, results are generated for each possibility.
Printed output is controlled by the
||prints a summary of the analysis, including the critical x-value, the critical y-value, the error rate (i.e. the percentage of observations falling into the two error quadrants) and the count and percentage of observations in each quadrant.|
||prints the allocation of data to each quadrant|
||prints the data falling into the error quadrants.|
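As a rough illustration of the quantities reported in the summary, the sketch below computes the error rate and the count and percentage of observations in each quadrant from a list of quadrant labels. It is written in Python, not Genstat; the function name `summarise` and the input format are hypothetical, and the layout of the procedure's actual printed output is not reproduced.

```python
from collections import Counter


def summarise(quadrants, direction="ascending"):
    """Return (error rate in %, {quadrant: (count, percentage)}).

    `quadrants` is a list of labels "I", "II", "III" or "IV", one per
    observation, using the quadrant numbering given in the text.
    """
    error_quadrants = {"I", "III"} if direction == "ascending" else {"II", "IV"}
    n = len(quadrants)
    counts = Counter(quadrants)
    table = {q: (counts.get(q, 0), 100.0 * counts.get(q, 0) / n)
             for q in ("I", "II", "III", "IV")}
    error_rate = 100.0 * sum(counts.get(q, 0) for q in error_quadrants) / n
    return error_rate, table
```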
The PLOT option controls the graphical output, with these settings.
||produces a Cate-Nelson plot. Here, a scatter plot of y on x is drawn, with a horizontal line superimposed through the critical value of y, and a vertical line superimposed through the critical value of x, splitting the data into four quadrants. Observations that fall into the error quadrants are drawn as red crosses, labelled by their unit number. Observations that follow the predictive model are drawn as black hollow circles.|
||produces a plot of the residual sum of squares (or deviance for non-Normal distributions) against the candidate critical values of x, and a plot of the number of observations falling into the error quadrants against the candidate critical values of y. If
By default, the Cate-Nelson plot is produced.
XTITLE options can supply an overall title, a y-axis title and an x-axis title for the Cate-Nelson plot, respectively. If these are not supplied, suitable titles are generated automatically. To omit a title, a blank string can be supplied, e.g.
The WINDOW option defines the window to use for the plots; default 3.
Results can be saved using the RESULTS parameter. They are in a single pointer if there is only one critical x and critical y value. If there are several, they are in a pointer containing a pointer for each pair of critical x and critical y values. The first element of these pointers, indexed by 'Critical x-value', is a scalar storing the critical value of x. The second element, indexed by 'Critical y-value', is a scalar storing the critical value of y. The third element, indexed by 'Quadrant', stores the allocation of data to each quadrant, and is ordered by the unit number.
RCATENELSON uses the methods described in Cate & Nelson (1971) and Mangiafico (2013), but extended to accommodate y-variates with non-Normal distributions.
Candidate critical values of x are formed by ordering the unique values in X, and calculating the midpoint between each adjacent pair. Following Cate & Nelson (1971), the procedure ensures that at least two x-values fall to the left and to the right of each candidate value. The critical value of x minimizes the residual sum of squares, or deviance for non-Normal distributions, which is obtained using the
Candidate critical values of y are formed by ordering the unique values in Y, and calculating the midpoint between each adjacent pair. (For the Binomial distribution, the proportion of successes is used.) The critical value of y minimizes the number of observations in the error quadrants.
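The two searches can be sketched as follows, in Python rather than Genstat. This is a minimal illustration under stated assumptions, not the procedure's implementation: it covers only the unweighted Normal case, it takes the predictive model to be a separate mean for the groups on either side of the candidate x-value, it reads "at least two x-values" as at least two observations on each side, and all function names are hypothetical. When several candidates tie, all of them are returned.

```python
def midpoints(values):
    """Midpoints between adjacent pairs of the sorted unique values."""
    unique = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(unique, unique[1:])]


def rss_two_groups(xs, ys, x_crit):
    """Residual sum of squares around the mean of each group (Normal case)."""
    total = 0.0
    for group in ([y for x, y in zip(xs, ys) if x < x_crit],
                  [y for x, y in zip(xs, ys) if x > x_crit]):
        mean = sum(group) / len(group)
        total += sum((y - mean) ** 2 for y in group)
    return total


def critical_x(xs, ys, min_each_side=2):
    """Candidate x-values that minimize the residual sum of squares."""
    candidates = [c for c in midpoints(xs)
                  if sum(x < c for x in xs) >= min_each_side
                  and sum(x > c for x in xs) >= min_each_side]
    scored = [(rss_two_groups(xs, ys, c), c) for c in candidates]
    best = min(score for score, _ in scored)
    return [c for score, c in scored if score == best]


def critical_y(xs, ys, x_crit, direction="ascending"):
    """Candidate y-values that minimize the count in the error quadrants."""
    def n_errors(y_crit):
        count = 0
        for x, y in zip(xs, ys):
            top, right = y > y_crit, x > x_crit
            # Ascending: errors are top left (I) and bottom right (III).
            # Descending: errors are top right (II) and bottom left (IV).
            count += (top != right) if direction == "ascending" else (top == right)
        return count

    scored = [(n_errors(c), c) for c in midpoints(ys)]
    best = min(score for score, _ in scored)
    return [c for score, c in scored if score == best]
```

Because the candidates are midpoints between distinct observed values, no observation can lie exactly on a candidate, so the strict comparisons above never have to resolve a tie.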
RCATENELSON will work with restricted X variates, and restricted WEIGHTS settings of MODEL. However, if more than one is restricted, they must be restricted in the same way.
Cate, R.B. & Nelson, L.A. (1971). A simple statistical procedure for partitioning soil test correlation data into two classes. Soil Science Society of America Proceedings, 35, 658–660.
Mangiafico, S.S. (2013). Cate-Nelson analysis for bivariate data using R-project. Journal of Extension, 51, 5TOT1.
CAPTION 'RCATENELSON example',\
        !T('The data are relative yield of cotton (%) and the potassium',\
        'concentration of the soil (ppm).'),\
        !T('From de Freitas et al. (1966). Determination of potassium',\
        'deficient areas for cotton. Potash Review.'); \
        STYLE=meta,plain,plain
VARIATE [VALUES=53.5,64.8,63.0,40.8,79.5,70.3,63.0,64.0,94.0,99.0,66.5,\
        103.0,97.3,85.3,101.3,97.0,96.8,98.0,85.8,92.3,96.8,88.3,\
        106.8,97.5] Yield
VARIATE [VALUES=26,28,30,31,34,35,40,44,49,56,68,75,77,78,78,102,118,118,\
        131,133,133,152,193,211] K
MODEL   Y=Yield
RCATENELSON [PLOT=catenelson,criticalvalues; \
        YTITLE='Relative yield of cotton (%)';\
        XTITLE='Soil potassium concentration (ppm)'] X=K