CASSOCIATION procedure

Calculates measures of association for circular data (S.J. Clark).

Options

`PRINT` = string token	What to print (`tests`); default `test`
`NRANDOMIZATIONS` = scalar	Number of randomizations to use in the randomization tests; default 999
`ASCALE` = string token	Units of the circular variables (`degrees`, `radians`); default `degr`

Parameters

`Y` = variates	Response variable
`X` = variates	Circular explanatory variable
`YTYPE` = string tokens	Type of response variable (`circular`, `linear`); default `circ`
`SEED` = variates	Variate of length two, firstly to supply a seed for the randomization tests and secondly to supply a seed to use for randomly-selecting sets of data points; default `!(0,0)`
`STATISTICS` = variates	Saves the test statistics

Description

CASSOCIATION calculates measures of association between a linear response variate and a circular explanatory variate (i.e. linear-circular) or between a circular response variate and a circular explanatory variate (i.e. circular-circular), as described in Fisher (1993, Chapter 6, Sections 6.1 – 6.3). The case of a circular response variate and a linear explanatory variable is not covered by CASSOCIATION; instead see procedure RCIRCULAR.

The data variates are supplied by the Y and X parameters. X should always be a circular variable. Y may be a linear or circular variable; its type is specified by the YTYPE parameter. So YTYPE=circular defines circular-circular data, and YTYPE=linear defines linear-circular data. Circular variables should represent vectorial data (i.e. directed lines). If they originally represent axial data (i.e. undirected lines), they should be transformed to vectorial data before using CASSOCIATION, by doubling their values and reducing the result modulo 360°, i.e. by

CALCULATE X = MODULO(2*X; 360)

(see Fisher 1993, page xvii). By default, circular variables should be supplied as degrees, but you can supply radians instead by setting option ASCALE=radians.
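As a minimal sketch of the ASCALE option, the ozone data from the Example section can be analysed with the wind directions converted to radians. The variate name WindRad is hypothetical, and the conversion assumes that the CONSTANTS function supplies the value of pi.

```genstat
" Ozone and wind-direction data from the Example section (angles in degrees) "
VARIATE      [VALUES=28,85.2,80.5,4.7,45.9,12.7,72.5,56.6,31.5,112,\
             20,72.5,16,45.9,32.6,56.6,52.6,91.8,55.2] Ozone
&            [VALUES=327,91,88,305,344,270,67,21,281,8,\
             204,86,333,18,57,6,11,27,84] Wind
" Convert degrees to radians; WindRad is a hypothetical name "
CALCULATE    WindRad = Wind * CONSTANTS('pi') / 180
" Tell CASSOCIATION that the circular variate is in radians "
CASSOCIATION [ASCALE=radians] Ozone; X=WindRad; YTYPE=linear
```

The test statistics should match those from the degree-scaled analysis, apart from rounding in the conversion, since only the units of the angles differ.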

Printed output is controlled by the PRINT option, with setting

`tests`	to print the results of the relevant tests (default).

The NRANDOMIZATIONS option specifies the number of randomizations to use with each of the randomization tests (see Method); the default is 999.

The SEED parameter can be set to a variate of length two, to supply seeds for the random numbers that CASSOCIATION may use with each y-variate. The first value provides a seed for the RANDOMIZE directive when calculating the randomization tests. The second value provides a seed for the CALCULATE directive when selecting random sets of points to calculate some of the statistics, which happens when there are too many data values to form all the sets (see the Method section for details). Both values default to zero, which continues the existing sequence of random numbers if any have already been used in the current Genstat job; otherwise Genstat picks a seed at random. Each seed can be any positive integer, but only the last six digits of its integer part are used.
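The following sketch shows one way to combine NRANDOMIZATIONS and SEED, using the ozone data from the Example section. The number of randomizations (4999) and both seed values are arbitrary choices made for illustration, not recommendations.

```genstat
" Ozone and wind-direction data from the Example section "
VARIATE      [VALUES=28,85.2,80.5,4.7,45.9,12.7,72.5,56.6,31.5,112,\
             20,72.5,16,45.9,32.6,56.6,52.6,91.8,55.2] Ozone
&            [VALUES=327,91,88,305,344,270,67,21,281,8,\
             204,86,333,18,57,6,11,27,84] Wind
" First seed: randomization tests; second seed: random selection of subsets "
CASSOCIATION [NRANDOMIZATIONS=4999] Ozone; X=Wind; YTYPE=linear;\
             SEED=!(361036,271828)
```

With only 19 observations, all subsets can be formed exactly (see Method), so the second seed is not expected to affect this particular analysis; it is included only to show the layout of the variate.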

The test statistics can be saved using the STATISTICS parameter. The result is a variate of length three, containing D_n, λ_n and R_n² for linear-circular data, or Δ_n, Π_n and ρ_T for circular-circular data (see Method).
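As an illustration of saving the statistics, the sketch below uses the nest and creek data from the Example section. The identifier CCStats is hypothetical, and setting PRINT=* to suppress the printed output follows the usual Genstat convention for string options rather than anything stated in this page.

```genstat
" Nest orientations and creek directions from the Example section "
VARIATE      [VALUES=240,230,250,30,215,215,135,110,240,105,\
             125,125,130,160,160,145,225,230,295,295,\
             140,140,140,205,215,135,110,105,90,130,\
             200,240,105,125,125,125,130,160,160,250,\
             200,200,240,240,240,250,250,250,140,140] Nest
&            [VALUES=105,75,80,105,110,75,90,100,100,80,\
             150,135,145,130,150,125,120,140,150,140,\
             135,150,120,135,135,130,150,150,120,150,\
             150,130,140,180,190,190,170,180,160,185,\
             170,180,200,190,195,180,165,170,200,175] Creek
" Save the three circular-circular statistics in CCStats (hypothetical name) "
CASSOCIATION [PRINT=*] Nest; X=Creek; YTYPE=circular; SEED=!(578931,0);\
             STATISTICS=CCStats
" CCStats should hold Delta_n, Pi_n and rho_T, in that order "
PRINT        CCStats
```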

Options: PRINT, NRANDOMIZATIONS, ASCALE.

Parameters: Y, X, YTYPE, SEED, STATISTICS.

Method

Full details of the terminology and methodology are given in Fisher (1993, Chapter 6, Sections 6.1 – 6.3). The various tests, test statistics and methods for assessing significance are outlined here. In the equations below, n represents the sample size.

Linear-circular association can be represented as a curve on the surface of a cylinder: the response variate is the height of the curve on the cylinder, and the explanatory variate is the angle around the cylinder. A curve that performs a sine wave around the cylinder is said to show C-linear association. The more general form, that has one minimum and one maximum around the cylinder (and that joins up at zero and 360 degrees) is said to show C-association. Three tests are provided for linear-circular data. The first tests for the presence of C-association using a test statistic D_n (Mardia 1976), which has a range [0,1] and is zero if there is no C-association. The value of D_n is assessed by calculating an associated statistic U_n. For 5 ≤ n ≤ 100, upper 100α% critical values of U_n from Appendix A10 of Fisher (1993) are printed in the output (with linear interpolation where appropriate). For n > 100 the probability of U_n can be approximated by exp(-U_n²/2). No probability values are available for n < 5.

The second test assesses the extent of C-association using a statistic λ_n; see Fisher & Lee (1981). This represents the probability that a randomly-selected sequence of four data points is c-concordant, i.e. that they go up and down (or down and up) successively in their progress around the cylinder (see page 142 of Fisher 1993). When there is no C-association, λ_n = 2/3. Larger values of λ_n indicate the presence of a “C-monotone relationship”, whilst smaller values represent an ordinary monotone relationship (as between two ordinary linear random variables). The exact statistic is calculated for samples of size n ≤ 30, by forming all possible ordered subsets of size four; otherwise it is estimated by taking 30000 randomly-selected subsets. For n < 6, no probability values are available. For 6 ≤ n ≤ 8, the cumulative probability distribution of the test statistic λ_n from Appendix A11(a) of Fisher (1993) is printed in the output. For 9 ≤ n ≤ 20, a randomization test is used to assess the significance. For n > 20, the statistic Λ_n = n(λ_n - 2/3) is referred to tables of upper 100α% critical values from Appendix A11(c) of Fisher (1993).

The third test assesses the extent of C-linear dependence using a test statistic R_n² (Mardia 1976; Liddell & Ord 1978), which represents the squared multiple correlation of the linear response Y with (cos(X), sin(X)), where X is the circular explanatory variate. The significance is assessed using a randomization test. The null hypothesis of no C-linear association is rejected if R_n² is large.
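For reference, the squared multiple correlation of a variable on two predictors can be written in terms of ordinary correlation coefficients. Taking Y as the linear response and X as the circular explanatory variate, as in the Parameters section, this gives the expression below. The symbols r_yc, r_ys and r_cs are notation introduced here for illustration; this is the standard identity, not a statement of how the procedure computes R_n² internally.

```latex
R_n^2 = \frac{r_{yc}^2 + r_{ys}^2 - 2\, r_{yc}\, r_{ys}\, r_{cs}}{1 - r_{cs}^2},
\qquad
r_{yc} = \operatorname{corr}(Y, \cos X),\quad
r_{ys} = \operatorname{corr}(Y, \sin X),\quad
r_{cs} = \operatorname{corr}(\cos X, \sin X)
```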

With circular-circular data the two variables are said to have T-monotone association if, whenever we choose three values from the response variate and arrange them in a clockwise order, the equivalent three values from the explanatory variate will be in either a clockwise order or an anti-clockwise order (i.e. the two sets of values will be met in the same order, one then two then three, going either clockwise or anti-clockwise). They are said to have a T-linear association if either

Y = X + θ₀ (modulo 360°)

(representing complete positive association), or

Y = –X + θ₀ (modulo 360°)

(representing complete negative association). Again, three tests are provided. The first is a test for T-monotone association based on quantifying the amount of T-monotone association directly, i.e. by estimating a statistic Δ_n which represents a circular correlation coefficient; see Fisher & Lee (1982). When the association between Y and X is completely T-monotone, Δ takes the value -1 or 1. However, Δ = 0 does not imply independence, only that the association is not of T-monotone form. The estimation of Δ_n is based on calculation of T-concordancy/discordancy for all distinct subsets of three pairs of data values. The null hypothesis that there is no T-monotone association is rejected if Δ_n differs significantly from zero. The exact statistic is calculated for samples of size n ≤ 50, by forming all possible ordered subsets of size three; otherwise it is estimated by taking 20000 randomly-selected subsets. For 3 ≤ n ≤ 7, the probability is calculated from the critical values of n × Δ_n given in Appendix A12(a) of Fisher (1993). For n > 7, upper 100α% critical values of n × Δ_n from Appendix A12(b) of Fisher (1993) are printed in the output (with linear interpolation where appropriate). For a one-sided (or two-sided) test with significance level α, the value of n × Δ_n (or |n × Δ_n|) should be compared with the upper 100α% (or 100(α/2)%) critical values.

The second test for T-monotone association is based on circular ranks, with test statistic Π_n (again representing a correlation coefficient); see Fisher & Lee (1982, 1983). The null hypothesis of no T-monotone association (i.e. Y and X independent) is rejected if Π_n differs significantly from zero. For 3 ≤ n ≤ 7, the probability is calculated from the probability distribution of (n-1) × Π_n given in Appendix A13(a) of Fisher (1993). [Note that the penultimate value of x given there for n = 7 is assumed to be 0.21.] For n ≥ 8, upper 100α% critical values of the distribution of (n-1) × Π_n from Appendix A13(b) of Fisher (1993) are printed in the output (with linear interpolation where appropriate). For a one-sided (or two-sided) test with significance level α, the value of (n-1) × Π_n (or |(n-1) × Π_n|) should be compared with the upper 100α% (or 100(α/2)%) critical values.

The third test checks for T-linear association using the test statistic ρ_T (Fisher & Lee 1983, 1986), which has range [-1,1]. The null hypothesis of no T-linear association is rejected if |ρ_T| is large. For n < 25 a randomization test is used. For n ≥ 25 the test depends on the marginal distributions of Y and X. If either distribution has a mean resultant length of zero, the null hypothesis is rejected if |n × ρ_T| > -log(α). Alternatively, if neither of the mean resultant lengths is equal to zero, a related statistic Z is used that has an approximate Normal distribution (see Fisher 1993, page 152). An approximate 95% jackknife confidence interval is also constructed for ρ_T.
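As a point of reference, the sample form of ρ_T usually quoted from Fisher & Lee (1983) is sketched below for paired angles (X_i, Y_i), i = 1, ..., n. This page does not show the procedure's own computation, so the expression should be read as an assumption about the estimator rather than a description of the code. By the Cauchy-Schwarz inequality it lies in [-1,1], consistent with the range stated above.

```latex
\hat{\rho}_T =
\frac{\displaystyle\sum_{1 \le i < j \le n} \sin(X_i - X_j)\,\sin(Y_i - Y_j)}
     {\sqrt{\displaystyle\sum_{1 \le i < j \le n} \sin^2(X_i - X_j)\;
            \sum_{1 \le i < j \le n} \sin^2(Y_i - Y_j)}}
```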

Action with `RESTRICT`

Y and X may be restricted but must have compatible numbers of values.

References

Fisher, N.I. (1993). Statistical Analysis of Circular Data. Cambridge University Press, Cambridge, UK.

Fisher, N.I. & Lee, A.J. (1981). Nonparametric measures of angular-linear association. Biometrika, 68, 629-36.

Fisher, N.I. & Lee, A.J. (1982). Nonparametric measures of angular-angular association. Biometrika, 69, 315-21.

Fisher, N.I. & Lee, A.J. (1983). A correlation coefficient for circular data. Biometrika, 70, 327-32.

Fisher, N.I. & Lee, A.J. (1986). Correlation coefficients for random variables on a unit sphere or hypersphere. Biometrika, 73, 159-64.

Liddell, I.G. & Ord, J.K. (1978). Linear-circular correlation coefficients: some further results. Biometrika, 65, 448-50.

Mardia, K.V. (1976). Linear-circular correlation coefficients and rhythmometry. Biometrika, 63, 403-5.

Example

CAPTION      'CASSOCIATION example 1',\
             !t('Ozone concentration and wind direction',\
             '(Fisher 1993, Statistical analysis of circular data,',\
             'Examples 6.4 & 6.5).');\
             STYLE=meta,plain
VARIATE      [VALUES=28,85.2,80.5,4.7,45.9,12.7,72.5,56.6,31.5,112,\
             20,72.5,16,45.9,32.6,56.6,52.6,91.8,55.2] Ozone
&            [VALUES=327,91,88,305,344,270,67,21,281,8,\
             204,86,333,18,57,6,11,27,84] Wind
CASSOCIATION Ozone; X=Wind; YTYPE=linear; SEED=!(361036,0)
CAPTION      'CASSOCIATION example 2',\
             !t('Nest orientations and creek directions',\
             '(Fisher 1993, Statistical analysis of circular data,',\
             'Example 6.6).');\
             STYLE=meta,plain
VARIATE      [VALUES=240,230,250,30,215,215,135,110,240,105,\
             125,125,130,160,160,145,225,230,295,295,\
             140,140,140,205,215,135,110,105,90,130,\
             200,240,105,125,125,125,130,160,160,250,\
             200,200,240,240,240,250,250,250,140,140] Nest
&            [VALUES=105,75,80,105,110,75,90,100,100,80,\
             150,135,145,130,150,125,120,140,150,140,\
             135,150,120,135,135,130,150,150,120,150,\
             150,130,140,180,190,190,170,180,160,185,\
             170,180,200,190,195,180,165,170,200,175] Creek
CASSOCIATION Nest; X=Creek; YTYPE=circular; SEED=!(578931,0)

Updated on March 8, 2019
