Calculates measures of association for circular data (S.J. Clark).
Options
| PRINT = string token | What to print (tests); default test |
|---|---|
| NRANDOMIZATIONS = scalar | Number of randomizations to use in the randomization tests; default 999 |
| ASCALE = string token | Units of the circular variables (degrees, radians); default degr |
Parameters
| Y = variates | Response variable |
|---|---|
| X = variates | Circular explanatory variable |
| YTYPE = string tokens | Type of response variable (circular, linear); default circ |
| SEED = variates | Variate of length two, firstly to supply a seed for the randomization tests and secondly to supply a seed to use for randomly-selecting sets of data points; default !(0,0) |
| STATISTICS = variates | Saves the test statistics |
Description
CASSOCIATION calculates measures of association between a linear response variate and a circular explanatory variate (i.e. linear-circular) or between a circular response variate and a circular explanatory variate (i.e. circular-circular), as described in Fisher (1993, Chapter 6, Sections 6.1 – 6.3). The case of a circular response variate and a linear explanatory variable is not covered by CASSOCIATION; instead see procedure RCIRCULAR.
The data variates are supplied by the Y and X parameters. X should always be a circular variable. Y may be a linear or circular variable; its type is specified by the YTYPE parameter. So YTYPE=circular defines circular-circular data, and YTYPE=linear defines linear-circular data. Circular variables should represent vectorial data (i.e. directed lines). If they originally represent axial data (i.e. undirected lines), they should be transformed to vectorial data before using CASSOCIATION, by doubling their values and reducing them modulo 360°, i.e. by
CALCULATE X = MODULO(2*X; 360)
(see Fisher 1993, page xvii). By default, circular variables should be supplied as degrees, but you can supply radians instead by setting option ASCALE=radians.
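As a minimal sketch of the ASCALE option, the statements below convert the wind directions from the Example section to radians and then analyse them with ASCALE=radians. The identifier WindRad and the value 3.14159265 used for π are assumptions introduced here for illustration only.

```genstat
" Illustrative sketch: WindRad is a hypothetical name chosen for this example. "
VARIATE [VALUES=28,85.2,80.5,4.7,45.9,12.7,72.5,56.6,31.5,112,\
  20,72.5,16,45.9,32.6,56.6,52.6,91.8,55.2] Ozone
& [VALUES=327,91,88,305,344,270,67,21,281,8,\
  204,86,333,18,57,6,11,27,84] Wind
" Convert degrees to radians (pi approximated by 3.14159265). "
CALCULATE WindRad = Wind * 3.14159265 / 180
" Tell the procedure that the circular variate is in radians. "
CASSOCIATION [ASCALE=radians] Y=Ozone; X=WindRad; YTYPE=linear
```

Apart from differences in the random numbers used by the randomization tests, this should give the same analysis as supplying Wind in degrees with the default ASCALE setting.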
Printed output is controlled by the PRINT option, with setting

| tests | to print the results of the relevant tests (default). |
|---|---|
The NRANDOMIZATIONS option specifies the number of randomizations to use with each of the randomization tests (see Method); the default is 999.
The SEED parameter can be set to a variate of length two, to supply seeds for the random numbers that may be used by CASSOCIATION with each y-variate. The first value provides a seed for the RANDOMIZE directive when calculating the randomization tests. The second value provides a seed for the CALCULATE directive when selecting random sets of points to calculate some of the statistics when there are too many data values to form all the sets (see the Method section for details). Both values default to zero, which continues the existing sequence of random numbers if any have already been used in the current Genstat job; otherwise Genstat picks a seed at random. Each seed can be any positive integer, but only the last six digits of its integer part are used.
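The following sketch shows one way of requesting more randomizations while fixing both seeds so that a run can be repeated. The data are those of the Example section; the value 4999 and the second seed 824519 are arbitrary choices made for illustration, not recommendations.

```genstat
" Illustrative sketch: 4999 and the second seed are arbitrary choices. "
VARIATE [VALUES=28,85.2,80.5,4.7,45.9,12.7,72.5,56.6,31.5,112,\
  20,72.5,16,45.9,32.6,56.6,52.6,91.8,55.2] Ozone
& [VALUES=327,91,88,305,344,270,67,21,281,8,\
  204,86,333,18,57,6,11,27,84] Wind
" First seed: randomization tests. Second seed: random selection of subsets. "
CASSOCIATION [PRINT=tests; NRANDOMIZATIONS=4999] Y=Ozone; X=Wind;\
  YTYPE=linear; SEED=!(361036,824519)
```

With only 19 observations all subsets can be formed exactly, so the second seed is not expected to have any effect here; it is set simply to show the form of the parameter.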
The test statistics can be saved using the STATISTICS parameter. For both linear-circular and circular-circular data the result will be a variate of length three containing either Dn, λn and Rn², or Δn, Πn and ρT, respectively (see Method).
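As a short sketch of saving the statistics, the statements below store them in a variate and print it. The identifier OzStats is a hypothetical name chosen for this illustration; the data and seed are those of the Example section.

```genstat
" Illustrative sketch: OzStats is a hypothetical identifier. "
VARIATE [VALUES=28,85.2,80.5,4.7,45.9,12.7,72.5,56.6,31.5,112,\
  20,72.5,16,45.9,32.6,56.6,52.6,91.8,55.2] Ozone
& [VALUES=327,91,88,305,344,270,67,21,281,8,\
  204,86,333,18,57,6,11,27,84] Wind
CASSOCIATION Y=Ozone; X=Wind; YTYPE=linear; SEED=!(361036,0);\
  STATISTICS=OzStats
" For linear-circular data the three values are Dn, lambda-n and Rn-squared. "
PRINT OzStats
```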
Options: PRINT, NRANDOMIZATIONS, ASCALE.
Parameters: Y, X, YTYPE, SEED, STATISTICS.
Method
Full details of the terminology and methodology are given in Fisher (1993, Chapter 6, Sections 6.1 – 6.3). The various tests, test statistics and methods for assessing significance are outlined here. In the equations below, n represents the sample size.
Linear-circular association can be represented as a curve on the surface of a cylinder: the response variate is the height of the curve on the cylinder, and the explanatory variate is the angle around the cylinder. A curve that performs a sine wave around the cylinder is said to show C-linear association. The more general form, that has one minimum and one maximum around the cylinder (and that joins up at zero and 360 degrees) is said to show C-association. Three tests are provided for linear-circular data.
The first tests for the presence of C-association using a test statistic Dn (Mardia 1976), which has a range [0,1] and is zero if there is no C-association. The value of Dn is assessed by calculating an associated statistic Un. For 5 ≤ n ≤ 100, upper 100α% critical values of Un from Appendix A10 of Fisher (1993) are printed in the output (with linear interpolation where appropriate). For n > 100 the probability of Un can be approximated by exp(-Un²/2). No probability values are available for n < 5.
The second test assesses the extent of C-association using a statistic λn; see Fisher & Lee (1981). This represents the probability that a randomly-selected sequence of four data points is C-concordant, i.e. that the points go up and down (or down and up) successively in their progress around the cylinder (see page 142 of Fisher 1993). When there is no C-association, λn = 2/3. Larger values of λn indicate the presence of a “C-monotone relationship”, whilst smaller values represent an ordinary monotone relationship (as between two ordinary linear random variables). The exact statistic is calculated for samples of size n ≤ 30, by forming all possible ordered subsets of size four; otherwise it is estimated by taking 30000 randomly-selected subsets. For n < 6, no probability values are available. For 6 ≤ n ≤ 8, the cumulative probability distribution of the test statistic λn from Appendix A11(a) of Fisher (1993) is printed in the output. For 9 ≤ n ≤ 20, a randomization test is used to assess the significance. For n > 20, the statistic Λn = n(λn – 2/3) is referred to tables of upper 100α% critical values from Appendix A11(c) of Fisher (1993).
The third test assesses the extent of C-linear dependence using a test statistic Rn² (Mardia 1976; Liddell & Ord 1978), which represents the multiple correlation of the linear variate Y with (cos(X), sin(X)), where X is the circular variate. The significance is assessed using a randomization test. The null hypothesis of no C-linear association is rejected if Rn² is large.
With circular-circular data the two variables are said to have T-monotone association if, whenever we choose three values from the response variate and arrange them in a clockwise order, the equivalent three values from the explanatory variate will be in either a clockwise order or an anti-clockwise order (i.e. the two sets of values will be met in the same order, one then two then three, going either clockwise or anti-clockwise). They are said to have a T-linear association if either
Y = X + θ0 (modulo 360°)
(representing complete positive association), or
Y = –X + θ0 (modulo 360°)
(representing complete negative association). Again, three tests are provided.
The first is a test for T-monotone association based on quantifying the amount of T-monotone association directly, i.e. by estimating a statistic Δn which represents a circular correlation coefficient; see Fisher & Lee (1982). When Y and X are dependent, Δ takes the value -1 or 1, but Δ = 0 does not imply independence, only that the association is not of T-monotone form. The estimation of Δn is based on calculation of T-concordancy/discordancy for all distinct subsets of three pairs of data values. The null hypothesis that there is no T-monotone association is rejected if Δn differs significantly from zero. The exact statistic is calculated for samples of size n ≤ 50, by forming all possible ordered subsets of size three; otherwise it is estimated by taking 20000 randomly-selected subsets. For 3 ≤ n ≤ 7, the probability is calculated from the critical values of n × Δn given in Appendix A12(a) of Fisher (1993). For n > 7, upper 100α% critical values of n × Δn from Appendix A12(b) of Fisher (1993) are printed in the output (with linear interpolation where appropriate). For a one-sided (or two-sided) test with significance level α, the value of n × Δn (or |n × Δn|) should be compared with the upper 100α% (or 100(α/2)%) critical values.
The second test for T-monotone association is based on circular ranks, with test statistic Πn (again representing a correlation coefficient); see Fisher & Lee (1982, 1983). The null hypothesis of no T-monotone association (i.e. Y and X independent) is rejected if Πn differs significantly from zero. For 3 ≤ n ≤ 7, the probability is calculated from the probability distribution of (n-1) × Πn given in Appendix A13(a) of Fisher (1993). [Note that the penultimate value of x given there for n = 7 is assumed to be 0.21.] For n ≥ 8, upper 100α% critical values of the distribution of (n-1) × Πn from Appendix A13(b) of Fisher (1993) are printed in the output (with linear interpolation where appropriate). For a one-sided (or two-sided) test with significance level α, the value of (n-1) × Πn (or |(n-1) × Πn|) should be compared with the upper 100α% (or 100(α/2)%) critical values.
The third test checks for T-linear association using the test statistic ρT (Fisher & Lee 1983, 1986), which has range [-1,1]. The null hypothesis of no T-linear association is rejected if |ρT| is large. For n < 25 a randomization test is used. For n ≥ 25 the test depends on the marginal distributions of Y and X. If either distribution has mean resultant length zero, the null hypothesis is rejected if |n × ρT| > -log(α). Alternatively, if neither of the mean resultant lengths is equal to zero, a related statistic Z is used that has an approximate Normal distribution (see Fisher 1993, page 152). An approximate 95% jackknife confidence interval is constructed for ρT.
Action with RESTRICT
Y and X may be restricted but must have compatible numbers of values.
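A minimal sketch of an analysis on a restricted set of units is given below, assuming that the same restriction is applied to both variates. The cut-off of 100 for the ozone concentration is an arbitrary value chosen for illustration; the data are those of the Example section.

```genstat
" Illustrative sketch: the cut-off of 100 is an arbitrary choice. "
VARIATE [VALUES=28,85.2,80.5,4.7,45.9,12.7,72.5,56.6,31.5,112,\
  20,72.5,16,45.9,32.6,56.6,52.6,91.8,55.2] Ozone
& [VALUES=327,91,88,305,344,270,67,21,281,8,\
  204,86,333,18,57,6,11,27,84] Wind
" Restrict both variates to the units with ozone below the cut-off. "
RESTRICT Ozone,Wind; CONDITION=Ozone.LT.100
CASSOCIATION Y=Ozone; X=Wind; YTYPE=linear; SEED=!(361036,0)
" Remove the restriction afterwards. "
RESTRICT Ozone,Wind
```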
References
Fisher, N.I. (1993). Statistical Analysis of Circular Data. Cambridge University Press, Cambridge, UK.
Fisher, N.I. & Lee, A.J. (1981). Nonparametric measures of angular-linear association. Biometrika, 68, 629-36.
Fisher, N.I. & Lee, A.J. (1982). Nonparametric measures of angular-angular association. Biometrika, 69, 315-21.
Fisher, N.I. & Lee, A.J. (1983). A correlation coefficient for circular data. Biometrika, 70, 327-32.
Fisher, N.I. & Lee, A.J. (1986). Correlation coefficients for random variables on a unit sphere or hypersphere. Biometrika, 73, 159-64.
Liddell, I.G. & Ord, J.K. (1978). Linear-circular correlation coefficients: some further results. Biometrika, 65, 448-50.
Mardia, K.V. (1976). Linear-circular correlation coefficients and rhythmometry. Biometrika, 63, 403-5.
See also
Procedures: CCOMPARE, CDESCRIBE, DCIRCULAR, RCIRCULAR, WINDROSE.
Commands for: Basic and nonparametric statistics.
Example
CAPTION 'CASSOCIATION example 1',\
  !t('Ozone concentration and wind direction',\
  '(Fisher 1993, Statistical analysis of circular data,',\
  'Examples 6.4 & 6.5).');\
  STYLE=meta,plain
VARIATE [VALUES=28,85.2,80.5,4.7,45.9,12.7,72.5,56.6,31.5,112,\
  20,72.5,16,45.9,32.6,56.6,52.6,91.8,55.2] Ozone
& [VALUES=327,91,88,305,344,270,67,21,281,8,\
  204,86,333,18,57,6,11,27,84] Wind
CASSOCIATION Ozone; X=Wind; YTYPE=linear; SEED=!(361036,0)
CAPTION 'CASSOCIATION example 2',\
  !t('Nest orientations and creek directions',\
  '(Fisher 1993, Statistical analysis of circular data,',\
  'Example 6.6).');\
  STYLE=meta,plain
VARIATE [VALUES=240,230,250,30,215,215,135,110,240,105,\
  125,125,130,160,160,145,225,230,295,295,\
  140,140,140,205,215,135,110,105,90,130,\
  200,240,105,125,125,125,130,160,160,250,\
  200,200,240,240,240,250,250,250,140,140] Nest
& [VALUES=105,75,80,105,110,75,90,100,100,80,\
  150,135,145,130,150,125,120,140,150,140,\
  135,150,120,135,135,130,150,150,120,150,\
  150,130,140,180,190,190,170,180,160,185,\
  170,180,200,190,195,180,165,170,200,175] Creek
CASSOCIATION Nest; X=Creek; YTYPE=circular; SEED=!(578931,0)