Fits frequency distributions to accumulated counts (R.C. Butler, M.E. O’Neill, P. Brain & H. Turner).
Options
PRINT = string tokens | Controls printed output (model, summary, estimates, correlations, fittedvalues, monitoring); default mode, summ, esti |
---|---|
DISTRIBUTION = string token | Which distribution to use (normal, logistic, complementaryloglog, acomplementaryloglog, inversenormal, weibull, exponential); default norm |
TRANSFORMATION = string token | Whether to use log(TIME) if DISTRIBUTION = normal, logistic, complementaryloglog or acomplementaryloglog (log, none); default * uses log except when DISTRIBUTION = inversenormal, weibull or exponential |
LAG = string token | Type of lag to add to TIME (none, positive, unconstrained); default none |
ALLRESPOND = string token | If TOTUNITS is set, whether all units are constrained to respond (yes, no); default no |
FORM = string token | Whether DATA are cumulated or differences (cumulated, differences); default cumu |
LOSTUNITS = string token | Whether data are left-censored (yes, no); default no |
SEPARATE = string tokens | Which parameters to estimate separately for each group (lag, b, m, propn, gamma); default * |
POPSEPARATE = string tokens | Which parameters to estimate separately for populations in each group (b, m, lag); default * |
PLOT = string token | Which graphs to draw (cumulative, density, trcumulative, trdensity); default cumu |
MAXCYCLE = scalar | Number of iterations for fitting, as in RCYCLE; default 30 |
Parameters
DATA = variates or pointers | Specifies the accumulated counts |
---|---|
TIME = variates or pointers | Defines the time at which each count was recorded |
GROUPS = factors | Factor indicating groups |
INITIAL = variates | Initial values for all parameters |
IB = scalars or variates | Initial values for b |
IM = scalars or variates | Initial values for m |
ILAG = scalars or variates | Initial values for lag |
IGAMMA = scalars or variates | Initial values for gamma |
IPROPN = scalars or variates | Initial values for proportions |
STEPLENGTHS = variates | Steplengths for all parameters |
SB = scalars or variates | Steplengths for b |
SM = scalars or variates | Steplengths for m |
SLAG = scalars or variates | Steplengths for lag |
SGAMMA = scalars or variates | Steplengths for gamma |
SPROPN = scalars or variates | Steplengths for proportions |
TOTUNITS = scalars or variates | Total number of units |
NPOPULATION = scalars | Number of populations (1, 2 or 3); default 1 |
SAVE = pointers | Saves the results |
Description
CUMDISTRIBUTION fits frequency distributions to a variate of counts, accumulated over time. The counts are specified by the DATA parameter, and the time (t) at which each count was recorded is supplied, in a variate, by the TIME parameter. Counts may be accumulated over time (option FORM=cumulated), or be the change in count from the previous time (FORM=differences). Neither the DATA nor the TIME variate may be restricted, and neither may contain missing values. The DATA values must all be non-negative integers.
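The relation between the two FORM settings can be illustrated outside Genstat. The following is a minimal Python sketch, not part of the procedure: the function names are hypothetical, and it assumes that the first difference is taken from a starting count of zero. The counts are the differenced data from the first example at the end of this page.

```python
from itertools import accumulate


def to_differences(cumulated):
    """Convert cumulated counts to the change since the previous time.

    Assumption: the count before the first recorded time is zero.
    """
    differences = []
    previous = 0
    for count in cumulated:
        differences.append(count - previous)
        previous = count
    return differences


def to_cumulated(differences):
    """Convert per-interval counts to running totals."""
    return list(accumulate(differences))


# Differenced counts from example 1 (FORM=differences).
counts = [0, 1, 7, 27, 22, 8, 13, 3, 6, 1, 1, 1, 1]
cumulated = to_cumulated(counts)
```

Converting back with to_differences(cumulated) recovers the original counts, so the two forms carry the same information.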
The form of the cumulative distribution function is indicated by the DISTRIBUTION option, which has the following settings (z is a function of TIME as defined below).
DISTRIBUTION | Cumulative distribution function |
---|---|
normal | NORMAL(b × (z - m)) |
complementaryloglog | EXP(-EXP(-b × (z - m))) |
acomplementaryloglog | 1 - EXP(-EXP(b × (z - m))) |
logistic | 1 / (1 + EXP(-b × (z - m))) |
inversenormal | NORMAL(SQRT(b/z) × (z/m - 1)) + EXP(2b/m) × NORMAL(-SQRT(b/z) × (1 + z/m)) |
weibull | 1 - EXP(-(m × z)**b) |
exponential | 1 - EXP(-m × z) |
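To make the role of b and m concrete, the closed-form entries of this table can be evaluated numerically. This is a minimal Python sketch written outside Genstat, not the procedure's own code: the function names are hypothetical, NORMAL is assumed to be the standard normal cumulative probability, and the inversenormal setting is left out for brevity.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative probability (assumed meaning of NORMAL)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cumulative(distribution, z, b, m):
    """Evaluate the tabulated cumulative function at transformed time z."""
    if distribution == "normal":
        return normal_cdf(b * (z - m))
    if distribution == "logistic":
        return 1.0 / (1.0 + math.exp(-b * (z - m)))
    if distribution == "complementaryloglog":
        return math.exp(-math.exp(-b * (z - m)))
    if distribution == "acomplementaryloglog":
        return 1.0 - math.exp(-math.exp(b * (z - m)))
    if distribution == "weibull":
        return 1.0 - math.exp(-((m * z) ** b))
    if distribution == "exponential":
        return 1.0 - math.exp(-m * z)
    raise ValueError(f"distribution not covered by this sketch: {distribution}")
```

At z = m the normal and logistic forms both give 0.5, which is why m is the t50 for those two settings, and the weibull form with b = 1 reduces to the exponential form.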
The parameters b and m are estimated, and relate to the distribution of transformed time z as follows.
DISTRIBUTION | Parameter b | Parameter m |
---|---|---|
normal | 1 / sd | mean, t50 |
logistic | 2 × relative response rate at z=m | mean, t50 |
complementaryloglog | relative response rate at z=m | mode |
acomplementaryloglog | (e-1) × relative response rate at z=m | mode |
inversenormal | (mean**3) / (sd**2) | mean |
weibull | shape | scale |
exponential | | 1/mean |
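Some rows of this table can be inverted to recover a mean and standard deviation from fitted values of b and m. The sketch below is Python written outside Genstat with a hypothetical function name, and is not how the procedure computes its printed values. It covers only the normal, inversenormal and exponential rows, assumes that the exponential entry 1/mean refers to m (the only parameter in that distribution's formula), and returns values on the scale of the transformed time z.

```python
import math


def mean_and_sd(distribution, b, m):
    """Return (mean, sd) of z implied by the tabulated definitions of b and m.

    sd is None where the table gives no relation for it.
    """
    if distribution == "normal":
        # b = 1 / sd and m = mean
        return m, 1.0 / b
    if distribution == "inversenormal":
        # b = mean**3 / sd**2 and m = mean, so sd = sqrt(mean**3 / b)
        return m, math.sqrt(m ** 3 / b)
    if distribution == "exponential":
        # m = 1 / mean (assumed reading of the table); b is not used
        return 1.0 / m, None
    raise ValueError(f"distribution not covered by this sketch: {distribution}")
```

For example, a normal fit with b = 0.5 and m = 4 corresponds to a mean of 4 and a standard deviation of 2 on the z scale.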
For some of the distributions, TIME may be logged by setting option TRANSFORMATION=log. A lag time before any units respond may be estimated by setting the option LAG=positive. You can set LAG=unconstrained to estimate a negative lag, which assumes that some units responded before TIME=0. These options give z using the following functions of TIME.
Setting | TRANSFORMATION=none | TRANSFORMATION=log |
---|---|---|
LAG=none | z = TIME | z = LOG(TIME) |
LAG=positive or unconstrained | z = TIME - LAG | z = LOG(TIME - LAG) |
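The four cells of this table amount to one shift followed by an optional logarithm. A minimal Python sketch, written outside Genstat with a hypothetical function name, where LAG=none is represented by a lag of zero:

```python
import math


def transformed_time(time, lag=0.0, transformation="none"):
    """Compute z from TIME for a given lag and TRANSFORMATION setting."""
    shifted = time - lag  # lag = 0.0 stands for LAG=none
    if transformation == "none":
        return shifted
    if transformation == "log":
        if shifted <= 0:
            raise ValueError("log transformation needs TIME - LAG > 0")
        return math.log(shifted)
    raise ValueError(f"unknown transformation: {transformation}")
```

With a time of 10 and a lag of 4, z is 6 without transformation and LOG(6) with the log transformation.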
The available combinations of LAG and TRANSFORMATION for the various distributions are shown below.
DISTRIBUTION | TRANSFORMATION | Equivalent distribution | Possible settings for LAG |
---|---|---|---|
normal | none | | none |
normal | log | log-normal | none, positive, unconstrained |
logistic | none | | none |
logistic | log | log-logistic | none, positive, unconstrained |
complementaryloglog | none | Gumbel, Extreme value 1 | none |
complementaryloglog | log | Extreme value 2 | none, positive, unconstrained |
acomplementaryloglog | none | | none |
acomplementaryloglog | log | Weibull | none, positive, unconstrained |
inversenormal | none | | none, positive, unconstrained |
weibull | none | | none, positive, unconstrained |
exponential | none | | none, positive, unconstrained |
TRANSFORMATION is set to log by default for the first four distributions, and none for the last three.
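The permitted combinations and the default transformation can be summarised as a small lookup. This Python sketch is written outside Genstat and uses hypothetical names. Treating TRANSFORMATION=log as an error for inversenormal, weibull and exponential is an assumption based on the table listing only none for those distributions.

```python
LOG_CAPABLE = ("normal", "logistic", "complementaryloglog", "acomplementaryloglog")
UNTRANSFORMED_ONLY = ("inversenormal", "weibull", "exponential")
ALL_LAGS = ("none", "positive", "unconstrained")


def default_transformation(distribution):
    """Default TRANSFORMATION: log for the first four, none for the last three."""
    if distribution in LOG_CAPABLE:
        return "log"
    if distribution in UNTRANSFORMED_ONLY:
        return "none"
    raise ValueError(f"unknown distribution: {distribution}")


def allowed_lags(distribution, transformation=None):
    """Return the LAG settings the table lists for this combination."""
    if transformation is None:
        transformation = default_transformation(distribution)
    if distribution in UNTRANSFORMED_ONLY:
        if transformation != "none":
            # Assumption: the table gives no log row for these distributions.
            raise ValueError(f"{distribution} is tabulated only with TRANSFORMATION=none")
        return ALL_LAGS
    if distribution in LOG_CAPABLE:
        return ALL_LAGS if transformation == "log" else ("none",)
    raise ValueError(f"unknown distribution: {distribution}")
```

So a lag can be fitted for the normal distribution only on the log scale, which is also its default.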
If the total number of units is known, it can be supplied by setting the TOTUNITS parameter. By default, a parameter gamma, the proportion of TOTUNITS that can respond, will be estimated. If option ALLRESPOND is set to yes, then gamma is fixed at 1 (indicating that all units will respond). If some units were lost before counting began, the number of these can be estimated by setting option LOSTUNITS=yes.
Data for several groups can be fitted together, either by setting DATA to a pointer of variates, or by setting the GROUPS parameter to a factor to identify the different groups. If DATA is set to a pointer, TIME can be set to one variate if all the DATA variates are the same length. Otherwise, it must be set to a pointer with a variate for each DATA variate. Parameters for the groups are constrained to be equal by default, but any of the parameters b, m, lag and gamma can be estimated separately between groups by setting the SEPARATE option.
The counts can be from a single population or from a mixture of up to 3 populations, as specified by the NPOPULATION parameter (default 1). Parameters b, m and lag can be estimated separately between the populations by setting the POPSEPARATE option. If this is set, the proportion (propn) of units in each population will also be estimated. If there are GROUPS in the data, then the proportions can be estimated separately for each group by setting SEPARATE=propn. NPOPULATION is the same for each group.
Initial parameter values are estimated within the procedure, but can be supplied separately using any of the parameters IB, IM, ILAG, IGAMMA and IPROPN, or in one list using the INITIAL parameter. If any parameter is to be estimated separately between GROUPS or populations, there must be one initial value for each parameter of that type to be estimated. For example, if there are two groups, and SEPARATE=m, then IM should be set to a variate of length 2. If INITIAL is set, its values will be used even if the other initial value parameters are set. The values in INITIAL must be in the order b, m, lag, gamma, propn, with enough values for the number of each being estimated. For propn, there must be one fewer value than NPOPULATION. For example, with 2 groups and 3 populations, with SEPARATE=b,m and POPSEPARATE=m, there will be 2 initial values for b, 6 for m and 2 for propn. Steplengths for the fitting process can be supplied similarly using STEPLENGTHS or SB, SM, SLAG, SGAMMA and SPROPN. MAXCYCLE controls the maximum number of iterations, as in the RCYCLE directive.
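The counting in the example above can be reproduced with a short calculation. This Python sketch is written outside Genstat with a hypothetical function name. It assumes that the number of values for b or m is the number of groups (if the parameter is in SEPARATE) multiplied by the number of populations (if it is in POPSEPARATE), a rule inferred from the worked example rather than stated by the procedure. It covers only b, m and propn; lag and gamma also depend on the LAG and TOTUNITS settings and are left out.

```python
def count_initial_values(n_groups, n_populations, separate=(), popseparate=()):
    """Count the initial values needed for b, m and propn (inferred rule)."""
    counts = {}
    for name in ("b", "m"):
        per_group = n_groups if name in separate else 1
        per_population = n_populations if name in popseparate else 1
        counts[name] = per_group * per_population
    # One fewer proportion than populations, repeated per group if separate.
    propn_groups = n_groups if "propn" in separate else 1
    counts["propn"] = (n_populations - 1) * propn_groups
    return counts


# 2 groups, 3 populations, SEPARATE=b,m and POPSEPARATE=m
needed = count_initial_values(2, 3, separate=("b", "m"), popseparate=("m",))
```

Here needed holds 2 values for b, 6 for m and 2 for propn, matching the counts given in the text.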
Output is controlled by the PRINT option, with settings as in FITNONLINEAR. Parameter estimates are indexed by groups and/or population numbers, with group labels first if both populations and groups are used. If PRINT=estimates, parameters calculated from the fitted parameters (mean, sd, t50) are also printed. Option PLOT determines the form of the graphical output:
cumulative | fitted curve and cumulated counts, |
---|---|
density | differenced fitted curve and counts, |
trcumulative | trellis version of cumulative when there are GROUPS, |
trdensity | trellis version of density when there are GROUPS. |
Setting PLOT=* suppresses all graphs.
Some results can be saved using RKEEP (as with FIT). Further results can be saved by setting the SAVE parameter. This creates a pointer with three sections labelled by their contents. SAVE['Data'] points to the columns used in the fitting process:
ndata | the (differenced) counts, |
---|---|
ntime | times for each count, |
groups | grouping factor, |
fitted | fitted values, |
cumdata | cumulated counts, |
cumfitted | cumulated fitted values, |
z | transformed time variate (as above). |
SAVE['CalcParams'] contains the calculated parameters and their standard errors (Mean, Sd, T50, seMean, seSd, seT50). SAVE['Viable'] contains the estimated number of viable units (Nv) for each group and, if NPOPULATION > 1, the number in each population (PopNv).
Options: PRINT, DISTRIBUTION, TRANSFORMATION, LAG, ALLRESPOND, FORM, LOSTUNITS, SEPARATE, POPSEPARATE, PLOT, MAXCYCLE.
Parameters: DATA, TIME, GROUPS, INITIAL, IB, IM, ILAG, IGAMMA, IPROPN, STEPLENGTHS, SB, SM, SLAG, SGAMMA, SPROPN, TOTUNITS, NPOPULATION, SAVE.
Method
This procedure extends the methods described by Brain & Butler (1988). If FORM=cumulated, the DATA vector is differenced, and if DATA is set to a pointer, the DATA variates are stacked, and a factor created to identify the groups. The resulting data variate is then used with FITNONLINEAR. The model to be fitted is set up in a pointer to expressions formed according to the settings of the various options and parameters.
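The stacking step can be pictured with a small sketch. This is Python written outside Genstat with a hypothetical function name, not the procedure's implementation, and it shows only the stacking and the creation of group labels, using the first two cumulated series from the second example on this page. It assumes a single shared time variate, which the Description allows when all the DATA variates have the same length.

```python
def stack_groups(data_by_group, time):
    """Stack per-group count series into one series with matching labels."""
    stacked_counts, stacked_times, group_labels = [], [], []
    for label, counts in data_by_group.items():
        if len(counts) != len(time):
            raise ValueError("each series must match the shared time variate")
        stacked_counts.extend(counts)
        stacked_times.extend(time)
        group_labels.extend([label] * len(counts))
    return stacked_counts, stacked_times, group_labels


# Time and the first two cumulated series from example 2.
time = [0, 56, 64, 72, 80, 88, 96, 104]
cum = {
    1: [0, 3, 17, 36, 57, 79, 85, 89],
    2: [0, 1, 16, 48, 65, 80, 85, 90],
}
counts, times, groups = stack_groups(cum, time)
```

The result is one long series of 16 counts, with the times repeated for each group and a label for every value playing the role of the grouping factor.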
Action with RESTRICT
Because the calculations in the procedure involve differencing the counts, the TIME and DATA variates must not be restricted.
Reference
Brain, P. & Butler, R.C. (1988). Cumulative count data. Genstat Newsletter, 22, 38-47.
See also
Directive: DISTRIBUTION.
Procedure: RSURVIVAL.
Commands for: Repeated measurements, Survival analysis.
Example
CAPTION 'CUMDISTRIBUTION example',\
        !t('1) Data from Hunter, E.A., Glasbey, C.A., & Naylor, R.E.L.',\
        '(1984). J. Agric. Sci. 102, 207-213.'); STYLE=meta,plain
VARIATE Count,Time; VALUES=!(0,1,7,27,22,8,13,3,6,1,1,1,1),\
        !(49,55,62,72,79,86,96,103,120,127,144,151,168)
CUMDISTRIBUTION [PRINT=model,summary,estimates,fittedvalues;\
        FORM=differences; DISTRIBUTION=normal; TRANSFORMATION=log;\
        LAG=positive] DATA=Count;TIME=Time
CAPTION '2) Randomly generated data from three groups'
VARIATE [NVALUES=8] Time, Cum[1,2,3]
READ    Time,Cum[]
  0  0  0  0
 56  3  1  3
 64 17 16 16
 72 36 48 34
 80 57 65 61
 88 79 80 77
 96 85 85 83
104 89 90 88 :
CUMDISTRIBUTION [DISTRIBUTION=inversenormal; SEPARATE=b,m,l]\
        DATA=Cum; TIME=Time
CAPTION '3) Example fitting sub-populations and groups'
VARIATE [NVALUES=15] Time; !(0,2,3,5,7,9,14,16,19,24,29,34,39,44,49)
&       Counts[1]; !(0,0,0,38,73,27,41,16,88,37,23,6,1,1,1)
&       Counts[2]; !(0,0,0,81,39,11,11,13,82,20,21,3,3,4,1)
CAPTION '3a) All parameters varying between groups and populations'
CUMDISTRIBUTION [SEPARATE=b,m,lag,gamma,propn; POPSEPARATE=b,m,lag;\
        LAG=positive; FORM=difference] DATA=Counts; TIME=Time;\
        NPOPULATION=2; TOTUNITS=400; ILAG=!((4,15)2)
CAPTION '3b) Only some parameters varying between groups or populations'
CUMDISTRIBUTION [SEPARATE=lag,gamma,propn; POPSEPARATE=m,lag;\
        LAG=positive; FORM=difference] DATA=Counts; TIME=Time;\
        NPOPULATION=2; TOTUNITS=400; ILAG=!((4,15)2)