CUMDISTRIBUTION procedure

Fits frequency distributions to accumulated counts (R.C. Butler, M.E. O’Neill, P. Brain & H. Turner).

Options

`PRINT` = string tokens	Controls printed output (`model`, `summary`, `estimates`, `correlations`, `fittedvalues`, `monitoring`); default `mode`, `summ`, `esti`
`DISTRIBUTION` = string token	Which distribution to use (`normal`, `logistic`, `complementaryloglog`, `acomplementaryloglog`, `inversenormal`, `weibull`, `exponential`); default `norm`
`TRANSFORMATION` = string token	Whether to use log(`TIME`) if `DISTRIBUTION` = `normal`, `logistic`, `complementaryloglog` or `acomplementaryloglog` (`log`, `none`); default `*` uses `log` except when `DISTRIBUTION` = `inversenormal`, `weibull` or `exponential`
`LAG` = string token	Type of lag to add to `TIME` (`none`, `positive`, `unconstrained`); default `none`
`ALLRESPOND` = string token	If `TOTUNITS` is set, whether all units are constrained to respond (`yes`, `no`); default `no`
`FORM` = string token	Whether `DATA` are cumulated or differences (`cumulated`, `differences`); default `cumu`
`LOSTUNITS` = string token	Whether data are left-censored (`yes`, `no`); default `no`
`SEPARATE` = string tokens	Which parameters to estimate separately for each group (`lag`, `b`, `m`, `propn`, `gamma`); default `*`
`POPSEPARATE` = string tokens	Which parameters to estimate separately for populations in each group (`b`, `m`, `lag`); default `*`
`PLOT` = string token	Which graphs to draw (`cumulative`, `density`, `trcumulative`, `trdensity`); default `cumu`
`MAXCYCLE` = scalar	Number of iterations for fitting, as in `RCYCLE`; default 30

Parameters

`DATA` = variates or pointers	Specifies the accumulated counts
`TIME` = variates or pointers	Defines the time at which each count was recorded
`GROUPS` = factors	Factor indicating groups
`INITIAL` = variates	Initial values for all parameters
`IB` = scalars or variates	Initial values for b
`IM` = scalars or variates	Initial values for m
`ILAG` = scalars or variates	Initial values for lag
`IGAMMA` = scalars or variates	Initial values for gamma
`IPROPN` = scalars or variates	Initial values for proportions
`STEPLENGTHS` = variates	Steplengths for all parameters
`SB` = scalars or variates	Steplengths for b
`SM` = scalars or variates	Steplengths for m
`SLAG` = scalars or variates	Steplengths for lag
`SGAMMA` = scalars or variates	Steplengths for gamma
`SPROPN` = scalars or variates	Steplengths for proportions
`TOTUNITS` = scalars or variates	Total number of units
`NPOPULATION` = scalars	Number of populations (1, 2 or 3); default 1
`SAVE` = pointers	Saves the results

Description

CUMDISTRIBUTION fits frequency distributions to a variate of counts recorded over time. The counts are specified by the DATA parameter, and the time (t) at which each count was recorded is supplied, in a variate, by the TIME parameter. Counts may be accumulated over time (option FORM=cumulated), or be the change in count from the previous time (FORM=differences). Neither the DATA nor the TIME variate may be restricted, and neither may contain missing values. The DATA values must all be non-negative integers.
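
As an illustration of the two settings of FORM, the sketch below supplies the same counts in both ways. The data are the first data set from the Example section; the identifier CumCount is introduced here only for illustration, and the CUMULATE function of CALCULATE is assumed to be available to form the running totals.

```genstat
" Counts per interval and recording times (first data set in the Example section) "
VARIATE [VALUES=49,55,62,72,79,86,96,103,120,127,144,151,168] Time
VARIATE [VALUES=0,1,7,27,22,8,13,3,6,1,1,1,1] Count
" Running totals of the per-interval counts (CumCount is a hypothetical name) "
CALCULATE CumCount = CUMULATE(Count)
" The two calls below describe the same data in the two forms "
CUMDISTRIBUTION [FORM=differences] DATA=Count; TIME=Time
CUMDISTRIBUTION [FORM=cumulated] DATA=CumCount; TIME=Time
```

Both calls use the default distribution and transformation, so any difference between them should come only from how the counts are supplied.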

The form of the cumulative distribution function is indicated by the DISTRIBUTION option, which has the following settings (z is a function of TIME as defined below).

`DISTRIBUTION`	cumulative distribution function
`normal`	`NORMAL`(b × (z–m))
`complementaryloglog`	`EXP`( –`EXP`(-b × (z–m)))
`acomplementaryloglog`	1 – `EXP`(-`EXP`(b × (z–m)))
`logistic`	1 / (1 + `EXP`(-b × (z–m)))
`inversenormal`	`NORMAL`(`SQRT`(b/z) × (z/m – 1)) + `EXP`(2 × b/m) × `NORMAL`(-`SQRT`(b/z) × (1 + z/m))
`weibull`	1 – `EXP`(-(m × z)`**`b)
`exponential`	1 – `EXP`(-m × z)

The parameters b and m are estimated, and relate to the distribution of transformed time z as follows.

`DISTRIBUTION`	Parameter b	Parameter m
`normal`	1 / sd	mean, t50
`logistic`	2 × relative response rate at z=m	mean, t50
`complementaryloglog`	relative response rate at z=m	mode
`acomplementaryloglog`	(e-1) × relative response rate at z=m	mode
`inversenormal`	(mean³) / (sd²)	mean
`weibull`	shape	scale
`exponential`		1/mean

For some of the distributions, TIME may be logged by setting option TRANSFORMATION=log. A lag time before any units respond may be estimated by setting the option LAG=positive. You can set LAG=unconstrained to estimate a negative lag, which assumes that some units responded before TIME=0. These options give z using the following functions of TIME.

	`TRANSFORMATION=none`	`TRANSFORMATION=log`
`LAG=none`	z=`TIME`	z=`LOG`(`TIME`)
`LAG=positive` or `unconstrained`	z=`TIME`–`LAG`	z=`LOG`(`TIME`–`LAG`)
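
The sketch below evaluates one of these combinations by hand, to show how z and the fitted cumulative proportion relate to TIME. The parameter values (b, m and lagtime) are hypothetical and chosen only for illustration; they are not estimates produced by the procedure.

```genstat
" Hypothetical parameter values, for illustration only "
SCALAR b, m, lagtime; VALUE=4, 3.5, 40
VARIATE [VALUES=49,55,62,72,79,86,96,103,120,127,144,151,168] Time
" LAG=positive with TRANSFORMATION=log: z = LOG(TIME - LAG) "
CALCULATE z = LOG(Time - lagtime)
" DISTRIBUTION=normal: cumulative proportion responding by each time "
CALCULATE F = NORMAL(b * (z - m))
PRINT Time, z, F
```

All the times here exceed the assumed lag, so the logarithm is defined for every value.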

The available combinations of LAG and TRANSFORMATION for the various distributions are shown below.

`DISTRIBUTION`	`TRANSFORMATION`	Equivalent distribution	Possible settings for `LAG`
`normal`	`none`		`none`
	`log`	log-normal	`none`, `positive`, `unconstrained`
`logistic`	`none`		`none`
	`log`	log-logistic	`none`, `positive`, `unconstrained`
`complementaryloglog`	`none`	Gumbel, extreme value type 1	`none`
	`log`	extreme value type 2	`none`, `positive`, `unconstrained`
`acomplementaryloglog`	`none`		`none`
	`log`	Weibull	`none`, `positive`, `unconstrained`
`inversenormal`	`none`		`none`, `positive`, `unconstrained`
`weibull`	`none`		`none`, `positive`, `unconstrained`
`exponential`	`none`		`none`, `positive`, `unconstrained`

TRANSFORMATION is set to log by default for the first four distributions, and none for the last three.

If the total number of units is known, it can be supplied by setting the TOTUNITS parameter. By default, a parameter gamma, the proportion of TOTUNITS that can respond, will be estimated. If option ALLRESPOND is set to yes, then gamma is fixed at 1 (indicating that all units will respond). If some units were lost before counting began, the number of these can be estimated by setting option LOSTUNITS=yes.
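
A minimal sketch of the two ways of using TOTUNITS follows. The counts are the first data set from the Example section (91 responses in all); the total of 100 units is a hypothetical figure used only for illustration.

```genstat
" Counts per interval and recording times (first data set in the Example section) "
VARIATE [VALUES=49,55,62,72,79,86,96,103,120,127,144,151,168] Time
VARIATE [VALUES=0,1,7,27,22,8,13,3,6,1,1,1,1] Count
" TOTUNITS=100 is a hypothetical total; gamma is estimated by default "
CUMDISTRIBUTION [FORM=differences] DATA=Count; TIME=Time; TOTUNITS=100
" Same data with gamma fixed at 1, so all 100 units are assumed to respond "
CUMDISTRIBUTION [FORM=differences; ALLRESPOND=yes] DATA=Count; TIME=Time;\
        TOTUNITS=100
```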

Data for several groups can be fitted together, either by setting DATA to a pointer of variates, or by setting the GROUPS parameter to a factor to identify the different groups. If DATA is set to a pointer, TIME can be set to one variate if all the DATA variates are the same length. Otherwise, it must be set to a pointer with a variate for each DATA variate. Parameters for the groups are constrained to be equal by default, but any of the parameters b, m, lag and gamma can be estimated separately between groups by setting the SEPARATE option.
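
As a sketch of the GROUPS form, the three series from the second data set in the Example section can be stacked into a single variate and identified by a factor. The identifier Group is hypothetical, and it is assumed here that the stacked counts are cumulated within each group and ordered by time, as they are in the pointer form.

```genstat
" Recording times repeated once for each of the three groups "
VARIATE [VALUES=(0,56,64,72,80,88,96,104)3] Time
" Cumulated counts for groups 1, 2 and 3, stacked end to end "
VARIATE [VALUES=0,3,17,36,57,79,85,89,\
        0,1,16,48,65,80,85,90,\
        0,3,16,34,61,77,83,88] Cum
" Factor with 8 values for each group, in the same order as Cum "
FACTOR [LEVELS=3; VALUES=8(1,2,3)] Group
CUMDISTRIBUTION [DISTRIBUTION=inversenormal; SEPARATE=b,m]\
        DATA=Cum; TIME=Time; GROUPS=Group
```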

The counts can be from a single population or from a mixture of up to 3 populations, as specified by the NPOPULATION parameter (default 1). Parameters b, m and lag can be estimated separately between the populations by setting the POPSEPARATE option. If this is set, the proportion (propn) of units in each population will also be estimated. If there are GROUPS in the data, then the proportions can be estimated separately for each group by setting SEPARATE=propn. NPOPULATION is the same for each group.

Initial parameter values are estimated within the procedure, but can be supplied separately using any of the parameters IB, IM, ILAG, IGAMMA and IPROPN, or in one list using the INITIAL parameter. If any parameter is to be estimated separately between GROUPS or populations, there must be one initial value for each parameter of that type to be estimated. For example, if there are two groups, and SEPARATE=m, then IM should be set to a variate of length 2. If INITIAL is set, its values will be used even if the other initial value parameters are set. The values in INITIAL must be in the order b, m, lag, gamma, propn, with enough values for the number of each being estimated. For propn, the number of values must be one fewer than NPOPULATION. For example, with 2 groups and 3 populations, with SEPARATE=b,m and POPSEPARATE=m, there will be 2 initial values for b, 6 for m and 2 for propn. Steplengths for the fitting process can be supplied similarly using STEPLENGTHS or SB, SM, SLAG, SGAMMA, SPROPN. MAXCYCLE controls the maximum number of iterations, as in the RCYCLE directive.
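
The sketch below supplies initial values for a parameter that is estimated separately for each group. It uses the three-group data set from the Example section with SEPARATE=m, so IM needs three values. The starting value of 75 for each group is a hypothetical guess at the mean response time, not a recommended setting.

```genstat
VARIATE [NVALUES=8] Time, Cum[1,2,3]
READ Time,Cum[]
  0  0  0  0
 56  3  1  3
 64 17 16 16
 72 36 48 34
 80 57 65 61
 88 79 80 77
 96 85 85 83
104 89 90 88 :
" One initial value of m per group; 75 is a hypothetical starting guess "
CUMDISTRIBUTION [DISTRIBUTION=inversenormal; SEPARATE=m]\
        DATA=Cum; TIME=Time; IM=!(75,75,75)
```

Because only IM is set, initial values for the remaining parameters are still estimated within the procedure.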

Output is controlled by the PRINT option, with settings as in FITNONLINEAR. Parameter estimates are indexed by groups and/or population numbers, with group labels first if both populations and groups are used. If PRINT=estimates, parameters calculated from the fitted parameters (mean, sd, t50) are also printed. Option PLOT determines the form of the graphical output:

`cumulative`	fitted curve and cumulated counts,
`density`	differenced fitted curve and counts,
`trcumulative`	trellis version of cumulative when there are `GROUPS`,
`trdensity`	trellis version of density when there are `GROUPS`.

Setting PLOT=* suppresses all graphs.

Some results can be saved using RKEEP (as with FIT). Further results can be saved by setting the SAVE parameter. This creates a pointer with three sections labelled by their contents. SAVE['Data'] points to the columns used in the fitting process:

`ndata`	the (differenced) counts,
`ntime`	times for each count,
`groups`	grouping factor,
`fitted`	fitted values,
`cumdata`	cumulated counts,
`cumfitted`	cumulated fitted values,
`z`	transformed time variate (as above).

SAVE['CalcParams'] contains the calculated parameters and their standard errors (Mean, Sd, T50, seMean, seSd, seT50). SAVE['Viable'] contains the estimated number of viable units (Nv) for each group and, if NPOPULATION>1, the number in each population (PopNv).

Options: PRINT, DISTRIBUTION, TRANSFORMATION, LAG, ALLRESPOND, FORM, LOSTUNITS, SEPARATE, POPSEPARATE, PLOT, MAXCYCLE.

Parameters: DATA, TIME, GROUPS, INITIAL, IB, IM, ILAG, IGAMMA, IPROPN, STEPLENGTHS, SB, SM, SLAG, SGAMMA, SPROPN, TOTUNITS, NPOPULATION, SAVE.

Method

This procedure extends the methods described by Brain & Butler (1988). If FORM=cumulated, the DATA vector is differenced, and if DATA is set to a pointer, the DATA variates are stacked, and a factor created to identify the groups. The resulting data variate is then used with FITNONLINEAR. The model to be fitted is set up in a pointer to expressions formed according to the settings of the various options and parameters.

Action with `RESTRICT`

Because the calculations in the procedure involve differencing the counts, the TIME and DATA variates must not be restricted.

Reference

Brain, P. & Butler, R.C. (1988). Cumulative count data. Genstat Newsletter, 22, 38-47.

Example

CAPTION 'CUMDISTRIBUTION example',\
        !t('1) Data from Hunter, E.A., Glasbey, C.A., & Naylor, R.E.L.',\
        '(1984). J. Agric. Sci. 102, 207-213.'); STYLE=meta,plain
VARIATE Count,Time; VALUES=!(0,1,7,27,22,8,13,3,6,1,1,1,1),\
        !(49,55,62,72,79,86,96,103,120,127,144,151,168)
CUMDISTRIBUTION [PRINT=model,summary,estimates,fittedvalues;\
        FORM=differences; DISTRIBUTION=normal; TRANSFORMATION=log;\
        LAG=positive] DATA=Count;TIME=Time
CAPTION '2) Randomly generated data from three groups'
VARIATE [NVALUES=8] Time, Cum[1,2,3]
READ Time,Cum[]
  0  0  0  0
 56  3  1  3
 64 17 16 16
 72 36 48 34
 80 57 65 61
 88 79 80 77
 96 85 85 83
104 89 90 88 :
CUMDISTRIBUTION [DISTRIBUTION=inversenormal; SEPARATE=b,m,lag]\
        DATA=Cum; TIME=Time
CAPTION '3) Example fitting sub-populations and groups'
VARIATE [NVALUES=15] Time; !(0,2,3,5,7,9,14,16,19,24,29,34,39,44,49)
&       Counts[1]; !(0,0,0,38,73,27,41,16,88,37,23,6,1,1,1)
&       Counts[2]; !(0,0,0,81,39,11,11,13,82,20,21,3,3,4,1)
CAPTION '3a) All parameters varying between groups and populations'
CUMDISTRIBUTION [SEPARATE=b,m,lag,gamma,propn; POPSEPARATE=b,m,lag;\
        LAG=positive; FORM=difference] DATA=Counts; TIME=Time;\
        NPOPULATION=2; TOTUNITS=400; ILAG=!((4,15)2)
CAPTION '3b) Only some parameters varying between groups or populations'
CUMDISTRIBUTION [SEPARATE=lag,gamma,propn; POPSEPARATE=m,lag;\
        LAG=positive; FORM=difference] DATA=Counts; TIME=Time;\
        NPOPULATION=2; TOTUNITS=400; ILAG=!((4,15)2)

Updated on March 8, 2019
