
CUMDISTRIBUTION procedure

Fits frequency distributions to accumulated counts (R.C. Butler, M.E. O’Neill, P. Brain & H. Turner).

Options

PRINT = string tokens Controls printed output (model, summary, estimates, correlations, fittedvalues, monitoring); default mode, summ, esti
DISTRIBUTION = string token Which distribution to use (normal, logistic, complementaryloglog, acomplementaryloglog, inversenormal, weibull, exponential); default norm
TRANSFORMATION = string token Whether to use log(TIME) if DISTRIBUTION = normal, logistic, complementaryloglog or acomplementaryloglog (log, none); default * uses log for these four distributions, and none when DISTRIBUTION = inversenormal, weibull or exponential
LAG = string token Type of lag to add to TIME (none, positive, unconstrained); default none
ALLRESPOND = string token If TOTUNITS is set, whether all units are constrained to respond (yes, no); default no
FORM = string token Whether DATA are cumulated or differences (cumulated, differences); default cumu
LOSTUNITS = string token Whether data are left-censored (yes, no); default no
SEPARATE = string token Which parameters to estimate separately for each group (lag, b, m, propn, gamma); default *
POPSEPARATE = string token Which parameters to estimate separately for populations in each group (b, m, lag); default *
PLOT = string token Which graphs to draw (cumulative, density, trcumulative, trdensity); default cumu
MAXCYCLE = scalar Number of iterations for fitting, as in RCYCLE; default 30

Parameters

DATA = variates or pointers Specifies the accumulated counts
TIME = variates or pointers Defines the time at which each count was recorded
GROUPS = factors Factor indicating groups
INITIAL = variates Initial values for all parameters
IB = scalars or variates Initial values for b
IM = scalars or variates Initial values for m
ILAG = scalars or variates Initial values for lag
IGAMMA = scalars or variates Initial values for gamma
IPROPN = scalars or variates Initial values for proportions
STEPLENGTHS = variates Steplengths for all parameters
SB = scalars or variates Steplengths for b
SM = scalars or variates Steplengths for m
SLAG = scalars or variates Steplengths for lag
SGAMMA = scalars or variates Steplengths for gamma
SPROPN = scalars or variates Steplengths for proportions
TOTUNITS = scalars or variates Total number of units
NPOPULATION = scalars Number of populations (1, 2 or 3); default 1
SAVE = pointers Saves the results

Description

CUMDISTRIBUTION fits frequency distributions to a variate of counts, accumulated over time. The counts are specified by the DATA parameter, and the time (t) at which each count was recorded is supplied, in a variate, by the TIME parameter. Counts may be accumulated over time (option FORM=cumulated), or be the change in count from the previous time (FORM=differences). Neither the DATA nor the TIME variate may be restricted, and neither may contain missing values. The DATA values must all be non-negative integers.
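As a minimal sketch of the two FORM settings, the differenced counts from the first example in the Example section can be accumulated with the CUMULATE function, so that the same data are supplied in either form. The identifier CumCount is introduced here purely for illustration, and the two calls are expected, rather than guaranteed, to fit the same model.

```genstat
" Counts recorded as the change since the previous observation time. "
VARIATE Count,Time; VALUES=!(0,1,7,27,22,8,13,3,6,1,1,1,1),\
        !(49,55,62,72,79,86,96,103,120,127,144,151,168)
" Running total of the same counts (CumCount is an illustrative name). "
CALCULATE CumCount = CUMULATE(Count)
" Differenced counts must be flagged with FORM=differences. "
CUMDISTRIBUTION [FORM=differences] DATA=Count; TIME=Time
" Accumulated counts use the default setting, FORM=cumulated. "
CUMDISTRIBUTION [FORM=cumulated] DATA=CumCount; TIME=Time
```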

The form of the cumulative distribution function is indicated by the DISTRIBUTION option, which has the following settings (z is a function of TIME as defined below).

DISTRIBUTION          Cumulative distribution function
normal                NORMAL(b × (z – m))
complementaryloglog   EXP(-EXP(-b × (z – m)))
acomplementaryloglog  1 – EXP(-EXP(b × (z – m)))
logistic              1 / (1 + EXP(-b × (z – m)))
inversenormal         NORMAL(SQRT(b/z) × (z/m – 1)) + EXP(2b/m) × NORMAL(-SQRT(b/z) × (1 + z/m))
weibull               1 – EXP(-(m × z)**b)
exponential           1 – EXP(-m × z)
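The first four curves can be evaluated directly with CALCULATE, which may help when choosing starting values or checking a fitted curve by hand. The sketch below is an illustration only: the values of b and m and the transformed times z are hypothetical, and the result names (Fnormal and so on) are not produced by the procedure. At z = m the normal and logistic curves both equal 0.5.

```genstat
" Hypothetical parameter values and transformed times, for illustration only. "
SCALAR b,m; VALUE=0.5,10
VARIATE [VALUES=6,8,10,12,14] z
" Cumulative distribution functions as tabulated above. "
CALCULATE Fnormal   = NORMAL(b * (z - m))
CALCULATE Flogistic = 1 / (1 + EXP(-b * (z - m)))
CALCULATE Fcloglog  = EXP(-EXP(-b * (z - m)))
CALCULATE Facloglog = 1 - EXP(-EXP(b * (z - m)))
PRINT z,Fnormal,Flogistic,Fcloglog,Facloglog; DECIMALS=0,4,4,4,4
```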

The parameters b and m are estimated, and relate to the distribution of transformed time z as follows.

DISTRIBUTION          Parameter b                                Parameter m
normal                1 / sd                                     mean, t50
logistic              2 × relative response rate at z=m          mean, t50
complementaryloglog   relative response rate at z=m              mode
acomplementaryloglog  (e – 1) × relative response rate at z=m    mode
inversenormal         (mean**3) / (sd**2)                        mean
weibull               shape                                      scale
exponential           (not used)                                 1/mean

For some of the distributions, TIME may be logged by setting option TRANSFORMATION=log. A lag time before any units respond may be estimated by setting the option LAG=positive. You can set LAG=unconstrained to estimate a negative lag, which assumes that some units responded before TIME=0. These options give z using the following functions of TIME.

                               TRANSFORMATION=none   TRANSFORMATION=log
LAG=none                       z = TIME              z = LOG(TIME)
LAG=positive or unconstrained  z = TIME – LAG        z = LOG(TIME – LAG)
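The transformed time can be reproduced outside the procedure with CALCULATE. In this sketch the lag is a hypothetical fixed value (tlag, an illustrative name), whereas CUMDISTRIBUTION estimates the lag as part of the fit; the times are the first five from the first example in the Example section.

```genstat
VARIATE [VALUES=49,55,62,72,79] Time
" Hypothetical lag, smaller than every recorded time. "
SCALAR tlag; VALUE=40
" z for TRANSFORMATION=none and TRANSFORMATION=log respectively. "
CALCULATE zlin = Time - tlag
CALCULATE zlog = LOG(Time - tlag)
PRINT Time,zlin,zlog
```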

The available combinations of LAG and TRANSFORMATION for the various distributions are shown below.

DISTRIBUTION          TRANSFORMATION  Equivalent distribution   Possible settings for LAG
normal                none                                      none
                      log             log-normal                none, positive, unconstrained
logistic              none                                      none
                      log             log-logistic              none, positive, unconstrained
complementaryloglog   none            Gumbel, Extreme value 1   none
                      log             Extreme value 2           none, positive, unconstrained
acomplementaryloglog  none                                      none
                      log             Weibull                   none, positive, unconstrained
inversenormal         none                                      none, positive, unconstrained
weibull               none                                      none, positive, unconstrained
exponential           none                                      none, positive, unconstrained

TRANSFORMATION is set to log by default for the first four distributions, and none for the last three.

If the total number of units is known, it can be supplied by setting the TOTUNITS parameter. By default, a parameter gamma, the proportion of TOTUNITS that can respond, will be estimated. If option ALLRESPOND is set to yes, then gamma is fixed at 1 (indicating that all units will respond). If some units were lost before counting began, the number of these can be estimated by setting option LOSTUNITS=yes.
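A sketch of the two ways of using TOTUNITS follows. It uses the first group from the second example in the Example section; the total of 100 units is an assumption made for illustration and is not part of the original data, and Cum1 is an illustrative name.

```genstat
VARIATE [VALUES=0,56,64,72,80,88,96,104] Time
VARIATE [VALUES=0,3,17,36,57,79,85,89] Cum1
" Gamma, the proportion of the 100 units able to respond, is estimated. "
CUMDISTRIBUTION [DISTRIBUTION=inversenormal] DATA=Cum1; TIME=Time; TOTUNITS=100
" Gamma is fixed at 1: all 100 units are assumed to respond eventually. "
CUMDISTRIBUTION [DISTRIBUTION=inversenormal; ALLRESPOND=yes] DATA=Cum1;\
        TIME=Time; TOTUNITS=100
```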

Data for several groups can be fitted together, either by setting DATA to a pointer of variates, or by setting the GROUPS parameter to a factor to identify the different groups. If DATA is set to a pointer, TIME can be set to one variate if all the DATA variates are the same length. Otherwise, it must be set to a pointer with a variate for each DATA variate. Parameters for the groups are constrained to be equal by default, but any of the parameters b, m, lag and gamma can be estimated separately between groups by setting the SEPARATE option.
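The Example section supplies groups as a pointer of variates. The sketch below shows what should be the equivalent call using a GROUPS factor, with the three groups of the second example stacked into single variates. The identifiers AllTime, AllCum and Group are illustrative names.

```genstat
" Times repeated once per group; counts stacked group by group. "
VARIATE [VALUES=(0,56,64,72,80,88,96,104)3] AllTime
VARIATE [VALUES=0,3,17,36,57,79,85,89, 0,1,16,48,65,80,85,90,\
        0,3,16,34,61,77,83,88] AllCum
" Eight observations in each of the three groups. "
FACTOR [LEVELS=3; VALUES=8(1,2,3)] Group
CUMDISTRIBUTION [DISTRIBUTION=inversenormal; SEPARATE=b,m] DATA=AllCum;\
        TIME=AllTime; GROUPS=Group
```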

The counts can be from a single population or from a mixture of up to 3 populations, as specified by the NPOPULATION parameter (default 1). Parameters b, m and lag can be estimated separately between the populations by setting the POPSEPARATE option. If this is set, the proportion (propn) of units in each population will also be estimated. If there are GROUPS in the data, then the proportions can be estimated separately for each group by setting SEPARATE=propn. NPOPULATION is the same for each group.

Initial parameter values are estimated within the procedure, but can be supplied separately using any of the parameters IB, IM, ILAG, IGAMMA and IPROPN, or in one list using the INITIAL parameter. If any parameter is to be estimated separately between GROUPS or populations, there must be one initial value for each parameter of that type to be estimated. For example, if there are two groups, and SEPARATE=m, then IM should be set to a variate of length 2. If INITIAL is set, its values will be used even if the other initial value parameters are set. The values in INITIAL must be in the order b, m, lag, gamma, propn, with enough values for the number of each being estimated. For propn, the number of values must be 1 less than NPOPULATION. For example, with 2 groups and 3 populations, with SEPARATE=b,m and POPSEPARATE=m, there will be 2 initial values for b, 6 for m and 2 for propn.

Steplengths for the fitting process can be supplied similarly using STEPLENGTHS or SB, SM, SLAG, SGAMMA and SPROPN. MAXCYCLE controls the maximum number of iterations, as in the RCYCLE directive.
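As a sketch of supplying starting values, the call below fits the three groups of the second example in the Example section with a separate m for each group, so IM needs three values. The starting values are rough guesses chosen for illustration, not recommendations.

```genstat
VARIATE [VALUES=0,56,64,72,80,88,96,104] Time
VARIATE Cum[1,2,3]; VALUES=!(0,3,17,36,57,79,85,89),\
        !(0,1,16,48,65,80,85,90),!(0,3,16,34,61,77,83,88)
" SEPARATE=m with three groups: one initial value of m per group. "
CUMDISTRIBUTION [DISTRIBUTION=inversenormal; SEPARATE=m]\
        DATA=Cum; TIME=Time; IM=!(75,72,75)
```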

Output is controlled by the PRINT option, with settings as in FITNONLINEAR. Parameter estimates are indexed by groups and/or population numbers, with group labels first if both populations and groups are used. If PRINT=estimates, parameters calculated from the fitted parameters (mean, sd, t50) are also printed. Option PLOT determines the form of the graphical output:

    cumulative    fitted curve and cumulated counts,
    density       differenced fitted curve and counts,
    trcumulative  trellis version of cumulative when there are GROUPS,
    trdensity     trellis version of density when there are GROUPS.

Setting PLOT=* suppresses all graphs.

Some results can be saved using RKEEP (as with FIT). Further results can be saved by setting the SAVE parameter. This creates a pointer with three sections labelled by their contents. SAVE['Data'] points to the columns used in the fitting process:

    ndata      the (differenced) counts,
    ntime      times for each count,
    groups     grouping factor,
    fitted     fitted values,
    cumdata    cumulated counts,
    cumfitted  cumulated fitted values,
    z          transformed time variate (as above).

SAVE['CalcParams'] contains the calculated parameters and their standard errors (Mean, Sd, T50, seMean, seSd, seT50). SAVE['Viable'] contains the estimated number of viable units (Nv) for each group and, if NPOP>1, the number in each population (PopNv).

Options: PRINT, DISTRIBUTION, TRANSFORMATION, LAG, ALLRESPOND, FORM, LOSTUNITS, SEPARATE, POPSEPARATE, PLOT, MAXCYCLE.

Parameters: DATA, TIME, GROUPS, INITIAL, IB, IM, ILAG, IGAMMA, IPROPN, STEPLENGTHS, SB, SM, SLAG, SGAMMA, SPROPN, TOTUNITS, NPOPULATION, SAVE.

Method

This procedure extends the methods described by Brain & Butler (1988). If FORM=cumulated, the DATA variate is differenced. If DATA is set to a pointer, the DATA variates are stacked and a factor is created to identify the groups. The resulting data variate is then used with FITNONLINEAR. The model to be fitted is set up in a pointer to expressions formed according to the settings of the various options and parameters.

Action with RESTRICT

Because the calculations in the procedure involve differencing the counts, the TIME and DATA variates must not be restricted.

Reference

Brain, P. & Butler, R.C. (1988). Cumulative count data. Genstat Newsletter, 22, 38-47.

See also

Directive: DISTRIBUTION.

Procedure: RSURVIVAL.

Commands for: Repeated measurements, Survival analysis.

Example

CAPTION 'CUMDISTRIBUTION example',\
        !t('1) Data from Hunter, E.A., Glasbey, C.A., & Naylor, R.E.L.',\
        '(1984). J. Agric. Sci. 102, 207-213.'); STYLE=meta,plain
VARIATE Count,Time; VALUES=!(0,1,7,27,22,8,13,3,6,1,1,1,1),\
        !(49,55,62,72,79,86,96,103,120,127,144,151,168)
CUMDISTRIBUTION [PRINT=model,summary,estimates,fittedvalues;\
        FORM=differences; DISTRIBUTION=normal; TRANSFORMATION=log;\
        LAG=positive] DATA=Count;TIME=Time
CAPTION '2) Randomly generated data from three groups'
VARIATE [NVALUES=8] Time, Cum[1,2,3]
READ Time,Cum[]
  0  0  0  0
 56  3  1  3
 64 17 16 16
 72 36 48 34
 80 57 65 61
 88 79 80 77
 96 85 85 83
104 89 90 88 :
CUMDISTRIBUTION [DISTRIBUTION=inversenormal; SEPARATE=b,m,l]\
        DATA=Cum; TIME=Time
CAPTION '3) Example fitting sub-populations and groups'
VARIATE [NVALUES=15] Time; !(0,2,3,5,7,9,14,16,19,24,29,34,39,44,49)
&       Counts[1]; !(0,0,0,38,73,27,41,16,88,37,23,6,1,1,1)
&       Counts[2]; !(0,0,0,81,39,11,11,13,82,20,21,3,3,4,1)
CAPTION '3a) All parameters varying between groups and populations'
CUMDISTRIBUTION [SEPARATE=b,m,lag,gamma,propn; POPSEPARATE=b,m,lag;\
        LAG=positive; FORM=difference] DATA=Counts; TIME=Time;\
        NPOPULATION=2; TOTUNITS=400; ILAG=!((4,15)2)
CAPTION '3b) Only some parameters varying between groups or populations'
CUMDISTRIBUTION [SEPARATE=lag,gamma,propn; POPSEPARATE=m,lag;\
        LAG=positive; FORM=difference] DATA=Counts; TIME=Time;\
        NPOPULATION=2; TOTUNITS=400; ILAG=!((4,15)2)
Updated on March 8, 2019
