R0INFLATED procedure

Fits zero-inflated regression models to count data with excess zeros (D.A. Murray).

Options

PRINT = string token Controls printed output (model, summary, estimates, fittedvalues, monitoring); default mode, summ, esti
DISTRIBUTION = string token Distribution of response variable (poisson, binomial, negativebinomial); default pois
METHOD = string token Method used for model fitting (em, conditional); default em
CONSTANT = string token How to treat constant for count state (estimate, omit); default esti
ZCONSTANT = string token How to treat constant for zero-inflation state (estimate, omit); default esti
XTERMS = formula List of explanatory variates and factors, or model formula for count state of model
ZTERMS = formula List of explanatory variates and factors, or model formula for zero-inflation state of model
WEIGHTS = variate Variate of weights for weighted zero-inflated regression (EM model only)
OFFSET = variate Offset variate to be used in the model (EM model only)
XGROUPS = factor Absorbing factor defining the groups for within-groups regression for the count state model (EM model only)
ZGROUPS = factor Absorbing factor defining the groups for within-groups regression for the zero-inflation state model (EM model only)
MAXCYCLE = scalar Maximum number of iterations for EM algorithm; default 100
TOLERANCE = scalar or variate Convergence criteria for the EM algorithm, for the negative binomial parameter k, and for the generalized linear models; default !(1.E-4, 1.E-4, 1.E-4)
ZPARAMETERIZATION = string token Parameterization of the probability of the zero-inflation model (zero, nonzero): if unset, zero is used for the EM model and nonzero for the conditional model

Parameters

Y = variates Response variate
NBINOMIAL = scalars or variates Total numbers for DISTRIBUTION=binomial
RESIDUALS = variates Saves the simple residuals
FITTEDVALUES = variates Saves the fitted values
ESTIMATES = variates Saves the estimates of the parameters
SE = variates Saves the standard errors of the estimates
RSAVE = identifiers Saves the regression structure for the final generalized model fitted for the count model
ZSAVE = identifiers Saves the regression structure for the final binomial regression fitted for the zero-inflation model

Description

R0INFLATED can be used to fit zero-inflated regression models to count data with excess zeros. The procedure allows the data to be modelled using two different approaches. The first possibility is to fit a zero-inflated Poisson regression model (ZIP), a zero-inflated binomial regression model (ZIB) or a zero-inflated negative binomial regression model (ZINB) using an EM algorithm (Lambert 1992). In this analysis, the response variable of counts is assumed to be distributed as a mixture of a distribution (such as Poisson) and a degenerate distribution at zero. In these models, a generalized linear model with a Poisson or negative binomial distribution and log link, or with a binomial distribution and logit link, is used for the count model. A generalized linear model with a binomial distribution and logit link is used for the zero-inflation model.

The alternative is to fit the conditional model of Welsh et al. (1996), which assumes that the data are in one of two states: a state where zeros are observed, or a state where counts are recorded. A binomial model with a logit link is used for the zero state. A truncated Poisson, truncated binomial or truncated negative binomial model is used for the count state.

The response variable is supplied, in a variate, using the Y parameter. The NBINOMIAL parameter must also be set when DISTRIBUTION=binomial, to give the number of binomial trials for each unit. The XTERMS and ZTERMS options each specify a formula, to describe the count model and the zero-inflation model respectively. The CONSTANT and ZCONSTANT options control whether a constant parameter is included in the count and zero-inflation models.
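As a minimal sketch of how these settings fit together, the call below fits a zero-inflated binomial model. The identifiers Infected, NTested, Dose and Age are hypothetical and stand for a variate of counts, a variate of trial numbers and two explanatory variates that you would already have defined.

```genstat
" Hypothetical data structures: Infected (counts), NTested (binomial totals),
  Dose and Age (explanatory variates). "
R0INFLATED [DISTRIBUTION=binomial; CONSTANT=estimate; ZCONSTANT=estimate;\
           XTERMS=Dose+Age; ZTERMS=Dose] Y=Infected; NBINOMIAL=NTested
```

Here the count model contains both explanatory variates, while the zero-inflation model contains only Dose; each model keeps its own constant.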

The METHOD option specifies the type of model to fit: the em setting fits the ZIP, ZIB and ZINB mixture models, and the conditional setting fits the conditional model. The DISTRIBUTION option specifies the distribution for the count model. Note that a log link is always used for the count model with the Poisson and negative binomial distributions, and a logit link is used with the binomial distribution.

The XGROUPS and ZGROUPS options can specify factors whose effects you want to eliminate from the count or zero-inflation state respectively, before any regression is fitted. This method of elimination is sometimes called absorption. (See the GROUPS option of the MODEL directive.) It gives less information than you would get if you included the factor explicitly in the model. For example, no standard errors are produced. However, it saves space and time when data from many different groups are to be modelled. These options are only available for the EM model.
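A short sketch of absorption in the count state, assuming a hypothetical factor Site with many levels, a hypothetical factor Treatment and a hypothetical response variate Count:

```genstat
" Site effects are absorbed from the count model rather than fitted as terms,
  so no estimates or standard errors are reported for Site.
  All identifiers here are hypothetical. "
R0INFLATED [METHOD=em; XTERMS=Treatment; ZTERMS=Treatment; XGROUPS=Site]\
           Y=Count
```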

The ESTIMATES and SE parameters save the parameter estimates and their standard errors. R0INFLATED puts them into variates, using the same order as in the display produced by the PRINT option. The simple residuals and the fitted values can be saved using the RESIDUALS and FITTEDVALUES parameters.
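For instance, using the apple shoot data from the Example section, the results might be saved and inspected as sketched below. The identifiers est, se, fit, res and ratio are arbitrary names chosen for this illustration.

```genstat
" Fit without printed output and save the results. "
R0INFLATED [PRINT=*; XTERMS=Hormone*Period; ZTERMS=Period]\
           Y=NShoots; ESTIMATES=est; SE=se; FITTEDVALUES=fit; RESIDUALS=res
" Ratio of each estimate to its standard error, in the same order as
  the display produced by PRINT=estimates. "
CALCULATE  ratio = est / se
PRINT      est,se,ratio
```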

The RSAVE and ZSAVE parameters allow you to specify identifiers for the regression save structures for the count and zero-inflation states of the model. These structures store the final state of the regression models fitted. Note that the standard errors for the parameter estimates in the regression save structures will not be correct and should instead be obtained using the SE parameter or by the R0KEEP procedure.

For the mixture models, the WEIGHTS option can specify a variate holding weights for each unit, and the OFFSET option allows you to include an offset (i.e. a variable in the regression model with a regression coefficient fixed at one).
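A sketch of a weighted fit with an offset, assuming hypothetical structures Count (response), Habitat (factor), Area (variate of plot areas) and Wt (variate of weights). With the log link of the Poisson count model, an offset of log(Area) models the count per unit area.

```genstat
" All identifiers are hypothetical; WEIGHTS and OFFSET apply to METHOD=em only. "
CALCULATE  LogArea = log(Area)
R0INFLATED [METHOD=em; DISTRIBUTION=poisson; XTERMS=Habitat; ZTERMS=Habitat;\
           OFFSET=LogArea; WEIGHTS=Wt] Y=Count
```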

The PRINT option controls printed output, with settings:

    model gives a description of the model, including response and explanatory variates for count and zero-inflation models;
    summary displays minus twice log-likelihood, the Akaike information criterion (AIC) and the Schwarz (Bayesian) information criterion (BIC or SIC);
    estimates gives the estimates of the parameters in the model with standard errors based on the asymptotic variance-covariance matrix derived from the inverse of the observed Fisher information matrix;
    fittedvalues displays a table of unit labels, values of response variate, fitted values and residuals;
    monitoring displays monitoring information of the iterative algorithm.
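
The information criteria in the summary are conventionally defined from minus twice the log-likelihood as shown below. This is the standard textbook form, given here as an assumption; the exact expressions used by the procedure are not stated on this page. Here L is the maximized likelihood, p the number of estimated parameters and n the number of observations.

$$
\mathrm{AIC} = -2\log L + 2p, \qquad \mathrm{BIC} = -2\log L + p\log n
$$

Under these definitions, smaller values indicate a better trade-off between fit and model size, and BIC penalizes extra parameters more heavily than AIC once log n exceeds 2.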

The iterative process for the EM algorithm is controlled by the MAXCYCLE option which defines the maximum number of cycles, and the TOLERANCE option which sets convergence criteria. The EM algorithm cycle stops when successive values of the log-likelihood are within a tolerance set by the first element of the TOLERANCE option. The second and third elements of TOLERANCE control the convergence criterion for the aggregation parameter (k) for the negative binomial model and for the generalized linear model, respectively.
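As an illustration, the call below tightens the log-likelihood criterion and allows more cycles for a zero-inflated negative binomial fit to the apple shoot data from the Example section. The particular values 200 and 1.E-6 are arbitrary choices for this sketch, not recommendations.

```genstat
" TOLERANCE elements: (1) EM log-likelihood, (2) negative binomial k,
  (3) generalized linear model. "
R0INFLATED [PRINT=summary,monitoring; DISTRIBUTION=negativebinomial;\
           MAXCYCLE=200; TOLERANCE=!(1.E-6,1.E-4,1.E-4);\
           XTERMS=Hormone*Period; ZTERMS=Period] Y=NShoots
```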

The ZPARAMETERIZATION option controls how the probability for the zero-inflation model is specified. Note that the parameters in the model specification for the mixture and conditional models have different interpretations. In the mixture model the default setting is zero, which parameterizes the model such that ω is the probability of the excess zeros. Alternatively, you can set ZPARAMETERIZATION=nonzero, to parameterize the model such that ω is the probability that an observation is generated through the distribution. In the conditional model the default setting is nonzero, which parameterizes the model such that ω = 1 – p(x) where p(x) is the probability of detecting at least one observation, given that there is at least one observation. Alternatively, if you set ZPARAMETERIZATION=zero, the parameterization is that ω = p(x). For further details, see the Method section.
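The two parameterizations of the mixture model can be compared by fitting the same model twice, as sketched here with the apple shoot data from the Example section. The identifiers estzero and estnonzero are arbitrary names for this illustration.

```genstat
" Default for METHOD=em: omega is the probability of an excess zero. "
R0INFLATED [PRINT=estimates; ZPARAMETERIZATION=zero;\
           XTERMS=Hormone*Period; ZTERMS=Period] Y=NShoots; ESTIMATES=estzero
" Alternative: omega is the probability that the observation comes from
  the count distribution. "
R0INFLATED [PRINT=estimates; ZPARAMETERIZATION=nonzero;\
           XTERMS=Hormone*Period; ZTERMS=Period] Y=NShoots; ESTIMATES=estnonzero
PRINT      estzero,estnonzero
```

Because logit(1 – ω) = –logit(ω), the zero-inflation coefficients would be expected to change sign between the two fits while the count-model coefficients stay the same. This is an inference from the logit link rather than something stated on this page.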

Options: PRINT, DISTRIBUTION, METHOD, CONSTANT, ZCONSTANT, XTERMS, ZTERMS, WEIGHTS, OFFSET, XGROUPS, ZGROUPS, MAXCYCLE, TOLERANCE, ZPARAMETERIZATION.

Parameters: Y, NBINOMIAL, RESIDUALS, FITTEDVALUES, ESTIMATES, SE, RSAVE, ZSAVE.

Method

The zero-inflated Poisson (mixture) regression model has the distribution

$$
\Pr(Y=y) =
\begin{cases}
\omega + (1-\omega)\,e^{-\lambda}, & y = 0 \\[4pt]
(1-\omega)\,\dfrac{e^{-\lambda}\,\lambda^{y}}{y!}, & y > 0
\end{cases}
$$

where λ and ω are given by the following models

log(λ) = X β

log(ω/(1-ω)) = Z α

where X and Z are covariate matrices and β and α are vectors of unknown parameters.

The zero-inflated binomial (mixture) regression model has the distribution

$$
\Pr(Y=y) =
\begin{cases}
\omega + (1-\omega)\,(1-p)^{n}, & y = 0 \\[4pt]
(1-\omega)\,\dfrac{n!}{y!\,(n-y)!}\,p^{y}\,(1-p)^{n-y}, & y > 0
\end{cases}
$$

where p and ω are given by the following models

log(p/(1-p)) = X β

log(ω/(1-ω)) = Z α

The zero-inflated negative binomial (mixture) regression model has the distribution

$$
\Pr(Y=y) =
\begin{cases}
\omega + (1-\omega)\,(1+k\lambda)^{-1/k}, & y = 0 \\[4pt]
(1-\omega)\,\dfrac{\Gamma(y+1/k)}{y!\,\Gamma(1/k)}\,(k\lambda)^{y}\,(1+k\lambda)^{-(y+1/k)}, & y > 0
\end{cases}
$$

where λ and ω are given by the same models as for the Poisson distribution, and k is the extra-variation parameter in the negative binomial distribution.

The maximum likelihood estimates for β, α and k are obtained using an EM algorithm (Lambert 1992). The standard errors for the parameter estimates are derived using the incomplete data observed information matrix as proposed by Lambert (1992). The default parameterization for the mixture models estimates ω, the probability of excess zeros. You can use the ZPARAMETERIZATION option to change the parameterization to estimate ω′, the probability that an observation is generated through the distribution instead (ω′ = 1-ω).
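As a worked instance of the zero-inflated Poisson mixture above, take the illustrative values λ = 2 and ω = 0.3 (chosen only for this example):

$$
\begin{aligned}
\Pr(Y=0) &= 0.3 + 0.7\,e^{-2} \approx 0.3 + 0.7 \times 0.1353 \approx 0.3947 \\
\Pr(Y=1) &= 0.7 \times e^{-2} \times 2 \approx 0.1895
\end{aligned}
$$

An ordinary Poisson distribution with the same λ would give a zero probability of only about 0.1353, so the mixture roughly triples the expected proportion of zeros. Under the alternative parameterization the same model is described by ω′ = 1 – 0.3 = 0.7.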

In the Poisson case of the conditional model, y has a truncated Poisson distribution (λ). So the probability model is

$$
\Pr(Y=y) =
\begin{cases}
\omega, & y = 0 \\[4pt]
(1-\omega)\,\dfrac{e^{-\lambda}\,\lambda^{y}}{y!\,\left(1-e^{-\lambda}\right)}, & y > 0
\end{cases}
$$

where λ and ω are given by the following models

log(λ) = X β

log(ω/(1-ω)) = Z α
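
For comparison with the mixture model, a worked instance of the conditional Poisson model with the illustrative values λ = 2 and ω = 0.3 (chosen only for this example) is:

$$
\begin{aligned}
\Pr(Y=0) &= 0.3 \\
\Pr(Y=1) &= 0.7 \times \frac{2\,e^{-2}}{1-e^{-2}} \approx 0.7 \times \frac{0.2707}{0.8647} \approx 0.2191
\end{aligned}
$$

Here all zeros come from the zero state, so Pr(Y=0) equals ω exactly, and the truncated Poisson probabilities for y > 0 sum to one, giving a total probability of 1 – ω = 0.7 for the positive counts.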

In the truncated binomial case, y has a truncated binomial distribution. So the probability model is

$$
\Pr(Y=y) =
\begin{cases}
\omega, & y = 0 \\[4pt]
(1-\omega)\,\dfrac{n!}{y!\,(n-y)!}\,\dfrac{p^{y}\,(1-p)^{n-y}}{1-(1-p)^{n}}, & y > 0
\end{cases}
$$

where p and ω are given by the following models

log(p/(1-p)) = X β

log(ω/(1-ω)) = Z α

In the negative binomial case, y has a truncated negative binomial (λ, k). So the probability model is

$$
\Pr(Y=y) =
\begin{cases}
\omega, & y = 0 \\[4pt]
(1-\omega)\,\dfrac{\Gamma(y+1/k)}{y!\,\Gamma(1/k)}\,\dfrac{(k\lambda)^{y}\,(1+k\lambda)^{-(y+1/k)}}{1-(1+k\lambda)^{-1/k}}, & y > 0
\end{cases}
$$

where λ and ω are given by the same models as for the Poisson distribution, and k is the extra-variation parameter in the negative binomial distribution.

The truncated Poisson model is fitted using an iteratively re-weighted least squares algorithm (see Welsh et al. 1996). The truncated binomial and negative binomial models are fitted using FITNONLINEAR. The default parameterization for the conditional models estimates ω′ (=1-ω), the probability of detecting at least one observation given that there is at least one observation, as in Welsh et al. (1996). You can use the ZPARAMETERIZATION option to change the parameterization to estimate ω, the probability of detecting a zero observation, instead.

Action with RESTRICT

If a parameter is restricted, the statistics will be calculated using only those units included in the restriction.

References

Hall, D.B. (2000). Zero-inflated Poisson and binomial regression with random effects: a case study. Biometrics, 56, 1030-1039.

Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34, 1-14.

Ridout, M., Demetrio, C.G.B. & Hinde, J. (1998). Models for count data with many zeros. International Biometrics Conference, Cape Town.

Welsh, A.H., Cunningham, R.B., Donnelly, C.F. & Lindenmayer, D.B. (1996). Modelling the abundance of rare species: statistical models for counts with extra zeros. Ecological Modelling, 88, 297-308.

See also

Procedures: RNEGBINOMIAL, R0KEEP.

Commands for: Regression analysis.

Example

CAPTION    'R0INFLATED example - EM algorithm',\
           'Apple shoot data',\
           !t('Ridout et al. (1998)',\
              'Models for count data with many zeros,',\
              'IBC Cape Town 1998.'); STYLE=meta,minor,plain
FACTOR     [LABELS=!T('0.5','1','2','4'); VALUES=30(1,2),\ 
           40(3,4),30(1,2,3),40(4)] Hormone
FACTOR     [LABELS=!T('8','16'); VALUES=140(1),130(2)] Period
READ       NShoots
1 1 1 2 2 3 3 3 4 4 4 4 4 4 5 5 5 6 6 7 7 8 8 8 9 10 10 11 13 17
2 2 2 4 6 6 6 7 7 7 7 7 7 7 8 8 8 9 9 9 9 9 10 10 10 11 11 11 11 13
2 3 3 4 4 4 4 5 5 5 5 6 6 6 6 7 7 7 7 8
8 8 8 8 8 8 9 9 9 9 9 10 10 10 10 11 12 12 14 14
0 0 3 3 4 4 5 5 5 5 5 6 6 6 6 6 7 7 7 7 
8 8 8 8 8 8 8 8 9 9 9 10 10 10 10 11 11 11 11 14
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 3 3 4 5 5 6 8 9 9 9 10 11 12
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 3 4 4 5 6 6 8 10 10 10 12
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 3 4 4 5 5 6 6 6 7 9 9 11 12
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 
1 2 2 3 3 3 3 4 4 4 5 6 6 6 6 7 7 7 9 9 :
R0INFLATED [PRINT=mod,est,sum; CONSTANT=estimate; XTERMS=Hormone*Period]\ 
           NShoots
R0INFLATED [PRINT=mod,est,sum; CONSTANT=estimate; XTERMS=Hormone*Period;\ 
           ZCONSTANT=estimate; ZTERMS=Period] NShoots
R0INFLATED [PRINT=mod,est,sum; DISTRIBUTION=negativebinomial;\ 
           XTERMS=Hormone*Period; ZTERMS=Period] NShoots
R0INFLATED [PRINT=mod,est,sum; DISTRIBUTION=negativebinomial;\ 
           XTERMS=Period; ZTERMS=Period] NShoots

CAPTION    'R0INFLATED example - Conditional Model',\
           'Leadbeater''s Possum data,',\
           !t('Welsh et al. (1996) Modelling the abundance of rare species:',\
              'statistical models  for counts with extra zeros.',\
              'Ecological Modelling.'); STYLE=meta,minor,plain
VARIATE    [NVALUES=151] no_lb,stags
READ       no_lb
  7  0  0  3  2 10  7  3  0  0  0  0  0  2  0  1  0  4  3  2 10  7  0  3  7  0
  0  0  0  0  5  9  0  0  0  0  1  0  5  4  0  0  4  0  4  0  2  0  0  1  1  0
  3  0  0  0  0  0  2  0  0  1  0  2  5  3  0  0  0  0  0  0  0  0  5  0  0  0
  0  0  0  1  5  4  0  0  0  0  3  0  3  3  1  0  0  0  0  0  2  0  0  1  0  3
  0  0  4  0  0  3  4  0  8  5  3  0  0  0  5  5  0  2  0  0  0  0  0  2  0  2
  0  0  0  0  0  4  0  0  0  0  5  0  0  0  0  0  1  0  0  0  0 :
READ       stags
 12 15  6 14 16 16  9 20  7  4  6  5  4  6  4 10  6 11 11  4 16  8 10  9  7 10
 15  5  7 10 11  8  8  3 14  5  8 14 11  2  1  1  7  2  7  7  1  6  8  6  6  5
  6  0  0  2  0  1  3  2  2  6  3  4  3  4  5  2  3  4  4  2  2 10 16 10  4  3
  2  2  2  2  3  1  6  8  2  4 12 13  3 14  2  4  0  2  3 14 29  2  4  6  3  8
  4  7 20  4 11  5  1  2 27 24  9 18  3 20 25  4  4 30 24  8  4  6  5  3  5  2
  3  5  7  4  5  4  4  1  4 23 25 31  0  8  4  4  1  3  1  1  4 :
CALCULATE  lstags = log(stags+1)
R0INFLATED [PRINT=mod,sum,est; METHOD=conditional; DIST=negative;\ 
           ZTERMS=lstags; XTERMS=lstags] no_lb
R0INFLATED [PRINT=mod,sum,est; METHOD=conditional;\ 
           ZTERMS=lstags; XTERMS=lstags] no_lb
Updated on June 18, 2019