R0INFLATED procedure

Fits zero-inflated regression models to count data with excess zeros (D.A. Murray).

Options

`PRINT` = string token: Controls printed output (`model`, `summary`, `estimates`, `fittedvalues`, `monitoring`); default `mode`, `summ`, `esti`

`DISTRIBUTION`: Distribution of response variable (`poisson`, `binomial`, `negativebinomial`); default `pois`

`METHOD`: Method used for model fitting (`em`, `conditional`); default `em`

`CONSTANT`: How to treat constant for count state (`estimate`, `omit`); default `esti`

`ZCONSTANT`: How to treat constant for zero-inflation state (`estimate`, `omit`); default `esti`

`XTERMS`: List of explanatory variates and factors, or model formula for count state of model

`ZTERMS`: List of explanatory variates and factors, or model formula for zero-inflation state of model

`WEIGHTS`: Variate of weights for weighted zero-inflated regression (EM model only)

`OFFSET`: Offset variate to be used in the model (EM model only)

`XGROUPS`: Absorbing factor defining the groups for within-groups regression for the count state model (EM model only)

`ZGROUPS`: Absorbing factor defining the groups for within-groups regression for the zero-inflation state model (EM model only)

`MAXCYCLE`: Maximum number of iterations for EM algorithm; default 100

`TOLERANCE`: Convergence criteria for the EM algorithm, for k, and for the generalized linear models; default `!(1.E-4, 1.E-4, 1.E-4)`

`ZPARAMETERIZATION`: Parameterization of the probability of the zero-inflation model (`zero`, `nonzero`): if unset, `zero` is used for the EM model and `nonzero` for the conditional model

Parameters

`Y` = variates: Response variate

`NBINOMIAL`: Total numbers for `DISTRIBUTION=binomial`

`RESIDUALS`: Saves the simple residuals

`FITTEDVALUES`: Saves the fitted values

`ESTIMATES`: Saves the estimates of the parameters

`SE`: Saves the standard errors of the estimates

`RSAVE`: Saves the regression structure for the final generalized model fitted for the count model

`ZSAVE`: Saves the regression structure for the final binomial regression fitted for the zero-inflation model

Description

`R0INFLATED` can be used to fit zero-inflated regression models to count data with excess zeros. The procedure allows the data to be modelled using two different approaches. The first possibility is to fit a zero-inflated Poisson regression model (ZIP), a zero-inflated binomial regression model (ZIB) or a zero-inflated negative binomial regression model (ZINB) using an EM algorithm (Lambert 1992). In this analysis, the response variable of counts is assumed to be distributed as a mixture of a distribution (such as Poisson) and a degenerate distribution at zero. In these models, a generalized linear model with a Poisson or negative binomial distribution and log link, or with a binomial distribution and logit link, is used for the count model. A generalized linear model with a binomial distribution and logit link is used for the zero-inflation model.

The alternative is to fit the conditional model of Welsh et al. (1996), which assumes that the data are in one of two states: a state where zeros are observed, or a state where counts are recorded. A binomial model with a logit link is used for the zero state. A truncated Poisson, truncated binomial or truncated negative binomial model is used for the count state.

The response variable is supplied, in a variate, using the `Y` parameter. The `NBINOMIAL` parameter must also be set when `DISTRIBUTION=binomial`, to give the number of binomial trials for each unit. The `XTERMS` and `ZTERMS` options each specify a model formula, describing the count model and the zero-inflation model respectively. The `CONSTANT` and `ZCONSTANT` options control whether a constant parameter is included in the count and zero-inflation models.

The `METHOD` option specifies the type of model to fit: the `em` setting fits the ZIP, ZIB and ZINB mixture models, and the `conditional` setting fits the conditional model. The `DISTRIBUTION` option specifies the distribution for the count model. Note that a log link is always used for the count model with the Poisson and negative binomial distributions, and a logit link is used with the binomial distribution.

The `XGROUPS` and `ZGROUPS` options can specify factors whose effects you want to eliminate from the count or zero-inflation state respectively, before any regression is fitted. This method of elimination is sometimes called absorption. (See the `GROUPS` option of the `MODEL` directive.) It gives less information than you would get if you included the factor explicitly in the model. For example, no standard errors are produced. However, it saves space and time when data from many different groups are to be modelled. These options are only available for the EM model.

The `ESTIMATES` and `SE` parameters save the parameter estimates and their standard errors. `R0INFLATED` puts them into variates, using the same order as in the display produced by the `PRINT` option. The simple residuals and the fitted values can be saved using the `RESIDUALS` and `FITTEDVALUES` parameters.

The `RSAVE` and `ZSAVE` parameters allow you to specify identifiers for the regression save structures for the count and zero-inflation states of the model. These structures store the final state of the regression models fitted. Note that the standard errors for the parameter estimates in the regression save structures will not be correct and should instead be obtained using the `SE` parameter or by the `R0KEEP` procedure.

For the mixture models, the `WEIGHTS` option can specify a variate holding weights for each unit, and the `OFFSET` option allows you to include an offset (i.e. a variable in the regression model with a regression coefficient fixed at one).
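The effect of an offset under the log link can be illustrated with a short Python sketch (Python is used here only for illustration; it is not Genstat code). The names `eta` and `exposure` and their values are hypothetical. Because the offset enters the linear predictor with a coefficient fixed at one, setting it to the log of an exposure makes the fitted mean proportional to that exposure.

```python
import math

def count_mean(eta, offset=0.0):
    """Mean of the count model under a log link: log(lambda) = offset + eta."""
    return math.exp(offset + eta)

# Hypothetical values: eta stands for X*beta for one unit,
# exposure for the length of time that unit was observed.
eta = 0.5
exposure = 4.0

rate = count_mean(eta)                             # mean per unit of exposure
lam = count_mean(eta, offset=math.log(exposure))   # mean over the whole exposure
```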

The `PRINT` option controls printed output, with settings:

    `model` gives a description of the model, including response and explanatory variates for count and zero-inflation models;
    `summary` displays minus twice log-likelihood, the Akaike information coefficient (AIC) and the Schwarz (Bayesian) information coefficient (BIC or SIC);
    `estimates` gives the estimates of the parameters in the model with standard errors based on the asymptotic variance-covariance matrix derived from the inverse of the observed Fisher information matrix;
    `fittedvalues` displays a table of unit labels, values of response variate, fitted values and residuals;
    `monitoring` displays monitoring information of the iterative algorithm.
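As a rough guide to how the summary statistics relate to one another, the Python sketch below applies the usual textbook definitions of AIC and BIC to a value of minus twice the log-likelihood. This is an assumption for illustration: the procedure's exact definitions (for example, any additive constants) are not stated here, and the numbers in the call are hypothetical.

```python
import math

def information_criteria(minus2loglik, n_params, n_units):
    """Return (AIC, BIC) using the standard definitions.

    AIC = -2 log L + 2 p
    BIC = -2 log L + p log(n)
    """
    aic = minus2loglik + 2 * n_params
    bic = minus2loglik + n_params * math.log(n_units)
    return aic, bic

# Hypothetical inputs: 9 estimated parameters, 270 units.
aic, bic = information_criteria(minus2loglik=1200.0, n_params=9, n_units=270)
```

Under these definitions, smaller values indicate a better trade-off between fit and the number of parameters, with BIC penalizing extra parameters more heavily once there are more than about seven units.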

The iterative process for the EM algorithm is controlled by the `MAXCYCLE` option which defines the maximum number of cycles, and the `TOLERANCE` option which sets convergence criteria. The EM algorithm cycle stops when successive values of the log-likelihood are within a tolerance set by the first element of the `TOLERANCE` option. The second and third elements of `TOLERANCE` control the convergence criterion for the aggregation parameter (k) for the negative binomial model and for the generalized linear model, respectively.

The `ZPARAMETERIZATION` option controls how the probability for the zero-inflation model is specified. Note that the parameters in the model specification for the mixture and conditional models have different interpretations. In the mixture model the default setting is `zero`, which parameterizes the model such that ω is the probability of the excess zeros. Alternatively, you can set `ZPARAMETERIZATION=nonzero`, to parameterize the model such that ω is the probability that an observation is generated through the distribution. In the conditional model the default setting is `nonzero`, which parameterizes the model such that ω = 1 – p(x) where p(x) is the probability of detecting at least one observation, given that there is at least one observation. Alternatively, if you set `ZPARAMETERIZATION=zero`, the parameterization is that ω = p(x). For further details, see the Method section.
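The two settings can be related with a small Python sketch, assuming (as the relation ω′ = 1-ω in the Method section suggests) that they differ only in whether ω or its complement is modelled with the logit link. Under that assumption the linear predictor simply changes sign, because logit(1-ω) = -logit(ω). The value of `eta` is hypothetical.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

eta = -0.8                     # hypothetical value of Z*alpha for one unit
omega = inv_logit(eta)         # probability under one parameterization
omega_comp = 1.0 - omega       # probability under the other parameterization

# The complementary probability has the same linear predictor with the sign reversed.
eta_comp = logit(omega_comp)
```

So, under this assumption, switching the setting would reverse the signs of the zero-inflation estimates while leaving the fitted probabilities of a zero unchanged.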

Options: `PRINT`, `DISTRIBUTION`, `METHOD`, `CONSTANT`, `ZCONSTANT`, `XTERMS`, `ZTERMS`, `WEIGHTS`, `OFFSET`, `XGROUPS`, `ZGROUPS`, `MAXCYCLE`, `TOLERANCE`, `ZPARAMETERIZATION`.

Parameters: `Y`, `NBINOMIAL`, `RESIDUALS`, `FITTEDVALUES`, `ESTIMATES`, `SE`, `RSAVE`, `ZSAVE`.

Method

The zero-inflated Poisson (mixture) regression model has the distribution

$$
\Pr(Y=y) =
\begin{cases}
\omega + (1-\omega)\,e^{-\lambda}, & y = 0 \\[4pt]
(1-\omega)\,\dfrac{e^{-\lambda}\,\lambda^{y}}{y!}, & y > 0
\end{cases}
$$

where λ and ω are given by the following models

log(λ) = X β

log(ω/(1-ω)) = Z α

where X and Z are covariate matrices and β and α are vectors of unknown parameters.

The zero-inflated binomial (mixture) regression model has the distribution

$$
\Pr(Y=y) =
\begin{cases}
\omega + (1-\omega)\,(1-p)^{n}, & y = 0 \\[4pt]
(1-\omega)\,\dfrac{n!}{y!\,(n-y)!}\,p^{y}\,(1-p)^{n-y}, & y > 0
\end{cases}
$$

where p and ω are given by the following models

log(p/(1-p)) = X β

log(ω/(1-ω)) = Z α
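The zero-inflated binomial probabilities can be checked numerically with a short Python sketch (illustrative only, not part of Genstat). The function name `zib_pmf` and the parameter values are hypothetical; the formula is the mixture of a point mass at zero and a binomial distribution with n trials.

```python
import math

def zib_pmf(y, n, p, omega):
    """Zero-inflated binomial probability Pr(Y = y), for y = 0, ..., n."""
    binom = math.comb(n, y) * p**y * (1.0 - p)**(n - y)
    if y == 0:
        # Zeros arise either as excess zeros or from the binomial itself.
        return omega + (1.0 - omega) * binom
    return (1.0 - omega) * binom

# Hypothetical parameter values for one unit.
n, p, omega = 10, 0.3, 0.2
total = sum(zib_pmf(y, n, p, omega) for y in range(n + 1))
```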

The zero-inflated negative binomial (mixture) regression model has the distribution

$$
\Pr(Y=y) =
\begin{cases}
\omega + (1-\omega)\,(1+\lambda k)^{-1/k}, & y = 0 \\[4pt]
(1-\omega)\,\dfrac{\Gamma(y+1/k)}{y!\,\Gamma(1/k)}\,(\lambda k)^{y}\,(1+\lambda k)^{-(y+1/k)}, & y > 0
\end{cases}
$$

where λ and ω are given by the same models as for the Poisson distribution, and k is the extra-variation parameter in the negative binomial distribution.
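The Poisson and negative binomial mixture probabilities can be sketched in Python as follows (illustrative only, not Genstat code). The function names are hypothetical. The negative binomial is assumed to be the standard form with mean λ and variance λ + kλ², which matches the zero probability (1 + λk)^(-1/k) above; with this form the negative binomial model approaches the Poisson model as k tends to zero.

```python
import math

def zip_pmf(y, lam, omega):
    """Zero-inflated Poisson probability Pr(Y = y)."""
    pois = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    if y == 0:
        return omega + (1.0 - omega) * pois
    return (1.0 - omega) * pois

def nb_pmf(y, lam, k):
    """Negative binomial probability with mean lam and variance lam + k*lam**2."""
    log_p = (math.lgamma(y + 1.0 / k) - math.lgamma(y + 1) - math.lgamma(1.0 / k)
             + y * math.log(k * lam) - (y + 1.0 / k) * math.log1p(k * lam))
    return math.exp(log_p)

def zinb_pmf(y, lam, k, omega):
    """Zero-inflated negative binomial probability Pr(Y = y)."""
    nb = nb_pmf(y, lam, k)
    if y == 0:
        return omega + (1.0 - omega) * nb
    return (1.0 - omega) * nb

# Hypothetical parameter values: compare the two models at y = 2.
p_zip = zip_pmf(2, 3.0, 0.2)
p_zinb_small_k = zinb_pmf(2, 3.0, 1e-6, 0.2)   # nearly Poisson when k is tiny
```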

The maximum likelihood estimates for β, α and k are obtained using an EM algorithm (Lambert 1992). The standard errors for the parameter estimates are derived using the incomplete data observed information matrix as proposed by Lambert (1992). The default parameterization for the mixture models estimates ω, the probability of excess zeros. You can use the `ZPARAMETERIZATION` option to change the parameterization to estimate ω′, the probability that an observation is generated through the distribution instead (ω′ = 1-ω).
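The idea behind the EM algorithm can be sketched in Python for the simplest case: a zero-inflated Poisson model with constants only (no explanatory variables), where both steps have closed forms. This is a minimal illustration under that assumption, not the procedure's implementation, which fits generalized linear models in each cycle. The names `em_zip`, `maxcycle` and `tolerance` are hypothetical, although the stopping rule mirrors the description of `MAXCYCLE` and the first element of `TOLERANCE`. The data are thirty counts copied from one row of the apple shoot data in the Example.

```python
import math

def zip_loglik(ys, lam, omega):
    """Log-likelihood of a constant-only zero-inflated Poisson model."""
    total = 0.0
    for y in ys:
        log_pois = -lam + y * math.log(lam) - math.lgamma(y + 1)
        if y == 0:
            total += math.log(omega + (1.0 - omega) * math.exp(log_pois))
        else:
            total += math.log(1.0 - omega) + log_pois
    return total

def em_zip(ys, maxcycle=100, tolerance=1e-4):
    """EM estimates (lam, omega) and the log-likelihood history.

    Assumes ys contains at least one positive count.
    """
    n = len(ys)
    n_zero = sum(1 for y in ys if y == 0)
    omega = 0.5 * n_zero / n          # crude starting values
    lam = sum(ys) / n
    history = [zip_loglik(ys, lam, omega)]
    for _ in range(maxcycle):
        # E-step: probability that an observed zero is an excess zero.
        z0 = omega / (omega + (1.0 - omega) * math.exp(-lam))
        # M-step: update omega and lam given the expected memberships.
        omega = n_zero * z0 / n
        lam = sum(ys) / (n - n_zero * z0)
        history.append(zip_loglik(ys, lam, omega))
        # Stop when successive log-likelihoods agree to within the tolerance.
        if abs(history[-1] - history[-2]) < tolerance:
            break
    return lam, omega, history

ys = [0] * 15 + [2, 2, 3, 3, 4, 5, 5, 6, 8, 9, 9, 9, 10, 11, 12]
lam_hat, omega_hat, history = em_zip(ys)
```

Each cycle cannot decrease the log-likelihood, which is why the change in log-likelihood is a natural convergence criterion.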

In the Poisson case of the conditional model, y has a truncated Poisson distribution (λ). So the probability model is

$$
\Pr(Y=y) =
\begin{cases}
\omega, & y = 0 \\[4pt]
(1-\omega)\,\dfrac{e^{-\lambda}\,\lambda^{y}}{y!\,\left(1-e^{-\lambda}\right)}, & y > 0
\end{cases}
$$

where λ and ω are given by the following models

log(λ) = X β

log(ω/(1-ω)) = Z α
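A short Python sketch of the conditional Poisson model (illustrative only; the function names and parameter values are hypothetical). Unlike the mixture model, all zeros come from the zero state, so Pr(Y=0) is exactly ω whatever the value of λ, and the overall mean is (1 - ω) λ / (1 - exp(-λ)).

```python
import math

def conditional_poisson_pmf(y, lam, omega):
    """Pr(Y = y) when zeros have probability omega and positive counts are truncated Poisson."""
    if y == 0:
        return omega
    log_pois = -lam + y * math.log(lam) - math.lgamma(y + 1)
    # Divide by 1 - exp(-lam) to renormalize the Poisson over y = 1, 2, ...
    return (1.0 - omega) * math.exp(log_pois) / (1.0 - math.exp(-lam))

def conditional_poisson_mean(lam, omega):
    """Mean of the conditional Poisson model."""
    return (1.0 - omega) * lam / (1.0 - math.exp(-lam))

# Hypothetical parameter values for one unit.
lam, omega = 2.5, 0.6
total = sum(conditional_poisson_pmf(y, lam, omega) for y in range(100))
mean = sum(y * conditional_poisson_pmf(y, lam, omega) for y in range(100))
```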

In the truncated binomial case, y has a truncated binomial distribution. So the probability model is

$$
\Pr(Y=y) =
\begin{cases}
\omega, & y = 0 \\[4pt]
(1-\omega)\,\dfrac{n!}{y!\,(n-y)!}\,\dfrac{p^{y}\,(1-p)^{n-y}}{1-(1-p)^{n}}, & y > 0
\end{cases}
$$

where p and ω are given by the following models

log(p/(1-p)) = X β

log(ω/(1-ω)) = Z α
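The truncated binomial case can be sketched in the same way (Python, for illustration only; `conditional_binomial_pmf` and the parameter values are hypothetical). The binomial probabilities for y > 0 are rescaled by 1 - (1 - p)^n, the probability of a non-zero binomial count.

```python
import math

def conditional_binomial_pmf(y, n, p, omega):
    """Pr(Y = y) when zeros have probability omega and positive counts are truncated binomial."""
    if y == 0:
        return omega
    binom = math.comb(n, y) * p**y * (1.0 - p)**(n - y)
    return (1.0 - omega) * binom / (1.0 - (1.0 - p)**n)

# Hypothetical parameter values for one unit.
n, p, omega = 8, 0.25, 0.4
total = sum(conditional_binomial_pmf(y, n, p, omega) for y in range(n + 1))
```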

In the negative binomial case, y has a truncated negative binomial distribution (λ, k). So the probability model is

$$
\Pr(Y=y) =
\begin{cases}
\omega, & y = 0 \\[4pt]
(1-\omega)\,\dfrac{\Gamma(y+1/k)}{y!\,\Gamma(1/k)}\,(k\lambda)^{y}\,(1+k\lambda)^{-(y+1/k)}\,\left(1-(1+k\lambda)^{-1/k}\right)^{-1}, & y > 0
\end{cases}
$$

where λ and ω are given by the same models as for the Poisson distribution, and k is the extra-variation parameter in the negative binomial distribution.
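A Python sketch of the truncated negative binomial case follows (illustrative only; names and values are hypothetical). It assumes the standard negative binomial with mean λ and variance λ + kλ², whose zero probability is (1 + kλ)^(-1/k), and rescales the positive counts by one minus that probability.

```python
import math

def conditional_negbin_pmf(y, lam, k, omega):
    """Pr(Y = y) when zeros have probability omega and positive counts are truncated negative binomial."""
    if y == 0:
        return omega
    log_nb = (math.lgamma(y + 1.0 / k) - math.lgamma(y + 1) - math.lgamma(1.0 / k)
              + y * math.log(k * lam) - (y + 1.0 / k) * math.log1p(k * lam))
    p_zero = (1.0 + k * lam) ** (-1.0 / k)   # untruncated probability of a zero
    return (1.0 - omega) * math.exp(log_nb) / (1.0 - p_zero)

# Hypothetical parameter values for one unit.
lam, k, omega = 3.0, 0.5, 0.55
total = sum(conditional_negbin_pmf(y, lam, k, omega) for y in range(400))
```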

The truncated Poisson model is fitted using an iteratively re-weighted least squares algorithm (see Welsh et al. 1996). The truncated binomial and negative binomial models are fitted using `FITNONLINEAR`. The default parameterization for the conditional model estimates ω′ (= 1-ω), the probability of detecting at least one observation given that there is at least one observation, as in Welsh et al. (1996). You can use the `ZPARAMETERIZATION` option to change the parameterization to estimate ω, the probability of detecting a zero observation, instead.

Action with `RESTRICT`

If a parameter is restricted the statistics will be calculated using only those units included in the restriction.

References

Hall, D.B. (2000). Zero-inflated Poisson and binomial regression with random effects: a case study. Biometrics, 56, 1030-1039.

Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34, 1-14.

Ridout, M., Demetrio, C.G.B. & Hinde, J. (1998). Models for count data with many zeros. International Biometrics Conference, Cape Town.

Welsh, A.H., Cunningham, R.B., Donnelly, C.F. & Lindenmayer, D.B. (1996). Modelling the abundance of rare species: statistical models for counts with extra zeros. Ecological Modelling, 88, 297-308.

Procedures: `RNEGBINOMIAL`, `R0KEEP`.

Commands for: Regression analysis.

Example

```
CAPTION    'R0INFLATED example - EM algorithm',\
'Apple shoot data',\
!t('Ridout et al. (1998)',\
'Models for count data with many zeros,',\
'IBC Cape Town 1998.'); STYLE=meta,minor,plain
FACTOR     [LABELS=!T('0.5','1','2','4'); VALUES=30(1,2),\
40(3,4),30(1,2,3),40(4)] Hormone
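FACTOR     [LABELS=!T('8','16'); VALUES=140(1),130(2)] Period
"Declare the response variate and read the 270 shoot counts that follow"
VARIATE    [NVALUES=270] NShoots
READ       NShoots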
1 1 1 2 2 3 3 3 4 4 4 4 4 4 5 5 5 6 6 7 7 8 8 8 9 10 10 11 13 17
2 2 2 4 6 6 6 7 7 7 7 7 7 7 8 8 8 9 9 9 9 9 10 10 10 11 11 11 11 13
2 3 3 4 4 4 4 5 5 5 5 6 6 6 6 7 7 7 7 8
8 8 8 8 8 8 9 9 9 9 9 10 10 10 10 11 12 12 14 14
0 0 3 3 4 4 5 5 5 5 5 6 6 6 6 6 7 7 7 7
8 8 8 8 8 8 8 8 9 9 9 10 10 10 10 11 11 11 11 14
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 3 3 4 5 5 6 8 9 9 9 10 11 12
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 2 3 4 4 5 6 6 8 10 10 10 12
0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 3 4 4 5 5 6 6 6 7 9 9 11 12
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1 2 2 3 3 3 3 4 4 4 5 6 6 6 6 7 7 7 9 9 :
R0INFLATED [PRINT=mod,est,sum; CONSTANT=estimate; XTERMS=Hormone*Period]\
NShoots
R0INFLATED [PRINT=mod,est,sum; CONSTANT=estimate; XTERMS=Hormone*Period;\
ZCONSTANT=estimate; ZTERMS=Period] NShoots
R0INFLATED [PRINT=mod,est,sum; DISTRIBUTION=negativebinomial;\
XTERMS=Hormone*Period; ZTERMS=Period] NShoots
R0INFLATED [PRINT=mod,est,sum; DISTRIBUTION=negativebinomial;\
XTERMS=Period; ZTERMS=Period] NShoots

CAPTION    'R0INFLATED example - Conditional Model',\
!t('Welsh et al. (1996) Modelling the abundance of rare species:',\
'statistical models  for counts with extra zeros.',\
'Ecological Modelling.'); STYLE=meta,minor,plain
VARIATE    [NVALUES=151] no_lb,stags
"Read the response counts, then the explanatory variate"
READ       no_lb
7  0  0  3  2 10  7  3  0  0  0  0  0  2  0  1  0  4  3  2 10  7  0  3  7  0
0  0  0  0  5  9  0  0  0  0  1  0  5  4  0  0  4  0  4  0  2  0  0  1  1  0
3  0  0  0  0  0  2  0  0  1  0  2  5  3  0  0  0  0  0  0  0  0  5  0  0  0
0  0  0  1  5  4  0  0  0  0  3  0  3  3  1  0  0  0  0  0  2  0  0  1  0  3
0  0  4  0  0  3  4  0  8  5  3  0  0  0  5  5  0  2  0  0  0  0  0  2  0  2
0  0  0  0  0  4  0  0  0  0  5  0  0  0  0  0  1  0  0  0  0 :
READ       stags
12 15  6 14 16 16  9 20  7  4  6  5  4  6  4 10  6 11 11  4 16  8 10  9  7 10
15  5  7 10 11  8  8  3 14  5  8 14 11  2  1  1  7  2  7  7  1  6  8  6  6  5
6  0  0  2  0  1  3  2  2  6  3  4  3  4  5  2  3  4  4  2  2 10 16 10  4  3
2  2  2  2  3  1  6  8  2  4 12 13  3 14  2  4  0  2  3 14 29  2  4  6  3  8
4  7 20  4 11  5  1  2 27 24  9 18  3 20 25  4  4 30 24  8  4  6  5  3  5  2
3  5  7  4  5  4  4  1  4 23 25 31  0  8  4  4  1  3  1  1  4 :
CALCULATE  lstags = log(stags+1)
R0INFLATED [PRINT=mod,sum,est; METHOD=conditional; DIST=negative;\
ZTERMS=lstags; XTERMS=lstags] no_lb
R0INFLATED [PRINT=mod,sum,est; METHOD=conditional;\
ZTERMS=lstags; XTERMS=lstags] no_lb
```
Updated on June 18, 2019