# Zero-Inflated Poisson distribution

The zero-inflated Poisson distribution (also called the ZIP distribution) is used to model count data that has an excess of zero counts; it generalizes the regular Poisson distribution to account for the extra zeros. A plain Poisson model is often a poor choice for such datasets because it predicts far fewer zeros than the data contain [1].

This type of model can be applied in many different fields, such as business, economics, epidemiology, and ecology. For example, the ZIP distribution could model a company’s employee attendance. Predictors of the number of days of absence could include previous scores on yearly appraisals or a history of arriving late to work.

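The probability mass function (PMF) of the ZIP distribution is [3]

$$
P(Y = y) =
\begin{cases}
\pi + (1 - \pi)\, e^{-\lambda}, & y = 0, \\
(1 - \pi)\, \dfrac{\lambda^{y} e^{-\lambda}}{y!}, & y = 1, 2, \ldots
\end{cases}
$$

where 0 ≤ π ≤ 1 and λ ≥ 0.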

The π parameter is what inflates the zeros: it is the probability of an extra (structural) zero. When π = 0, the ZIP distribution reduces to the Poisson distribution with rate λ.
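The PMF can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard mixture form of the ZIP distribution, with π the probability of a structural zero and λ the Poisson rate; the function names `zip_pmf` and `poisson_pmf` are chosen here for illustration and do not come from any library.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with rate lam."""
    if k < 0:
        return 0.0
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """P(Y = k) for a zero-inflated Poisson with zero-inflation pi and rate lam."""
    if k == 0:
        # Structural zeros (weight pi) plus zeros from the Poisson part (weight 1 - pi).
        return pi + (1.0 - pi) * poisson_pmf(0, lam)
    # Positive counts can only come from the Poisson part.
    return (1.0 - pi) * poisson_pmf(k, lam)
```

For example, with π = 0.3 and λ = 2, `zip_pmf(0, 0.3, 2.0)` is roughly 0.395, compared with roughly 0.135 for `poisson_pmf(0, 2.0)`. Setting `pi` to 0 makes the two functions agree for every count.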

## Uses of the zero-inflated Poisson distribution

The zero-inflated Poisson distribution is useful for modeling data sets that contain an abundance of zero-valued data points. The model assumes that with probability π the only possible observation is 0, and with probability 1 − π the observation follows a Poisson distribution (replacing the Poisson component with a negative binomial distribution gives the zero-inflated negative binomial model). It can also be used to account for overdispersion in count data, by assuming that there are two different types of individuals in the data: those who always have a zero count, which occurs with probability π, and those whose counts follow the Poisson distribution and may still be zero by chance, which occurs with probability 1 − π.
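The two-process description above can be simulated directly. The sketch below uses only the Python standard library; the function names are illustrative, the mixing probability is called `pi`, and the Poisson sampler uses Knuth's multiplication method, which is an implementation choice made here and is only practical for small rates. Under this model the mean is (1 − π)λ and the variance is (1 − π)λ(1 + πλ), so the variance exceeds the mean whenever π > 0 and λ > 0, which is the overdispersion mentioned above.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_zip(pi: float, lam: float, rng: random.Random) -> int:
    """Structural zero with probability pi, otherwise a Poisson(lam) draw."""
    if rng.random() < pi:
        return 0
    return sample_poisson(lam, rng)


def simulate(pi: float, lam: float, n: int, seed: int = 0):
    """Return the sample mean, sample variance, and fraction of zeros of n ZIP draws."""
    rng = random.Random(seed)
    draws = [sample_zip(pi, lam, rng) for _ in range(n)]
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / (n - 1)
    zero_frac = draws.count(0) / n
    return mean, var, zero_frac


if __name__ == "__main__":
    mean, var, zero_frac = simulate(pi=0.3, lam=2.0, n=50_000, seed=1)
    print(f"mean={mean:.3f} variance={var:.3f} zeros={zero_frac:.3f}")
```

With π = 0.3 and λ = 2, the theoretical values are a mean of 1.4, a variance of 2.24, and a zero fraction of about 0.395; a large simulated sample should land close to these.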

## Zero-inflated vs. zero-modified distributions

The main difference between a zero-inflated model and a zero-modified model lies in how they handle the excess zeroes in count data. In a zero-inflated model, it is assumed that the data contains two sources of zeroes: structural zeroes and sampling zeroes. Structural zeroes represent the true absence of events, while sampling zeroes result from chance. The model assumes two different generating processes, one for the structural zeroes and another for the count data. Thus, in zero-inflated models, the excess zeroes are modeled separately from the non-zero counts using a mixture of distributions.

On the other hand, a zero-modified model assumes that the counts follow a certain distribution (such as a Poisson or negative binomial distribution) but with an additional modification term that accounts for excess zeroes. The modification term could be an additional mass at zero (i.e., there is a probability of generating a zero that is greater than what the distribution would normally generate), or an additional distribution that generates zero counts.

In summary, zero-inflated models separate the excess zeroes into their own component, while zero-modified models modify the count distribution to account for excess zeroes. The choice of model depends on the nature of the data and the research question being addressed.

## Zero-inflated Poisson regression

Regression models that use zero-inflated distributions have received significant attention. This interest can be largely attributed to Lambert’s influential 1992 paper [4], which showed that a ZIP regression is better than a Poisson regression at fitting data with many zeros, such as the number of manufacturing defects on wiring boards. However, zero-inflated models appear to have originated in econometrics [5].

The main difference between a “zero-inflated Poisson distribution” and “zero-inflated Poisson regression” lies in how they are used to model the data.

The zero-inflated Poisson distribution is a probability distribution that is used to model count data that has a significant proportion of zero counts. On the other hand, the zero-inflated Poisson regression is a modeling technique that extends the standard Poisson regression to account for zero inflation in the data. Instead of assuming that the count data follows a standard Poisson distribution, it assumes that the data is generated from a mixture of two processes, one that generates only zero counts and another that generates count data from a Poisson distribution. The zero-inflated Poisson regression allows for the estimation of the parameters of the model and testing hypotheses about the significance of the predictors.

In summary, while they share the same underlying zero-inflated Poisson distribution, the difference between them lies in their application. The zero-inflated Poisson distribution is used to describe the nature of the count data, while the zero-inflated Poisson regression is used to model and analyze the relationship between the count data and predictor variables.
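As a sketch of how predictors enter the regression, the formulation in Lambert’s paper [4] links each parameter to covariates through its own linear predictor. The notation below is an assumption made for illustration: x_i and z_i are the covariate vectors for observation i (they may be the same), and β and γ are the coefficient vectors to be estimated.

$$
\log \lambda_i = \mathbf{x}_i^{\top} \boldsymbol{\beta},
\qquad
\operatorname{logit}(\pi_i) = \log \frac{\pi_i}{1 - \pi_i} = \mathbf{z}_i^{\top} \boldsymbol{\gamma}
$$

The log link keeps each Poisson rate positive, and the logit link keeps each zero-inflation probability between 0 and 1. Hypothesis tests on individual entries of β and γ then indicate whether a predictor is related to the size of the counts, to the chance of a structural zero, or to both.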

#### References

[1] D. Böhning, “Zero-inflated Poisson models and C.A.MAN: a tutorial collection of evidence”, Biometrical Journal 40:7 (1998), 833–843. Zbl 0914.62091

[2] Biggerj1, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

[3] Beckett, S. et al. Zero-inflated Poisson (ZIP) distribution: parameter estimation and applications to model data from natural calamities. Involve, a Journal of Mathematics. 2014. Vol 7. No. 6.

[4] Lambert, D. (1992) Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34, 1-14.

[5] Ridout, M. Zero-inflated models. Retrieved May 4, 2023 from: https://www.kent.ac.uk/smsas/personal/msr/webfiles/zip/zip.html
