# Poisson Binomial Distribution

The Poisson binomial distribution is a probability distribution with applications in many fields of study. It generalizes the well-known binomial distribution and describes the number of successes in a sequence of independent but not necessarily identically distributed Bernoulli trials. In other words, the probability of success can vary from one trial to the next. In this blog post, we’re going to delve into the Poisson binomial distribution, its properties, and its applications.

## Properties of the Poisson Binomial Distribution

The Poisson binomial distribution (PBD), first studied by S. Poisson in 1837 [1], is a natural n-parameter generalization of the binomial distribution. It is perhaps the simplest n-parameter probability distribution with some nontrivial structure [2]. Properties of the PBD include:

• The distribution is generally asymmetric. Its mean is the sum of the success probabilities, $\mu = \sum_i p_i$, and its variance is $\sigma^2 = \sum_i p_i (1 - p_i)$, so the variance never exceeds the mean.
• The distribution is discrete: with n trials it is supported on the integers 0, 1, ..., n.
• It is more flexible than the Bernoulli and binomial distributions, since it can handle success/failure trials with different probabilities of success.
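As a small worked example (the three probabilities are chosen purely for illustration), take success probabilities 0.1, 0.5 and 0.9. Using the standard identities that the mean is the sum of the $p_i$ and the variance is the sum of the $p_i (1 - p_i)$:

```latex
\begin{align*}
\mu &= \sum_{i=1}^{3} p_i = 0.1 + 0.5 + 0.9 = 1.5, \\
\sigma^2 &= \sum_{i=1}^{3} p_i (1 - p_i) = 0.09 + 0.25 + 0.09 = 0.43.
\end{align*}
```

For comparison, a binomial distribution with three trials and the same mean (p = 0.5 for every trial) has variance 3 × 0.5 × 0.5 = 0.75, so spreading the probabilities apart reduces the variance.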

Probability Mass Function:

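The probability mass function (PMF) of the Poisson binomial distribution can be summed up as follows: given n independent trials, where trial i succeeds with probability $p_i$, it gives the probability of observing exactly k successes. The number of successes is

$$X = \sum_{i=1}^{n} X_i,$$

where the $X_i$ are independent Bernoulli random variables such that $\mathbb{E}[X_i] = p_i$. The PMF is then

$$\Pr(X = k) = \sum_{A \in F_k} \prod_{i \in A} p_i \prod_{j \in A^c} (1 - p_j), \qquad k = 0, 1, \dots, n,$$

where $F_k$ is the collection of all subsets of $\{1, \dots, n\}$ with exactly k elements and $A^c$ is the complement of $A$.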
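Evaluating the PMF by enumerating every possible set of k successful trials gets expensive quickly, because the number of such sets grows combinatorially with n. A common alternative is to add the trials one at a time and update the distribution of the running count. The sketch below does this in plain Python; the function name `poisson_binomial_pmf` and the example probabilities are illustrative choices, not part of any particular library.

```python
def poisson_binomial_pmf(probs):
    """Return [P(X=0), ..., P(X=n)] for X = sum of independent Bernoulli(p_i).

    Illustrative helper: adds one trial at a time, costing O(n^2) operations.
    """
    pmf = [1.0]  # with zero trials, zero successes is certain
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # the new trial fails: count stays at k
            nxt[k + 1] += mass * p      # the new trial succeeds: count becomes k + 1
        pmf = nxt
    return pmf


# Example probabilities chosen only for illustration.
probs = [0.1, 0.5, 0.9]
pmf = poisson_binomial_pmf(probs)
for k, mass in enumerate(pmf):
    print(f"P(X = {k}) = {mass:.3f}")
```

For these three probabilities the masses work out to 0.045, 0.455, 0.455 and 0.045, which sum to 1 and give a mean of 1.5, the sum of the individual probabilities.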

## Applications of the Poisson Binomial Distribution

The Poisson binomial distribution arises in many settings. For instance, tail bounds on PBDs form an important special case of Chernoff/Hoeffding bounds [4, 5, 6].
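To make the connection to tail bounds concrete: for a sum X of n independent variables taking values in [0, 1], Hoeffding's inequality gives P(X − E[X] ≥ t) ≤ exp(−2t²/n). The sketch below compares that bound with the exact upper tail of a PBD. The helper names and the example probabilities are assumptions made for illustration only.

```python
import math


def poisson_binomial_pmf(probs):
    """Exact PMF of a sum of independent Bernoulli(p_i), built one trial at a time."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf


def exact_upper_tail(probs, t):
    """P(X - E[X] >= t), computed from the exact PMF."""
    mu = sum(probs)
    pmf = poisson_binomial_pmf(probs)
    return sum(mass for k, mass in enumerate(pmf) if k - mu >= t)


def hoeffding_upper_bound(n, t):
    """Hoeffding's bound exp(-2 t^2 / n) for n independent [0, 1]-valued terms."""
    return math.exp(-2.0 * t * t / n)


# Illustrative probabilities 0.1, 0.2, ..., 0.9 (mean 4.5).
probs = [0.1 * i for i in range(1, 10)]
for t in (1.0, 2.0, 3.0):
    exact = exact_upper_tail(probs, t)
    bound = hoeffding_upper_bound(len(probs), t)
    print(f"t = {t}: exact tail = {exact:.4f}, Hoeffding bound = {bound:.4f}")
```

The bound depends only on n and t, not on the individual probabilities, so it is usually loose for any particular PBD. That is the price of its generality.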

The PBD has numerous applications in fields such as case-control studies, epidemiology, finance, genetics, survival analysis and survey sampling [2, 7]. For example, geneticists use this distribution to model the evolution of traits in a population, while epidemiologists use it to model the spread of diseases. Additionally, in finance, the Poisson binomial distribution is used to model the risk of multiple asset investments. Furthermore, it’s used in machine learning for the training and validation of binary classification problems.

## Solving the Poisson Binomial Distribution

Given the simplicity and ubiquity of Poisson binomial distributions, it may be surprising to learn that the problem of density estimation for PBDs is not well understood. Working with the distribution is therefore not a trivial task, but several approximations and computational methods are available. For example:

• When the number of trials is small, the PMF can be computed exactly by direct enumeration. With a single trial the distribution reduces to the Bernoulli distribution, and when all success probabilities are equal it reduces to the binomial distribution.
• When every success probability is close to 0, the Poisson binomial distribution is well approximated by a Poisson distribution whose rate is the sum of the success probabilities. When every probability is close to 1, the same approximation applies to the number of failures.
• Monte Carlo methods, which are simulation-based, can be used to estimate the distribution, while Fourier methods compute the Poisson binomial PMF and cumulative distribution function exactly.
• Algorithms have been developed based on the premise that every Poisson binomial distribution is either close to a PBD with sparse support, or close to a translated “heavy” binomial distribution [2].
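Three of the approaches above can be sketched in a few lines of standard-library Python: an exact computation through the discrete Fourier transform of the characteristic function, a Monte Carlo estimate, and the Poisson approximation. This is a minimal illustration under stated assumptions rather than a reference implementation. The function names, the seed, the number of draws and the example probabilities are all hypothetical, and the Fourier routine is a naive O(n²) transform rather than an FFT.

```python
import cmath
import math
import random


def pmf_dft(probs):
    """Exact PMF via a naive discrete Fourier transform of the characteristic function."""
    size = len(probs) + 1
    omega = 2.0 * math.pi / size
    # Characteristic function of X evaluated at the points l * omega.
    phi = []
    for l in range(size):
        z = cmath.exp(1j * omega * l)
        value = complex(1.0, 0.0)
        for p in probs:
            value *= (1.0 - p) + p * z
        phi.append(value)
    # Invert the transform to recover P(X = k) for k = 0..n.
    pmf = []
    for k in range(size):
        total = sum(phi[l] * cmath.exp(-1j * omega * l * k) for l in range(size))
        pmf.append(total.real / size)
    return pmf


def pmf_monte_carlo(probs, draws=20_000, seed=0):
    """Estimate the PMF by simulating the trials `draws` times."""
    rng = random.Random(seed)
    counts = [0] * (len(probs) + 1)
    for _ in range(draws):
        successes = sum(rng.random() < p for p in probs)
        counts[successes] += 1
    return [c / draws for c in counts]


def pmf_poisson(probs):
    """Poisson approximation with rate sum(p_i), truncated to k = 0..n."""
    lam = sum(probs)
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(len(probs) + 1)]


# Exact and simulated values for three illustrative probabilities.
print(pmf_dft([0.1, 0.5, 0.9]))
print(pmf_monte_carlo([0.1, 0.5, 0.9]))

# The Poisson approximation is only appropriate when every p_i is small.
small = [0.01] * 50
gap = sum(abs(a - b) for a, b in zip(pmf_dft(small), pmf_poisson(small)))
print(f"sum of absolute differences, exact vs Poisson: {gap:.5f}")
```

The Monte Carlo estimate carries sampling error that shrinks roughly with the square root of the number of draws, whereas the Fourier computation is exact up to floating-point rounding.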

In conclusion, the Poisson binomial distribution is a powerful and flexible distribution that can be used to model numerous phenomena in different fields. Its flexibility in handling independent non-identically distributed trials allows for a more accurate representation of real-world data. Furthermore, the applications of this distribution are expansive, from genetics to finance and beyond. Understanding this distribution can be challenging; however, it can be simplified via approximations, simulation-based approaches, and explicit calculations.

## References

[1] S.D. Poisson. Recherches sur la probabilité des jugements en matière criminelle et en matière civile. Bachelier, Paris, 1837.

[2] Daskalakis, C. et al. Learning Poisson Binomial Distributions. Retrieved May 2, 2023 from: http://www.cs.columbia.edu/~rocco/Public/stoc12pbd.pdf

[3] Rubinfeld, R. (2019) 6.889 Sublinear Time Algorithms, Lecture 17. Retrieved May 2, 2023 from: https://people.csail.mit.edu/ronitt/COURSE/S19/Handouts/scribe17.pdf

[4] H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Statist., 23:493–507, 1952.

[5] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.

[6] D. Dubhashi and A. Panconesi. Concentration of measure for the analysis of randomized algorithms. Cambridge University Press, Cambridge, 2009.

[7] S.X. Chen and J.S. Liu. Statistical applications of the Poisson-Binomial and Conditional Bernoulli Distributions. Statistica Sinica, 7:875–892, 1997.
