**Pólya distribution**

Named after mathematician George Pólya, the **Pólya distribution** is a probability model that describes the number of red balls drawn from Pólya’s urn over a series of trials. Its counterpart, the *negative Pólya-Eggenberger distribution*, characterizes the number of black balls drawn.

The Pólya distribution has far-reaching applications in a variety of fields, from genetics to insurance to studying the spread of epidemics. Its multivariate version, the *Dirichlet-multinomial distribution*, describes urns with more than two colors. In the two-color case it reduces to the beta-binomial distribution, which is the Pólya distribution under a different parameterization.

## Pólya Distribution Process and PMF

The Pólya distribution, which is closely related to the negative binomial distribution (the negative binomial arises as one of its limiting forms), models a simple process: draw a random ball from an urn containing *r* red balls and *N* − *r* black balls. Record the color of the ball, then return the ball to the urn with *c* additional balls of the same color. Repeat the process for *n* draws. If *X* is the number of red balls drawn in the *n* trials, then the random variable *X* follows a Pólya distribution.
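The urn process can be simulated directly. The following Python sketch is illustrative only: the function name `simulate_polya`, the parameter values (*r* = 2, *N* = 5, *c* = 1, *n* = 4) and the seed are hypothetical choices, and it assumes *c* ≥ 0 so the urn never runs out of balls.

```python
import random
from collections import Counter


def simulate_polya(r: int, N: int, c: int, n: int, rng: random.Random) -> int:
    """Run one Pólya urn experiment and return X, the number of red draws.

    The urn starts with r red and N - r black balls. After each draw the ball
    goes back together with c extra balls of the same color (c >= 0 assumed).
    """
    red, black = r, N - r
    x = 0
    for _ in range(n):
        # Red is drawn with probability equal to its current share of the urn.
        if rng.random() < red / (red + black):
            x += 1
            red += c
        else:
            black += c
    return x


rng = random.Random(42)  # fixed seed so runs are reproducible
runs = 100_000
# Hypothetical parameters: r = 2, N = 5, c = 1, n = 4.
counts = Counter(simulate_polya(2, 5, 1, 4, rng) for _ in range(runs))
for k in sorted(counts):
    print(k, counts[k] / runs)
```

With these values the empirical mean should land close to *nr*/*N* = 1.6, the mean of the Pólya distribution.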

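The probability mass function (PMF) is

$$P(X = k) = \binom{n}{k} \frac{\prod_{i=0}^{k-1} (r + ic) \prod_{j=0}^{n-k-1} (N - r + jc)}{\prod_{m=0}^{n-1} (N + mc)}, \qquad k = 0, 1, \ldots, n.$$

Every ordering of *k* red and *n* − *k* black draws has the same probability, which is why a single product is multiplied by the binomial coefficient.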
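The PMF can be evaluated exactly with rational arithmetic. This is a minimal sketch, assuming integer *r*, *N* and *c* and the product form P(X = k) = C(n, k) ∏(r + ic) ∏(N − r + jc) / ∏(N + mc); `polya_pmf` is a hypothetical helper name and the example parameters are illustrative.

```python
from fractions import Fraction
from math import comb


def polya_pmf(k: int, n: int, r: int, N: int, c: int) -> Fraction:
    """Exact P(X = k) for an urn with r red balls out of N, adding c balls per draw."""
    if not 0 <= k <= n:
        return Fraction(0)
    num = 1
    for i in range(k):          # red balls available before each of the k red draws
        num *= r + i * c
    for j in range(n - k):      # black balls available before each of the n - k black draws
        num *= N - r + j * c
    den = 1
    for m in range(n):          # total balls in the urn before each of the n draws
        den *= N + m * c
    # Every ordering of k red and n - k black draws has this same probability.
    return comb(n, k) * Fraction(num, den)


# Hypothetical parameters: n = 4, r = 2, N = 5, c = 1.
pmf = [polya_pmf(k, 4, 2, 5, 1) for k in range(5)]
print(pmf)
print(sum(pmf))
```

For these values the probabilities are 3/14, 2/7, 9/35, 6/35 and 1/14, which sum to 1.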

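When the urn is large, the Pólya distribution can be approximated by the binomial distribution with parameters *n* and *p*. Specifically, the approximation becomes exact in the limit as *N* tends to infinity while *p* = 1 − *q* = *r*/*N* and *c* stay constant, because adding *c* balls then barely changes the urn's composition [2].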
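The binomial approximation can be checked numerically by holding *p* = *r*/*N* and *c* fixed while *N* grows. This sketch uses floating-point output and hypothetical parameter values (*n* = 10, *c* = 1, *p* = 0.3); for each urn size it prints the largest absolute difference between the two PMFs.

```python
from math import comb


def polya_pmf(k: int, n: int, r: int, N: int, c: int) -> float:
    """P(X = k) for the Pólya urn, built from exact integers then converted to float."""
    num = 1
    for i in range(k):
        num *= r + i * c
    for j in range(n - k):
        num *= N - r + j * c
    den = 1
    for m in range(n):
        den *= N + m * c
    return comb(n, k) * num / den


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial B(n, p) probability of k successes."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, c, p = 10, 1, 0.3  # hypothetical values
gaps = []
for N in (10, 100, 10_000, 1_000_000):
    r = round(p * N)  # keep r / N equal to p
    gap = max(abs(polya_pmf(k, n, r, N, c) - binom_pmf(k, n, r / N)) for k in range(n + 1))
    gaps.append(gap)
    print(f"N = {N:>9,}: max |Polya - binomial| = {gap:.2e}")
```

The gap shrinks steadily as *N* increases, consistent with the limit described in the text.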

## Rutherford distribution inspired by Pólya distribution

**Rutherford’s contagious distribution** (or simply the Rutherford distribution) was inspired by the Pólya distribution, or the Pólya urn model, from which it arises naturally [3]. The distribution, built on prior work by Woodbury [4], describes trials in which the probability of a success depends linearly on the number of previous successes.

Woodbury considered a general Bernoulli scheme where the probability of a success depends on the number of previous successes, formulating the equation

$$P(n + 1, x + 1) = p_x P(n, x) + (1 - p_{x+1}) P(n, x + 1).$$

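where $p_x$ is the probability of success after $x$ previous successes and $P(n, x)$ is the probability of $x$ successes in $n$ trials. The first term covers a success on trial $n + 1$ following $x$ earlier successes; the second covers a failure following $x + 1$ earlier successes.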
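Woodbury's recurrence lends itself to dynamic programming over the number of trials. This is a minimal sketch, assuming $p_x$ is supplied as an arbitrary Python function; the name `woodbury_pmf` and the example success probabilities are hypothetical.

```python
from math import comb
from typing import Callable


def woodbury_pmf(n: int, p: Callable[[int], float]) -> list[float]:
    """Return [P(n, 0), ..., P(n, n)], where p(x) is the success probability after x successes."""
    P = [1.0] + [0.0] * n  # after zero trials, zero successes is certain
    for t in range(n):
        new = [0.0] * (n + 1)
        for x in range(t + 2):
            # Failure on trial t + 1 after x successes keeps the count at x.
            new[x] = (1 - p(x)) * P[x]
            # Success on trial t + 1 after x - 1 successes raises the count to x.
            if x > 0:
                new[x] += p(x - 1) * P[x - 1]
        P = new
    return P


# Hypothetical examples: a constant p_x and a p_x that rises with each success.
const = woodbury_pmf(5, lambda x: 0.3)
linear = woodbury_pmf(5, lambda x: 0.2 + 0.05 * x)
print(sum(const), sum(linear))
# With constant p_x the recurrence should reproduce the binomial B(5, 0.3).
print(max(abs(a - comb(5, k) * 0.3**k * 0.7 ** (5 - k)) for k, a in enumerate(const)))
```

A constant $p_x$ reduces the scheme to ordinary Bernoulli trials, which makes the binomial comparison a convenient sanity check.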

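If no two of the $p_x$ are equal, the recurrence can be solved explicitly (with $q_j = 1 - p_j$):

$$P(n, x) = p_0 p_1 \cdots p_{x-1} \sum_{j=0}^{x} \frac{q_j^{\,n}}{\prod_{i=0,\, i \neq j}^{x} (p_i - p_j)}.$$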
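The closed form $P(n, x) = p_0 \cdots p_{x-1} \sum_{j} q_j^{\,n} / \prod_{i \neq j} (p_i - p_j)$ can be cross-checked against brute-force enumeration of every success/failure sequence. The sketch below uses exact fractions and a hypothetical list of pairwise-distinct probabilities $p_0, \ldots, p_6$; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product


def closed_form(n: int, x: int, p: list[Fraction]) -> Fraction:
    """P(n, x) from the distinct-p_x formula; p[0..x] must all be different."""
    lead = Fraction(1)
    for i in range(x):
        lead *= p[i]                      # p_0 p_1 ... p_{x-1}
    total = Fraction(0)
    for j in range(x + 1):
        denom = Fraction(1)
        for i in range(x + 1):
            if i != j:
                denom *= p[i] - p[j]      # product over i != j of (p_i - p_j)
        total += (1 - p[j]) ** n / denom  # q_j^n divided by that product
    return lead * total


def brute_force(n: int, x: int, p: list[Fraction]) -> Fraction:
    """Sum the probabilities of all length-n success/failure sequences with x successes."""
    total = Fraction(0)
    for seq in product((0, 1), repeat=n):
        if sum(seq) != x:
            continue
        prob, successes = Fraction(1), 0
        for s in seq:
            prob *= p[successes] if s else 1 - p[successes]
            successes += s
        total += prob
    return total


# Hypothetical, pairwise-distinct success probabilities p_0, ..., p_6.
p = [Fraction(a, b) for a, b in [(1, 10), (1, 4), (1, 3), (1, 2), (3, 5), (2, 3), (3, 4)]]
n = 6
for x in range(n + 1):
    print(x, closed_form(n, x, p), closed_form(n, x, p) == brute_force(n, x, p))
```

Exact fractions avoid the cancellation that the alternating denominators would cause in floating point.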

## Rutherford’s Contagious Distribution Formula

Rutherford’s contagious distribution is a special case of this formula. The idea is that when a white ball is drawn from the urn, it is replaced together with α other balls. This case of the Pólya distribution leads to a clustering of secondary cases around the first ball drawn. Rutherford used a linear function, in which $p_x$ is determined by just two parameters, $p$ and $c$:

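$$p_x = p + cx,$$

where $c$ may be positive or negative. This implies $n < q/c$ if $c > 0$ and $n < -p/c$ if $c < 0$, where $q = 1 - p$; either condition keeps $0 < p_x < 1$ for every $x \le n$.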

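Rutherford’s special case formula avoids product notation:

$$P(n, x) = \binom{p/c + x - 1}{x} \sum_{j=0}^{x} (-1)^{j} \binom{x}{j} (q - jc)^{n},$$

where the first binomial coefficient is the generalized one, $\binom{a}{x} = a(a-1)\cdots(a-x+1)/x!$.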
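Under the linear model $p_x = p + cx$, the product-free formula $P(n, x) = \binom{p/c + x - 1}{x} \sum_{j} (-1)^j \binom{x}{j} (q - jc)^n$ can be compared with Woodbury's recurrence. This is a minimal sketch with exact fractions; the function names and the two parameter pairs (one with $c > 0$, one with $c < 0$, both satisfying the constraint on $n$ for $n = 6$) are hypothetical.

```python
from fractions import Fraction
from math import comb


def gen_binom(a: Fraction, k: int) -> Fraction:
    """Generalized binomial coefficient a(a - 1)...(a - k + 1) / k! for rational a."""
    out = Fraction(1)
    for i in range(k):
        out *= (a - i) / (i + 1)
    return out


def rutherford_pmf(n: int, x: int, p: Fraction, c: Fraction) -> Fraction:
    """P(n, x) for p_x = p + c x from the product-free formula (c != 0)."""
    q = 1 - p
    s = sum((-1) ** j * comb(x, j) * (q - j * c) ** n for j in range(x + 1))
    return gen_binom(p / c + x - 1, x) * s


def recursion(n: int, p: Fraction, c: Fraction) -> list[Fraction]:
    """Woodbury's recurrence with p_x = p + c x, used here as a cross-check."""
    P = [Fraction(1)] + [Fraction(0)] * n
    for t in range(n):
        new = [Fraction(0)] * (n + 1)
        for x in range(t + 2):
            new[x] = (1 - (p + c * x)) * P[x]          # failure after x successes
            if x > 0:
                new[x] += (p + c * (x - 1)) * P[x - 1]  # success after x - 1 successes
        P = new
    return P


# Hypothetical parameters: one case with c > 0 and one with c < 0.
cases = [(Fraction(1, 5), Fraction(1, 20)), (Fraction(1, 2), Fraction(-1, 20))]
for p, c in cases:
    closed = [rutherford_pmf(6, x, p, c) for x in range(7)]
    print(p, c, closed == recursion(6, p, c), sum(closed))
```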

Note that the distribution was proposed by R. S. G. Rutherford; it has no connection to the distribution Ernest Rutherford derived to describe the scattering of alpha particles in physics.

## Arfwedson distribution

The **Arfwedson distribution** is a discrete probability distribution for an urn sampling problem with drawings *with* replacement.

“An urn contains *N* numbered balls. We make *n* drawings replacing the ball into the urn each time. What is the probability of getting *ν* different balls?” (Arfwedson [5])
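Arfwedson's question can be explored by simulation. This is a minimal sketch, assuming balls labelled 1 to *N*; the helper `distinct_balls`, the values *N* = *n* = 6 and the seed are hypothetical.

```python
import random
from collections import Counter


def distinct_balls(N: int, n: int, rng: random.Random) -> int:
    """Make n draws with replacement from balls 1..N and count the distinct labels seen."""
    return len({rng.randint(1, N) for _ in range(n)})


rng = random.Random(7)  # fixed seed for reproducibility
trials = 100_000
# Hypothetical urn: N = 6 balls, n = 6 draws.
counts = Counter(distinct_balls(6, 6, rng) for _ in range(trials))
for v in sorted(counts):
    print(v, counts[v] / trials)
```

Because each draw is returned to the urn, the same ball can appear repeatedly, so the number of distinct balls ν ranges from 1 to min(*n*, *N*).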

The distribution has been called other names, such as:

- The coupon-collecting distribution, because it describes the probability that a person with *n* randomly selected coupons will have at least one of each of the *k* equally likely varieties [6].
- The classical occupancy distribution [7].
- Stirling2 distribution, because of the presence of the Stirling numbers of the second kind [8].
- Dixie cup [9].
- Stevens-Craig [10, 11].

## Arfwedson Distribution Formula

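There are several equivalent formulas for the Arfwedson distribution. They differ in whether the random variable counts occupied bins (distinct balls drawn) or unoccupied bins (balls never drawn). Because the two counts always add up to *N*, switching to unoccupied bins simply reverses the probability mass function (PMF).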

Haight [12] lists the distribution as

$$P(\nu) = \binom{N}{\nu} \frac{g(n, \nu)}{N^{n}}, \qquad \nu = 1, 2, \ldots, \min(n, N).$$
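The PMF can be computed exactly, assuming the form $P(\nu) = \binom{N}{\nu} g(n, \nu) / N^n$ with $g(n, \nu) = \nu!\,S(n, \nu)$, the number of ways to map the *n* labelled draws onto exactly ν given balls. This sketch computes $g$ by inclusion-exclusion; the helper names and example values are hypothetical, and the notation may differ from Haight's listing.

```python
from fractions import Fraction
from math import comb


def g(n: int, v: int) -> int:
    """Ways to map n labelled draws onto exactly v given balls (= v! S(n, v))."""
    # Inclusion-exclusion over the j balls that are forced to be missed.
    return sum((-1) ** j * comb(v, j) * (v - j) ** n for j in range(v + 1))


def arfwedson_pmf(v: int, n: int, N: int) -> Fraction:
    """P(exactly v distinct balls in n draws with replacement from N numbered balls)."""
    return Fraction(comb(N, v) * g(n, v), N**n)


# Hypothetical urn: N = 6 balls, n = 6 draws.
pmf = {v: arfwedson_pmf(v, 6, 6) for v in range(1, 7)}
for v, prob in pmf.items():
    print(v, prob, float(prob))
print(sum(pmf.values()))
```

The binomial coefficient picks which ν balls appear, and $g(n, \nu)$ counts the draw sequences that use all of them; together they account for all $N^n$ equally likely sequences.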

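Arfwedson gives the expected value as

$$E[\nu] = N\left[1 - \left(1 - \frac{1}{N}\right)^{n}\right].$$

This follows because each of the *N* balls is missed by all *n* draws with probability $(1 - 1/N)^n$.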
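The expected value can be verified against the PMF with exact arithmetic. This sketch assumes $P(\nu) = \binom{N}{\nu} g(n, \nu) / N^n$ with $g$ computed by inclusion-exclusion; the helper names and the test cases are hypothetical.

```python
from fractions import Fraction
from math import comb


def g(n: int, v: int) -> int:
    """v! S(n, v) by inclusion-exclusion."""
    return sum((-1) ** j * comb(v, j) * (v - j) ** n for j in range(v + 1))


def mean_from_pmf(n: int, N: int) -> Fraction:
    """E[nu] computed as the sum of v * P(v) over the PMF."""
    return sum(Fraction(v * comb(N, v) * g(n, v), N**n) for v in range(1, min(n, N) + 1))


def mean_closed_form(n: int, N: int) -> Fraction:
    """E[nu] = N [1 - (1 - 1/N)^n]; each ball is missed by all n draws with probability (1 - 1/N)^n."""
    return N * (1 - (1 - Fraction(1, N)) ** n)


# Hypothetical (N, n) pairs.
for N, n in [(6, 6), (10, 3), (5, 12)]:
    print(N, n, float(mean_closed_form(n, N)), mean_from_pmf(n, N) == mean_closed_form(n, N))
```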

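Here $g(n, \nu)$ denotes Stirling’s second class numbers: $g(n, \nu)$ is the coefficient of $x^n/n!$ in

$$(e^{x} - 1)^{\nu},$$

so this expression is an exponential generating function (not a probability generating function), and $g(n, \nu) = \nu!\,S(n, \nu)$, where $S(n, \nu)$ is the Stirling number of the second kind.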
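The link between $g(n, \nu)$ and $(e^x - 1)^\nu$ can be checked numerically by expanding the power series. This sketch multiplies truncated series with exact fractions and compares $n!$ times the $x^n$ coefficient with $\nu!\,S(n, \nu)$, where $S$ comes from the standard recurrence $S(n, k) = k\,S(n-1, k) + S(n-1, k-1)$; the function names and test pairs are hypothetical.

```python
from fractions import Fraction
from math import factorial


def egf_coefficient(v: int, n: int) -> Fraction:
    """n! times the x^n coefficient of (e^x - 1)^v, via truncated power series."""
    base = [Fraction(0)] + [Fraction(1, factorial(k)) for k in range(1, n + 1)]  # e^x - 1
    result = [Fraction(1)] + [Fraction(0)] * n                                 # the series 1
    for _ in range(v):
        # Cauchy product, keeping only powers x^0 .. x^n.
        result = [sum(result[i] * base[m - i] for i in range(m + 1)) for m in range(n + 1)]
    return result[n] * factorial(n)


def stirling2(n: int, k: int) -> int:
    """Stirling numbers of the second kind via S(n, k) = k S(n-1, k) + S(n-1, k-1)."""
    S = [[0] * (k + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            S[i][j] = j * S[i - 1][j] + S[i - 1][j - 1]
    return S[n][k]


# Hypothetical (n, v) pairs.
for n, v in [(4, 2), (6, 3), (7, 7)]:
    print(n, v, egf_coefficient(v, n), factorial(v) * stirling2(n, v))
```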

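Arfwedson also gives a more complicated alternative, the PGF

$$G(t) = \sum_{\nu} P(\nu)\, t^{\nu} = \frac{1}{N^{n}} \sum_{\nu} \binom{N}{\nu} g(n, \nu)\, t^{\nu}.$$

The function $N^{n} G(t)$ equals the coefficient of $y^{n}/n!$ in

$$\left[1 + t\,(e^{y} - 1)\right]^{N}.$$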
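The PGF statement can be checked by expanding $[1 + t(e^y - 1)]^N$ as a truncated series in *y* whose coefficients are polynomials in *t*. This sketch assumes that $N^n G(t)$ is $n!$ times the $y^n$ coefficient of that expression; `pgf_coefficients` is a hypothetical helper, each returned entry is taken to be $P(\nu = v)$, and the values *N* = *n* = 6 are illustrative.

```python
from fractions import Fraction
from math import factorial


def pgf_coefficients(n: int, N: int) -> list[Fraction]:
    """Return [P(nu = 0), ..., P(nu = N)] read off from (1 + t(e^y - 1))^N.

    A series is stored as s[m][v] = coefficient of y^m t^v, truncated after y^n
    because higher powers of y cannot affect the y^n coefficient.
    """
    def zero() -> list[list[Fraction]]:
        return [[Fraction(0)] * (N + 1) for _ in range(n + 1)]

    # The factor 1 + t(e^y - 1) = 1 + t * sum_{m >= 1} y^m / m!.
    factor = zero()
    factor[0][0] = Fraction(1)
    for m in range(1, n + 1):
        factor[m][1] = Fraction(1, factorial(m))

    result = zero()
    result[0][0] = Fraction(1)  # start from the constant series 1
    for _ in range(N):          # multiply by the factor N times
        new = zero()
        for m1 in range(n + 1):
            for v1 in range(N + 1):
                a = result[m1][v1]
                if a == 0:
                    continue
                for m2 in range(n + 1 - m1):
                    for v2 in range(N + 1 - v1):
                        b = factor[m2][v2]
                        if b:
                            new[m1 + m2][v1 + v2] += a * b
        result = new
    # n! times the y^n coefficient gives N^n G(t); divide by N^n to get G(t).
    return [result[n][v] * factorial(n) / N**n for v in range(N + 1)]


coeffs = pgf_coefficients(6, 6)  # hypothetical urn: N = 6, n = 6
for v, prob in enumerate(coeffs):
    print(v, prob)
print("G(1) =", sum(coeffs))
```

Summing the coefficients gives G(1) = 1, and the weighted sum of *v* times each coefficient gives G′(1), which should match the expected value N[1 − (1 − 1/N)^n].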

## References

[1] Polya urn image: Quartl, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons

[2] Teerapabolarn, K. (2014). An improved binomial distribution to approximate the Polya distribution. International Journal of Pure and Applied Mathematics, 93(5), 629–632. ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version).

[3] Rutherford, R. S. G. (1954). On a Contagious Distribution. The Annals of Mathematical Statistics, 25(4), 703–713. http://www.jstor.org/stable/2236654

[4] Woodbury, M. (1949). On a probability distribution. The Annals of Mathematical Statistics, 20, pp. 311-313.

[5] Arfwedson, G. (1951). A probability distribution connected with Stirling’s second class numbers. Skand. Aktuarietidskr., 34, 121–132.

[6] David, F. N., and Barton, D. E. (1962). Combinatorial Chance. London: Griffin.

[7] O’Neill, B. (2019). The Classical Occupancy Distribution: Computation and Approximation. The American Statistician. DOI: 10.1080/00031305.2019.1699445

[8] Williamson, P. P., Mays, D. P., Abay Asmerom, G., and Yang, Y. (2009). Revisiting the Classical Occupancy Problem. The American Statistician, 63, 356–360.

[9] Johnson, N. L., and Kotz, S. (1977). Urn Models and Their Application. New York: Wiley.

[10] Stevens, W. L. (1937). Significance of grouping. Annals of Eugenics, London, 8, 57–60.

[11] Craig, C. C. (1953). On the utilization of marked specimens in estimating populations of flying insects. Biometrika, 40, 170–176.

[12] Haight, F. (1958). Index to the Distributions of Mathematical Statistics. National Bureau of Standards Report.