Pólya distribution / Pólya’s urn model

Pólya distribution

Named after the mathematician George Pólya, the Pólya distribution is a probability model that describes the number of red balls drawn from Pólya’s urn over a series of trials. Its counterpart, the negative (inverse) Pólya-Eggenberger distribution, describes the number of black balls drawn before a fixed number of red balls has been obtained.

The Pólya distribution has applications in a variety of fields, including genetics, insurance, and the study of epidemics. The multivariate version, also known as the Dirichlet-multinomial distribution, extends the model to more than two colors and generalizes the beta-binomial distribution.

A Dirichlet process is an extension of Pólya’s urn, a thought experiment in which you draw balls at random and note their color [1].

Pólya Distribution Process and PMF

The Pólya distribution, which has the negative binomial distribution as a limiting case, models a simple process: draw a ball at random from an urn containing r red balls and N − r black balls (N balls in total). Record the color of the ball, then return it to the urn together with c additional balls of the same color. Repeat the process for n draws. If X is the number of red balls drawn in the first n trials, then the random variable X follows a Pólya distribution.
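
The urn process can be sketched as a short simulation. This is a minimal illustration, not a reference implementation: the function name polya_urn_draws and the parameter values (N = 10, r = 4, c = 2, n = 5) are assumptions chosen only for the example.

```python
import random

def polya_urn_draws(N, r, c, n, rng):
    """Simulate n draws from a Pólya urn and return the number of red balls drawn.

    N: initial number of balls, r: initial red balls,
    c: balls of the drawn color added after each draw.
    """
    red, total = r, N
    reds_drawn = 0
    for _ in range(n):
        if rng.random() < red / total:  # a red ball is drawn
            reds_drawn += 1
            red += c                    # return it with c extra red balls
        total += c                      # the urn grows by c balls either way
    return reds_drawn

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [polya_urn_draws(N=10, r=4, c=2, n=5, rng=rng) for _ in range(20000)]
mean_reds = sum(samples) / len(samples)
print(mean_reds)
```

With these illustrative parameters the sample mean should land near n·r/N = 2, because each draw is red with marginal probability r/N.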

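The probability mass function (PMF) is

\[
P(X = x) = \binom{n}{x} \frac{\prod_{i=0}^{x-1}(r + ic)\,\prod_{j=0}^{n-x-1}(N - r + jc)}{\prod_{k=0}^{n-1}(N + kc)}, \qquad x = 0, 1, \ldots, n,
\]

where N, n, r, and c are natural numbers and an empty product equals 1.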
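
As a rough sketch, the PMF can be evaluated exactly with rational arithmetic. The code assumes the standard Pólya-Eggenberger form, in which the numerator multiplies the red counts r, r + c, … and the black counts N − r, N − r + c, …, and the denominator multiplies the urn sizes N, N + c, …. The name polya_pmf and the example parameters are illustrative.

```python
from fractions import Fraction
from math import comb

def polya_pmf(x, N, r, c, n):
    """P(X = x): probability of x red balls in n draws (exact Fraction)."""
    if not 0 <= x <= n:
        return Fraction(0)
    num = 1
    for i in range(x):          # red counts at the x red draws
        num *= r + i * c
    for j in range(n - x):      # black counts at the n - x black draws
        num *= (N - r) + j * c
    den = 1
    for k in range(n):          # urn size before each of the n draws
        den *= N + k * c
    return comb(n, x) * Fraction(num, den)

probs = [polya_pmf(x, N=10, r=4, c=2, n=5) for x in range(6)]
total = sum(probs)                               # should be exactly 1
mean = sum(x * p for x, p in enumerate(probs))   # should equal n * r / N
print(total, mean)
```

Setting c = 0 in this sketch reduces it to ordinary sampling with replacement, that is, the binomial PMF with p = r/N.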

When the urn is large enough, the Pólya distribution can be approximated by the binomial distribution with parameters n and p. In general, this holds as N tends to infinity while p = 1 – q = r/N remains constant [2].
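
A quick numerical check makes the approximation concrete. The sketch below holds p = r/N fixed at 0.4 and lets N grow, then reports the largest pointwise gap between the Pólya and binomial PMFs. The helper names and the values p = 0.4, c = 2, n = 5 are assumptions for illustration only.

```python
from math import comb, prod

def polya_pmf(x, N, r, c, n):
    """Pólya PMF in floating point (standard product form)."""
    num = prod(r + i * c for i in range(x)) * prod(N - r + j * c for j in range(n - x))
    den = prod(N + k * c for k in range(n))
    return comb(n, x) * num / den

def binom_pmf(x, n, p):
    """Binomial PMF with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def max_abs_gap(N, p=0.4, c=2, n=5):
    """Largest |Pólya - binomial| over x = 0..n when r = p * N."""
    r = round(p * N)
    return max(abs(polya_pmf(x, N, r, c, n) - binom_pmf(x, n, p)) for x in range(n + 1))

gaps = [max_abs_gap(N) for N in (10, 100, 1000, 10000)]
print(gaps)  # the gap shrinks as N grows
```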


Rutherford distribution inspired by Pólya distribution

Rutherford’s contagious distribution (or simply the Rutherford distribution) was inspired by the Pólya distribution, or Pólya urn model, from which it arises naturally [3]. The distribution, built on prior work by Woodbury [4], describes trials in which the probability of a success depends linearly on the number of previous successes.

Woodbury considered a general Bernoulli scheme where the probability of a success depends on the number of previous successes, formulating the equation

\[
P(n + 1, x + 1) = p_x P(n, x) + (1 - p_{x+1}) P(n, x + 1),
\]

where

  • \(p_x\) = probability of success after x previous successes,
  • P(n, x) = probability of x successes in n trials.
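
The recurrence lends itself to a small dynamic-programming table. The sketch below pushes probability forward from P(0, 0) = 1, which is equivalent to the recurrence above. The function name woodbury_table and the constant example probabilities are assumptions, not taken from Woodbury's paper.

```python
def woodbury_table(p, n_max):
    """Return P with P[n][x] = probability of x successes in n trials.

    p[x] is the success probability after x previous successes;
    p must have at least n_max entries (p[0] .. p[n_max - 1]).
    """
    P = [[0.0] * (n_max + 1) for _ in range(n_max + 1)]
    P[0][0] = 1.0  # no trials, no successes
    for n in range(n_max):
        for x in range(n + 1):
            P[n + 1][x + 1] += p[x] * P[n][x]    # success on trial n + 1
            P[n + 1][x] += (1 - p[x]) * P[n][x]  # failure on trial n + 1
    return P

# With a constant success probability the scheme reduces to the binomial.
table = woodbury_table([0.3] * 4, n_max=4)
print(table[3][2])  # 3 * 0.3**2 * 0.7 for Binomial(3, 0.3)
```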

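If no two of the \(p_x\) are equal, the recurrence can be solved to give the closed-form formula

\[
P(n, x) = p_0 p_1 \cdots p_{x-1} \sum_{i=0}^{x} \frac{(1 - p_i)^n}{\prod_{j \ne i} (p_j - p_i)},
\]

where the product in the denominator runs over j = 0, 1, …, x with j ≠ i.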
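
As a sanity check, a closed form for distinct success probabilities can be evaluated exactly. The expression coded below, P(n, x) = p_0 ⋯ p_{x−1} Σ_i (1 − p_i)^n / Π_{j≠i} (p_j − p_i), is derived here from the recurrence and should be read as an assumption rather than a quotation from Woodbury or Rutherford. The function name and the example probabilities are likewise illustrative.

```python
from fractions import Fraction
from math import prod

def woodbury_closed_form(p, n, x):
    """Closed form for P(n, x), assuming p[0], ..., p[x] are pairwise distinct."""
    lead = prod(p[:x], start=Fraction(1))  # p_0 * p_1 * ... * p_{x-1}
    total = Fraction(0)
    for i in range(x + 1):
        denom = prod((p[j] - p[i] for j in range(x + 1) if j != i),
                     start=Fraction(1))
        total += (1 - p[i]) ** n / denom
    return lead * total

# Distinct probabilities 0.10, 0.15, ..., 0.35 as exact fractions.
p = [Fraction(1, 10) + Fraction(x, 20) for x in range(6)]
n = 5
dist = [woodbury_closed_form(p, n, x) for x in range(n + 1)]
print(sum(dist))  # the probabilities over x = 0..n should sum to 1
```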

Rutherford’s contagious distribution is a special case of this formula. The idea is that when a white ball is drawn from the urn, it is replaced with α other balls. This case of the Pólya distribution leads to a clustering of secondary cases around the first ball drawn. Rutherford used a linear function in which \(p_x\) is determined by just two parameters:

\(p_x = p + \alpha x\).

Because every \(p_x\) must remain a valid probability, this implies (with q = 1 – p) that

  • n < q/α if α > 0, and
  • n < –p/α if α < 0.
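
The two bounds follow from requiring 0 < p + αx < 1 for every x up to n. A minimal sketch that checks them before building the linear success probabilities is shown below. The function name rutherford_probs and the example values are hypothetical, and α denotes the slope of the linear function.

```python
def rutherford_probs(p, alpha, n):
    """Return [p_0, ..., p_n] with p_x = p + alpha * x, after checking the bound on n."""
    q = 1 - p
    if alpha > 0 and not n < q / alpha:
        raise ValueError("need n < q/alpha when alpha > 0")
    if alpha < 0 and not n < -p / alpha:
        raise ValueError("need n < -p/alpha when alpha < 0")
    return [p + alpha * x for x in range(n + 1)]

probs = rutherford_probs(p=0.2, alpha=0.1, n=5)         # positive contagion
probs_neg = rutherford_probs(p=0.5, alpha=-0.1, n=4)    # negative contagion
print(probs, probs_neg)
```

Both lists stay strictly between 0 and 1, which is what the bounds on n guarantee.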

Rutherford’s special case formula avoids product notation:

[Formula image: Rutherford’s contagious distribution]

Note that the distribution was proposed by R.S.G. Rutherford; it has no connection to Ernest Rutherford’s formula describing the scattering of alpha particles in physics.

Arfwedson distribution

The Arfwedson distribution is a discrete probability distribution for an urn sampling problem in which balls are drawn with replacement.

“An urn contains N numbered balls. We make n drawings replacing the ball into the urn each time. What is the probability of getting v different balls?”

Arfwedson [5].

The distribution has been called other names, such as:

  • The coupon-collecting distribution, because it describes the probability that a person with n randomly selected coupons will have at least one of each of the k equally likely varieties [6].
  • The classical occupancy distribution [7].
  • Stirling2 distribution, because of the presence of the Stirling numbers of the second kind [8].
  • Dixie cup [9].
  • Stevens-Craig [10, 11].

Arfwedson Distribution Formula

There are many different formulas for the Arfwedson distribution. They differ in whether they count occupied or unoccupied bins; counting unoccupied bins reverses the order of the probability mass function (PMF).
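
One standard occupied-bin form of the classical occupancy distribution is P(v) = C(N, v) · v! · S(n, v) / N^n, where S(n, v) is a Stirling number of the second kind. The sketch below evaluates it exactly. The notation and function names are assumptions and may not match the forms given by Haight or Arfwedson.

```python
from fractions import Fraction
from math import comb, factorial

def stirling2(n, k):
    """Stirling number of the second kind via S(n, k) = k*S(n-1, k) + S(n-1, k-1)."""
    S = [[0] * (k + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            S[i][j] = j * S[i - 1][j] + S[i - 1][j - 1]
    return S[n][k]

def distinct_balls_pmf(v, N, n):
    """P(exactly v different balls appear in n draws with replacement from N balls)."""
    return Fraction(comb(N, v) * factorial(v) * stirling2(n, v), N ** n)

# Example: three draws from six numbered balls (like three rolls of a die).
pmf = [distinct_balls_pmf(v, N=6, n=3) for v in range(4)]
print(pmf)
```

In this example the probability of seeing three different balls is 6·5·4/6³ = 5/9, which matches a direct count.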

Haight [12] lists the distribution as

Arfwedson gives the expected value as

[Formula image: Arfwedson distribution]

Where g(n, ν) represents Stirling’s second class numbers (Stirling numbers of the second kind), which are generated, up to a factor of ν!, by the exponential generating function

\((e^x - 1)^\nu\)

Arfwedson also gives a more complicated alternative, the PGF

The function equals the coefficient of \(y^n/n!\) in

References

[1] Polya urn image: Quartl, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons

[2] Teerapabolarn, K. (2014). An improved binomial distribution to approximate the Polya distribution. International Journal of Pure and Applied Mathematics, 93(5), 629–632. ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version).

[3] Rutherford, R. S. G. (1954). On a Contagious Distribution. The Annals of Mathematical Statistics, 25(4), 703–713. http://www.jstor.org/stable/2236654

[4] Woodbury, M. (1949). On a probability distribution. The Annals of Mathematical Statistics, 20, 311–313.

[5] Arfwedson, G. (1951). A probability distribution connected with Stirling’s second class numbers. Skand. Aktuarietidskr., 34, 121–132.

[6] David, F. N., and Barton, D. E. (1962). Combinatorial Chance. London: Griffin.

[7] O’Neill, B. (2019). The Classical Occupancy Distribution: Computation and Approximation. The American Statistician. DOI: 10.1080/00031305.2019.1699445

[8] Williamson, P. P., Mays, D. P., Abay Asmerom, G., and Yang, Y. (2009). Revisiting the Classical Occupancy Problem. The American Statistician, 63, 356–360.

[9] Johnson, N. L., and Kotz, S. (1977). Urn Models and Their Application. New York: Wiley.

[10] Stevens, W. L. (1937). Significance of grouping. Annals of Eugenics, London, 8, 57–60.

[11] Craig, C. C. (1953). On the utilization of marked specimens in estimating populations of flying insects. Biometrika, 40, 170–176.

[12] Haight, F. (1958). Index to the Distributions of Mathematical Statistics. National Bureau of Standards Report.
