# Wallenius Distribution

The Wallenius distribution, also known as Wallenius’ noncentral hypergeometric distribution, is a biased sampling distribution used to model the number of successes in a series of dependent draws made without replacement from a finite population. The distribution is named after Kenneth Ted Wallenius, who first described it in 1963 [1].

The Wallenius distribution extends the hypergeometric distribution, which models the number of successes in a sequence of draws without replacement from a finite population. Draws are dependent in both cases, because each item removed changes the composition of what remains. The primary distinction is that the hypergeometric distribution assumes every remaining item is equally likely to be drawn, whereas the Wallenius distribution lets items of different types have different odds of selection.

The Wallenius distribution is useful in specific scenarios involving biased sampling without replacement. It does not apply in many other situations, where distributions such as the binomial (independent trials with a constant success probability) or the Poisson (counts of events over an interval) are more suitable.

## Urn models and the Wallenius distribution

The Wallenius distribution is often depicted as an urn model that involves drawing samples without replacement and incorporating bias. In a standard urn model, a collection of balls is placed in an urn, and the balls are removed one at a time without replacement. If every ball is equally likely to be picked, the process is unbiased and the number of balls of a given color follows a hypergeometric distribution. In a biased urn model, where balls have varying weights or sizes, the probability of selecting a specific ball depends on its weight, and the counts follow a Wallenius distribution.
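The contrast between the unbiased and the biased urn can be checked numerically. The following is a minimal sketch, not taken from the cited sources, that assumes only two colors: `m1` red balls of weight `omega` each and `m2` white balls of weight 1 each. The function names and parameter names are hypothetical. It builds the exact distribution of the number of red balls draw by draw and compares it with the hypergeometric formula.

```python
from math import comb


def wallenius_pmf(m1, m2, n, omega):
    """Exact distribution of the number of red balls after n biased draws.

    Assumes m1 red balls of weight omega and m2 white balls of weight 1.
    Returns a list p with p[x] = P(X = x) for x = 0..n.
    """
    if not 0 <= n <= m1 + m2:
        raise ValueError("n must be between 0 and m1 + m2")
    if omega <= 0:
        raise ValueError("omega must be positive")
    p = [1.0] + [0.0] * n  # before any draw, zero red balls with certainty
    for drawn in range(n):
        nxt = [0.0] * (n + 1)
        for x in range(min(drawn, m1) + 1):
            if p[x] == 0.0:
                continue  # unreachable state
            red_left = m1 - x
            white_left = m2 - (drawn - x)
            # Chance that the next ball is red: red share of remaining weight.
            p_red = omega * red_left / (omega * red_left + white_left)
            nxt[x + 1] += p[x] * p_red
            nxt[x] += p[x] * (1.0 - p_red)
        p = nxt
    return p


def hypergeom_pmf(m1, m2, n):
    """Hypergeometric distribution of the number of red balls in n draws."""
    total = comb(m1 + m2, n)
    return [comb(m1, x) * comb(m2, n - x) / total for x in range(n + 1)]


if __name__ == "__main__":
    print(wallenius_pmf(5, 7, 4, 1.0))  # equal weights: unbiased urn
    print(hypergeom_pmf(5, 7, 4))       # agrees up to rounding
    print(wallenius_pmf(5, 7, 4, 3.0))  # heavier red balls shift mass upward
```

With `omega = 1` the two lists agree up to floating-point rounding; raising `omega` above 1 moves probability toward larger red counts.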

An urn model adhering to a Wallenius distribution possesses these traits:

1. Items are selected individually from a fixed, finite population containing several types of item (e.g., 100 balls in several colors).
2. A fixed number of draws occur (e.g., a fixed number of balls chosen). The draws are not independent, because each one changes what is left in the urn.
3. Items are chosen randomly and without replacement.
4. At each draw, the probability of selecting a particular item equals its share of the total weight or volume of the items still in the urn. In other words, a heavier or larger ball has a higher probability of being chosen.
5. An item’s bias (e.g., its weight or volume) depends solely on its color. This means that all blue balls have the same weight, all red balls have the same weight, all green balls have the same weight, and so on.
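The traits above translate directly into a simulation. The sketch below is illustrative only: the function name `draw_from_urn`, the dictionary layout, and the example weights are assumptions rather than anything from the cited sources. It uses the standard library's `random.choices` to pick one ball at a time, with each color's chance proportional to the total weight of that color still in the urn.

```python
import random


def draw_from_urn(counts, weights, n_draws, rng=random):
    """Simulate one biased sample without replacement.

    counts:  color -> number of balls of that color in the urn
    weights: color -> weight shared by every ball of that color
    Returns color -> number of balls of that color drawn.
    """
    if not 0 <= n_draws <= sum(counts.values()):
        raise ValueError("cannot draw more balls than the urn holds")
    remaining = dict(counts)
    drawn = {color: 0 for color in counts}
    for _ in range(n_draws):
        colors = [c for c in remaining if remaining[c] > 0]
        # Total weight per color among the balls still in the urn.
        mass = [remaining[c] * weights[c] for c in colors]
        pick = rng.choices(colors, weights=mass, k=1)[0]
        remaining[pick] -= 1  # without replacement
        drawn[pick] += 1
    return drawn


if __name__ == "__main__":
    urn = {"red": 10, "blue": 10}
    bias = {"red": 5.0, "blue": 1.0}  # hypothetical weights
    print(draw_from_urn(urn, bias, 5, random.Random(42)))
```

Repeating the call many times and tabulating the results approximates the Wallenius distribution for the chosen counts and weights.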

## Other uses for the Wallenius distribution

Not all examples of Wallenius distribution involve urns, although all play on the theme of choosing items from a group. For example:

• Biased sampling: Imagine you are selecting 50 crickets one by one using a large pair of tweezers from a small cage containing green, black, and white crickets. The green crickets are the largest, followed by the black and then the white crickets. Due to their larger size, the green crickets have a higher probability of being chosen. Because there are more than two types, the counts of each type captured follow the multivariate form of Wallenius’ noncentral hypergeometric distribution.
• Estimating the count of survivors in competitive settings: The Wallenius distribution can also be applied to estimate the number of individuals from a population that endure a competitive environment. For instance, this distribution could be used to determine the surviving species within a specific habitat or the individuals who survive an outbreak of a disease [3].
• Assessing the quantity of errors in a sequence of dependent trials: The distribution can be used to evaluate the number of mistakes occurring in a series of dependent trials, provided the trials amount to biased selection without replacement from a finite pool. For example, it could be used to assess the errors made by students during a test or the errors committed by workers on a production line.
• Calculating the number of successful outcomes in a series of dependent experiments: Under the same condition, the Wallenius distribution can be used to compute the number of successful results in a sequence of dependent experiments. For instance, it could be employed to assess the effectiveness of a drug in curing a disease or the success rate of a new product in the market.

## The downside

Computing the Wallenius distribution is often considered inefficient and prone to numerical instability. Its probability mass function involves an integral rather than the simple ratio of binomial coefficients that defines the hypergeometric distribution. This makes it more challenging to understand and apply in practical situations.

Calculating probabilities and moments for the Wallenius distribution can be computationally intensive, especially for large sample sizes or populations. This can limit its usefulness, although the advent of modern computational tools has resulted in some renewed interest in the distribution’s properties [e.g., 4].
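To give a sense of the computation involved, the probability of `x` red balls in the two-color case can be written as C(m1, x) C(m2, n - x) times the integral from 0 to 1 of (1 - t^(omega/d))^x (1 - t^(1/d))^(n - x) dt, where d = omega (m1 - x) + (m2 - (n - x)). The sketch below evaluates that integral with composite Simpson's rule after substituting t = u^d, a choice made here only to remove the steep slope near zero. The function name, the symbols, and the step count are assumptions, and the sketch handles only d = 0 or d >= 1; it is a rough illustration, not a production method.

```python
from math import comb


def wallenius_pmf_integral(x, m1, m2, n, omega, steps=2000):
    """P(X = x) from the integral form, using composite Simpson's rule.

    Assumes m1 red balls of weight omega and m2 white balls of weight 1.
    steps must be even.
    """
    if steps % 2:
        raise ValueError("steps must be even")
    if not max(0, n - m2) <= x <= min(n, m1):
        return 0.0  # outside the support
    d = omega * (m1 - x) + (m2 - (n - x))
    if d == 0:
        return 1.0  # the whole urn was drawn, so only one outcome exists
    if d < 1:
        raise ValueError("this sketch only handles d >= 1")

    def g(u):
        # Integrand after the substitution t = u**d.
        return d * u ** (d - 1) * (1 - u ** omega) ** x * (1 - u) ** (n - x)

    h = 1.0 / steps
    total = g(0.0) + g(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return comb(m1, x) * comb(m2, n - x) * total * h / 3


if __name__ == "__main__":
    probs = [wallenius_pmf_integral(x, 5, 7, 4, 2.0) for x in range(5)]
    print(probs, sum(probs))
```

Each probability needs its own numerical integration, and the binomial coefficients grow quickly with the population size, which hints at why large problems become expensive and numerically delicate.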

## References

[1] Wallenius, K. T. (1963). Biased Sampling: The Non-central Hypergeometric Probability Distribution. Ph.D. thesis. Stanford University, Department of Statistics.

[2] Polya urn image: Quartl, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons

[3] Manly, B. F. J. (1985). The Statistics of Natural Selection on Animal Populations. London.

[4] Martens, D. & Foster, P. (2013). Wallenius Naive Bayes. Retrieved September 19, 2023 from: https://archive.nyu.edu/handle/2451/33545?mode=full
