## What is the negative hypergeometric distribution?

The **negative hypergeometric distribution**, also called the *Romanovsky distribution* [1], is used to calculate probabilities when sampling from a finite population without replacement. This distribution is applicable when each sampled item falls into exactly one of two mutually exclusive groups, such as Black/White or Economy class/Business class. As random selections are drawn from the population without replacement, the probability of success changes with each draw, because both the size and the makeup of the remaining population change.

Assume there is a finite population of *N* objects, of which *r* are successes, so that *N* – *r* are failures. With the negative hypergeometric distribution, we sample without replacement until we get *k* successes. If *X* is the number of trials needed to obtain *k* successes, then *X* is discrete and its probability mass function (PMF) is given by

$$P(X = x) = \frac{\binom{x-1}{k-1}\binom{N-x}{r-k}}{\binom{N}{r}}, \qquad x = k, k+1, \ldots, N - r + k.$$
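As a minimal sketch, the PMF can be coded directly, assuming the standard closed form C(x−1, k−1)·C(N−x, r−k) / C(N, r) for the draw on which the *k*-th success appears. The function name `trials_pmf` and the small test population are illustrative choices, not part of the original text. Exact fractions are used so that the sanity checks (the probabilities sum to 1, and the mean equals k(N+1)/(r+1)) hold without rounding error.

```python
from fractions import Fraction
from math import comb


def trials_pmf(x, N, r, k):
    """P(X = x): the k-th success arrives on draw x.

    N = population size, r = number of successes in the population,
    k = number of successes we sample until (1 <= k <= r <= N).
    """
    if not (1 <= k <= r <= N):
        raise ValueError("need 1 <= k <= r <= N")
    # X can only take values k, k+1, ..., N - r + k
    if x < k or x > N - r + k:
        return Fraction(0)
    # k-1 successes among the first x-1 draws, a success on draw x,
    # and the remaining r-k successes among the last N-x positions
    return Fraction(comb(x - 1, k - 1) * comb(N - x, r - k), comb(N, r))


# Illustrative population: 10 items, 4 successes, stop at the 2nd success
N, r, k = 10, 4, 2
support = range(k, N - r + k + 1)
total = sum(trials_pmf(x, N, r, k) for x in support)
mean = sum(x * trials_pmf(x, N, r, k) for x in support)

print(total)  # the probabilities over the support add up to 1
print(mean)   # matches k * (N + 1) / (r + 1)
```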

## Example

**Question**: What is the probability that the 3rd queen is the 12th card chosen from a standard, shuffled deck of cards?

**Solution**: Substitute the following into the formula

- X = # cards drawn
- x = 12 (i.e., the 12th card)
- N = 52 (cards in a standard deck)
- r = 4 (there are 4 possible queens in a deck)
- k = 3.

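Substituting that information into the formula, we get

$$P(X = 12) = \frac{\binom{11}{2}\binom{40}{1}}{\binom{52}{4}} = \frac{55 \cdot 40}{270725} = \frac{88}{10829} \approx 0.0081.$$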
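The arithmetic for this example can be cross-checked with a short simulation. The sketch below computes the exact value from binomial coefficients and then estimates the same probability by repeatedly "shuffling" a deck. It relies on one assumption worth stating: in a well-shuffled deck the four queens occupy a uniformly random set of four positions, so sampling four positions is equivalent to shuffling all 52 cards. The function name `simulate`, the trial count, and the seed are arbitrary illustrative choices.

```python
import random
from fractions import Fraction
from math import comb

# Exact probability: 2 queens among the first 11 cards, the 3rd queen on
# card 12, and the last queen somewhere among the remaining 40 cards.
exact = Fraction(comb(11, 2) * comb(40, 1), comb(52, 4))


def simulate(trials=200_000, seed=0):
    """Estimate P(3rd queen is the 12th card) by random sampling."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Positions (1..52) of the four queens in a shuffled deck
        positions = sorted(rng.sample(range(1, 53), 4))
        if positions[2] == 12:  # third queen sits at position 12
            hits += 1
    return hits / trials


estimate = simulate()
print(f"exact    = {exact} = {float(exact):.5f}")
print(f"estimate = {estimate:.5f}")
```

The estimate should land close to the exact value, with the gap shrinking as the number of trials grows.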

## Negative hypergeometric distribution vs. hypergeometric distribution

The two distributions differ in what is fixed in advance and what is random. The hypergeometric distribution gives the probability of a certain number of successes in a sample of fixed size. The negative hypergeometric distribution instead fixes a stopping count for one category and lets the sample size vary: sampling continues until a predetermined number of items from that category has been drawn. This section labels the stopping category "failures" and counts the successes drawn before sampling stops; the definition earlier in this article labels the stopping category "successes" and counts the total number of draws, which is the same setup with the labels swapped and the count shifted by the fixed stopping number. In other words:

- Hypergeometric distribution: the sample size is fixed.
- Negative hypergeometric distribution: the number of failures is fixed, and sampling stops when that number is reached.

For example, suppose we have a population of 100 people, of which 20 are successes and 80 are failures. To calculate the probability of getting 5 successes in a sample of 10, we would use the hypergeometric distribution. To calculate the probability of having drawn 5 successes by the time the 8th failure appears, we would use the negative hypergeometric distribution.
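The two calculations in this example can be sketched side by side. The code below is an illustration under one stated reading of the example: "5 successes with 8 failures" is taken to mean that exactly 5 successes have been drawn at the moment the 8th failure appears. The function names are hypothetical, and exact fractions are used so the sum-to-one checks are exact.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(s, N, K, n):
    """P(exactly s successes in a fixed sample of n draws).

    N = population size, K = successes in the population.
    """
    if s < 0 or s > K or n - s < 0 or n - s > N - K:
        return Fraction(0)
    return Fraction(comb(K, s) * comb(N - K, n - s), comb(N, n))


def neg_hypergeom_pmf(s, N, K, f):
    """P(exactly s successes have been drawn when the f-th failure appears)."""
    if s < 0 or s > K or f < 1 or f > N - K:
        return Fraction(0)
    # The first s + f - 1 draws hold s successes and f - 1 failures ...
    first = Fraction(comb(K, s) * comb(N - K, f - 1), comb(N, s + f - 1))
    # ... and the next draw is a failure taken from what remains.
    last = Fraction(N - K - (f - 1), N - (s + f - 1))
    return first * last


N, K = 100, 20  # 100 people, 20 successes, 80 failures

p_fixed_sample = hypergeom_pmf(5, N, K, n=10)       # 5 successes in 10 draws
p_fixed_failures = neg_hypergeom_pmf(5, N, K, f=8)  # 5 successes before 8th failure

print(float(p_fixed_sample))
print(float(p_fixed_failures))
```

In the first call the sample size (10) is fixed and the number of failures is whatever remains; in the second the number of failures (8) is fixed and the sample size, here 13 draws, is what varies.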

## Negative hypergeometric distribution vs. negative binomial distribution

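Both the negative hypergeometric distribution and the negative binomial distribution describe sampling until you have a certain number of successes. The difference is that the negative binomial distribution assumes sampling with replacement (or, equivalently, an infinite population), so the probability of success stays the same on every draw, while the negative hypergeometric distribution deals with a finite population sampled without replacement, so the probability of success changes from draw to draw.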
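A small numerical comparison makes the relationship concrete. The sketch below reuses the card numbers from the earlier example (3rd success on the 12th draw, success fraction 4/52) and assumes the "number of trials until the *k*-th success" form of both distributions. The function names and the scaled-up population of 52,000 are illustrative assumptions, chosen to show that the finite-population probability approaches the negative binomial value when the population grows with the success fraction held fixed.

```python
from math import comb


def neg_hypergeom_pmf(x, N, r, k):
    """P(k-th success on draw x), without replacement, r successes among N."""
    if x < k or x > N - r + k:
        return 0.0
    return comb(x - 1, k - 1) * comb(N - x, r - k) / comb(N, r)


def neg_binom_pmf(x, p, k):
    """P(k-th success on trial x) with a constant success probability p."""
    if x < k:
        return 0.0
    return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)


x, k = 12, 3
p = 4 / 52  # same success fraction in every scenario

small = neg_hypergeom_pmf(x, 52, 4, k)           # one deck: finite, no replacement
large = neg_hypergeom_pmf(x, 52_000, 4_000, k)   # same fraction, much larger population
limit = neg_binom_pmf(x, p, k)                   # constant-probability model

print(f"N = 52:      {small:.5f}")
print(f"N = 52000:   {large:.5f}")
print(f"neg. binom.: {limit:.5f}")
```

With only 52 cards the two models disagree noticeably, because removing each card changes the odds for the next draw; with the larger population the removal barely matters and the two values nearly coincide.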

## Romanovsky distribution

The name “Romanovsky distribution” is also used to describe a restricted occupancy distribution in “ball and urn” investigations [3]. It is named after V.I. Romanovsky [5] who proposed the distribution to construct a hypothesis test concerning the homogeneity (similarity) of two samples.

Suppose we have two ordered samples from the same population, of sizes *N* and *M*, with unknown probability density f(x):

x_{1} ≤ x_{2} ≤ … ≤ x_{N},

y_{1} ≤ y_{2} ≤ … ≤ y_{M} (N ≥ 1, M ≥ 1).

Also suppose that the first sample has *n* members no larger than *x*_{n+1} and *N* – *n* – 1 members at least as large as x_{n+1}. Then the probability that the second sample has *μ* members no larger than *x*_{n+1} and *M* – *μ* members above x_{n+1} is:
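One way to write this probability follows from a counting argument. The form below is a sketch derived under two assumptions, not a quotation of Romanovsky's own notation: the two samples are independent draws from the same continuous population (so ties have probability zero), and therefore every interleaving of the *N* + *M* pooled values is equally likely. Below x_{n+1} there are *n* values from the first sample and *μ* from the second; above it there are *N* – *n* – 1 and *M* – *μ*.

```latex
P(\mu) \;=\; \frac{\dbinom{n+\mu}{\mu}\,\dbinom{N+M-n-\mu-1}{M-\mu}}{\dbinom{N+M}{M}},
\qquad \mu = 0, 1, \ldots, M .
```

Under these assumptions the expression has the same shape as a negative hypergeometric probability: the second sample's values play the role of one category, and counting stops at the (*n* + 1)-th value of the first sample.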

## Historical Notes on the Romanovsky Distribution

Haight [4] lists Romanovsky’s distribution in the index, yet the index points only to a sparse entry titled “Romanovsky’s generalization”:

The references point to Biometrika, where “Romanovsky’s generalised curve” is mentioned in [6] as a generalization of Pearson distributions, also called *Pearson frequency curves*. Specifically, the curve is written as the equation

where

- A_{0}, A_{1}, A_{2}, … are constants,
- μ_{0}, μ_{1}, μ_{2}, … are certain definite one-valued functions of *x*.

The first term in the series represents one of the Pearson frequency curves depending on the choice of function.

Wishart reports that Romanovsky’s curves “do not appear to better the existing types, owing to the expansion in terms of functions which are not suited to the purpose.” In addition, he notes that, apart from a tiny region, the series expansions (which express a function as an infinite sum, or series, of simpler functions) are not convergent, meaning they do not settle on a particular result, and “hence cannot give us really satisfactory fits.”

## References

[1] Yusupova A.K., Gafforov R.A. Refining One Theorem For The Romanovsky Distribution. The American Journal of Interdisciplinary Innovations and Research. Vol. 3 No. 06 (2021)

[2] Top image: T113355, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

[3] Charalambides, Ch. A. On Restricted and Pseudo-Contagious Occupancy Distributions. Journal of Applied Probability. Vol. 20, No. 4 (Dec., 1983), pp. 872-876 (5 pages) Published by: Applied Probability Trust.

[4] Haight, F. (1958). Index to the Distributions of Mathematical Statistics. National Bureau of Standards Report.

[5] Romanovsky V.I. Ordered samples from the same continuous population. Proceedings of the Institute of Mathematics and Mechanics. Tashkent, 1949, pp. 5-19

[6] Wishart, J. On Romanovsky’s Generalised Frequency Curves. Biometrika Vol. 18, No. 1/2 (Jul., 1926), pp. 221-228 (8 pages) Published By: Oxford University Press