The hypergeometric distribution is a discrete probability distribution that gives the probability of observing k successes in n draws when sampling from a finite population without replacement. In other words, it describes the probability of getting a certain number of “successes” in a sample of a given size, drawn from a population of a known size, where nothing that has been drawn is put back into the population.
The hypergeometric distribution is almost the same as the binomial distribution, except that with the binomial distribution, you sample with replacement. With the hypergeometric, you sample without replacement.
In this article, we’re going to explore what the hypergeometric distribution is, look at a few examples, and show how it differs from the binomial distribution.
Hypergeometric formula and examples
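The probability mass function (PMF) for the hypergeometric distribution is
$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$
or, in the nCr notation used in the examples, P(X = k) = (KCk * (N-K)C(n-k)) / NCn, where
- X = the random variable counting the successes in the sample
- K = number of successes in the population
- k = number of observed successes
- N = population size
- n = number of draws.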
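As a minimal sketch, the PMF can be written in a few lines of Python using the standard library's `math.comb`. It computes C(K, k) * C(N-K, n-k) / C(N, n) with the symbols defined above. The function name `hypergeom_pmf` and the sample numbers (N = 10, K = 4, n = 3) are illustrative assumptions, not part of the original article.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): k successes in n draws without replacement from a
    population of N items, K of which are successes."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    # Outcomes that cannot occur have probability zero.
    if k < 0 or k > K or n - k < 0 or n - k > N - K:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Sanity check with made-up numbers: the probabilities of all
# possible values of k should add up to 1 (up to rounding error).
total = sum(hypergeom_pmf(k, N=10, K=4, n=3) for k in range(0, 4))
print(total)
```

The early return covers values of k that are impossible for the given population, so the function can be summed over any range of k without raising an error.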
Hypergeometric Distribution Examples
Q1: Picture the scene – a deck of cards sitting on the table, waiting for someone to pick them up and deal them out. But this is no ordinary deck. It’s filled with 20 cards – 6 red, 14 black – and you’re about to draw a hand of 5. Your heart races as you wonder what cards fate has in store for you. But you’re not just playing for the fun of it – you’ve got a question to answer. What’s the probability that you’ll draw exactly 4 red cards?
Solution: The probability of drawing exactly 4 red cards is:
P(4 red cards) = # of samples with 4 red cards and 1 black card / # of possible 5-card samples.
Using the combinations formula, this is (6C4*14C1)/20C5, where
- 6C4 means that out of six possible red cards in the deck (stated in the question), we want to get exactly four.
- 14C1 means that out of a possible 14 black cards (from the question), we want to draw exactly one.
- 20C5 is the number of ways to draw any five cards from the 20 in the deck.
Solution = (6C4*14C1)/20C5 = 15*14/15504 = 0.0135
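The arithmetic for Q1 can be checked with a short Python sketch. The variable names are illustrative; the numbers come straight from the question.

```python
from math import comb

red_ways = comb(6, 4)     # ways to pick 4 of the 6 red cards
black_ways = comb(14, 1)  # ways to pick 1 of the 14 black cards
all_hands = comb(20, 5)   # ways to pick any 5 of the 20 cards

p_four_red = red_ways * black_ways / all_hands

print(red_ways, black_ways, all_hands)  # 15 14 15504
print(round(p_four_red, 4))             # 0.0135
```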
You might think that you could use the binomial distribution for this question, but it doesn’t apply here, because the cards are not drawn with replacement. In other words, the trials are not independent. For example, for one red card, the probability is 6/20 on the first draw. If that card is red, the probability of choosing a second red card falls to 5/19.
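To see the changing probabilities at work, the sketch below multiplies the draw-by-draw probabilities for one specific order (red, red, red, red, black) and then accounts for the five positions the black card could occupy. It uses exact fractions so the result can be compared with the counting answer from Q1. This is an illustrative check, not a method taken from the original article.

```python
from fractions import Fraction
from math import comb

red, black = 6, 14
total = red + black

# Probability of one specific order: red, red, red, red, black.
# Each factor uses the cards that remain after the earlier draws.
p_order = Fraction(1)
for i in range(4):
    p_order *= Fraction(red - i, total - i)   # 6/20, 5/19, 4/18, 3/17
p_order *= Fraction(black, total - 4)         # 14/16

# The black card can land in any of the 5 positions, and every
# such order has the same probability.
p_sequential = comb(5, 1) * p_order

# Counting answer from Q1.
p_counting = Fraction(comb(6, 4) * comb(14, 1), comb(20, 5))

print(p_sequential == p_counting)  # True
```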
Q2. Imagine you’re about to take part in a local election in a small voting district. Out of the 196 registered voters, 101 are women, and 95 are men. The suspense builds as 10 names are randomly drawn from the pool of voters. What are the chances that exactly seven of them will be women? Let’s crunch the numbers and see what fate has in store for your candidate.
Solution: Putting our numbers into the combinations formula we get:
- 101C7 is how many ways there are to choose seven women from the 101 female voters,
- 95C3 is how many ways there are to choose three men from the 95 male voters (if 7 of the 10 voters are women, then 3 of the 10 must be men),
- 196C10 is how many ways there are to choose 10 voters from all 196.
101C7*95C3/(196C10) = (17199613200*138415)/18257282924056176 = 0.130
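The numbers in Q2 are too large to work out comfortably by hand, so a quick check in Python is useful. The sketch below relies on Python's arbitrary-size integers; the variable names are illustrative.

```python
from math import comb

women_ways = comb(101, 7)    # ways to choose 7 of the 101 women
men_ways = comb(95, 3)       # ways to choose 3 of the 95 men
all_samples = comb(196, 10)  # ways to choose any 10 of the 196 voters

p_seven_women = women_ways * men_ways / all_samples

print(women_ways, men_ways)    # 17199613200 138415
print(f"{p_seven_women:.3f}")  # 0.130
```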
Binomial vs. Hypergeometric distribution
The hypergeometric distribution is very similar to the binomial distribution. In fact, the binomial distribution is a good approximation of the hypergeometric distribution if you are sampling 5% or less of the population.
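As a rough illustration of this rule of thumb, the sketch below compares the exact hypergeometric probability with the binomial approximation for the two examples above. The voter example samples about 5% of the population (10 of 196), while the card example samples 25% (5 of 20). The helper names `hypergeom_pmf` and `binom_pmf` are assumptions made for this example.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Exact probability of k successes when drawing without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Voter example: 10 of 196 sampled, 101 women in the population.
exact_voters = hypergeom_pmf(7, N=196, K=101, n=10)
approx_voters = binom_pmf(7, n=10, p=101 / 196)

# Card example: 5 of 20 sampled, 6 red cards in the deck.
exact_cards = hypergeom_pmf(4, N=20, K=6, n=5)
approx_cards = binom_pmf(4, n=5, p=6 / 20)

print(f"voters: exact {exact_voters:.4f}, binomial {approx_voters:.4f}")
print(f"cards:  exact {exact_cards:.4f}, binomial {approx_cards:.4f}")
```

For the voters the two values agree to within about 0.002, while for the cards the binomial figure is roughly double the exact one, which is why the binomial should not be used for Q1.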
A key difference between the hypergeometric distribution and the binomial distribution is whether the trials are independent. Both are discrete probability distributions: each one counts the number of successes in a fixed number of trials, so the number of possible outcomes is finite and countable. With the binomial distribution, every trial has the same probability of success, either because you sample with replacement or because one trial has no effect on the next. With the hypergeometric distribution, each draw changes the makeup of what is left, so the probability of success shifts from one draw to the next. For example, if you roll a die 10 times, each roll is independent and the probability of a 6 is always 1/6, so we would use the binomial distribution to determine the probability of getting a specific number of 6’s. On the other hand, if you draw 10 cards from a deck without putting any back, each draw depends on the cards already removed, so we would use the hypergeometric distribution. Continuous measurements, such as the length of a piece of wire, are modeled by neither; they call for a continuous distribution.
The hypergeometric distribution has important applications in fields such as genetics, ecology, and epidemiology, wherever items are sampled without replacement from a finite population. For example, it can be used to calculate the probability of finding a certain number of disease cases in a sample drawn from a population of a given size that contains a known number of cases, without relying on a normal approximation.
One important point to note is that the hypergeometric distribution makes no assumption about how large the population is compared with the sample; it is exact for any sample drawn without replacement from a finite population. It is the binomial approximation that needs the population to be much larger than the sample, so that the proportion of “successes” in the population doesn’t change significantly after each draw. If the sample size is large relative to the population size, then the binomial distribution is not an appropriate model and we should use the hypergeometric distribution instead.
In conclusion, probability distributions are powerful tools for analyzing and interpreting data in various fields. While the binomial distribution is widely known and used, it’s important to understand that there are other probability distributions that are just as essential. The hypergeometric distribution is one such distribution, which is used to model situations where we are sampling without replacement. By understanding the key differences between the hypergeometric and binomial distributions, we can choose the appropriate distribution for our analysis and avoid common mistakes.