The Dirichlet distribution can be used to model random probability mass functions (PMFs) for finite sets.
The most common use of a Dirichlet distribution is to model the probabilities of different outcomes in a categorical data set. For example, if you have data with three categories (“yes”, “no” and “maybe”), then you could use a Dirichlet distribution to model the probability of each outcome. It is also widely used in data science and machine learning, and is useful in many other applications, for example as a prior distribution in Bayesian statistics.
The Dirichlet distribution is named after the 19th-century German mathematician Peter Gustav Lejeune Dirichlet.
What Is a Dirichlet Distribution?
A “random PMF” arises because most real-life phenomena include a component of randomness. For example, a perfectly fair die is almost certainly a myth: manufacturing processes are good, but not perfect. In theory, each face of a die (1, 2, 3, 4, 5, or 6) comes up with probability 1/6. However, if you roll real dice 1,000 times, you won’t get exactly that distribution, partly because of manufacturing defects. No die is perfectly weighted; there will always be a tiny bias toward one side or another. If you have ten dice, each individual die will have its own PMF.
The Dirichlet process extends this idea to a random PMF over an unlimited number of outcomes (such as an unlimited number of dice in a bag). The process is similar to Polya’s urn, but in a Dirichlet process the number of possible ball colors is unlimited, whereas Polya’s urn uses a fixed set of colors.

- Start with an empty urn.
- Pick a ball of a random color and place it in the urn.
- Then, at each step, randomly choose one of two options:
  - Pick a ball of a new, randomly chosen color and place it in the urn.
  - Randomly draw a ball from the urn, then put it back together with another ball of the same color.
As the number of balls in the urn increases, the probability of adding a new color decreases. A Dirichlet process models the proportions of colors in the urn after an infinite number of draws.
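The urn steps above can be simulated in a few lines of Python. This is a minimal sketch assuming the common Blackwell-MacQueen form of the scheme, in which, after n balls, a new color is added with probability alpha / (alpha + n) and an existing ball is copied otherwise. The concentration parameter `alpha` and the function name `dirichlet_process_urn` are illustrative assumptions, not part of the original description.

```python
import random
from collections import Counter


def dirichlet_process_urn(num_draws, alpha, seed=None):
    """Simulate the urn scheme (Blackwell-MacQueen style; an assumption).

    alpha (hypothetical name): concentration parameter controlling how often
    a brand-new color is added. Colors are labeled 0, 1, 2, ... in order of
    first appearance. Returns the list of colors of the balls in the urn.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    urn = []
    next_color = 0
    for n in range(num_draws):
        # With probability alpha / (alpha + n), add a ball of a new color.
        # For the first ball (n = 0) this probability is 1.
        if rng.random() < alpha / (alpha + n):
            urn.append(next_color)
            next_color += 1
        else:
            # Otherwise draw an existing ball uniformly at random and put it
            # back together with one more ball of the same color.
            urn.append(rng.choice(urn))
    return urn


if __name__ == "__main__":
    balls = dirichlet_process_urn(1000, alpha=2.0, seed=1)
    counts = Counter(balls)
    # Proportion of the five most common colors in the urn.
    for color, count in counts.most_common(5):
        print(color, count / len(balls))
```

Because the chance of a new color shrinks as alpha / (alpha + n), the number of distinct colors grows only slowly (roughly in proportion to alpha times the logarithm of n), and the color proportions settle down as the urn fills.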
How the Dirichlet distribution works
The Dirichlet distribution works by assigning a weight (parameter) to each category that reflects its prior importance. For example, if one category’s parameter is twice as large as another’s, then on average it receives twice the probability in the resulting PMF. Each draw from the distribution is a set of probabilities that must sum to one (i.e., they add up to 100%), which is exactly what a PMF requires. This makes the Dirichlet a natural model for a random PMF [2].
More precisely, each draw from the distribution is a random vector (X1, …, Xn) of n positive numbers that add up to 1. This makes it closely related to the multinomial distribution, whose n category probabilities must also sum to 1.
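One standard way to produce such a vector is the gamma construction: draw independent Gamma(αi, 1) variables, one per category, and divide each by their sum. The Python sketch below uses only the standard library’s `random.gammavariate`; the function name `sample_dirichlet` and the dice parameters in the usage example (alpha = 50 for every face) are illustrative assumptions, not values from the original.

```python
import random


def sample_dirichlet(alphas, rng=random):
    """Draw one probability vector from Dirichlet(alphas).

    Gamma construction: X_i ~ Gamma(alpha_i, 1) independently,
    then theta_i = X_i / sum(X). Every alpha must be positive.
    """
    if any(a <= 0 for a in alphas):
        raise ValueError("all alpha values must be positive")
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [x / total for x in draws]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical example: ten slightly imperfect dice. A large, equal alpha
    # for every face keeps each die's PMF close to (1/6, ..., 1/6).
    for die in range(10):
        pmf = sample_dirichlet([50.0] * 6, rng=rng)
        print(die, [round(p, 3) for p in pmf])
```

Raising all the alpha values together (for example from 50 to 500) makes the ten sampled PMFs cluster more tightly around 1/6; lowering them makes the dice look more lopsided.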
The parameters of a Dirichlet distribution are its alpha values, one for each category. The alpha values set the relative strength of each category: a category with a higher alpha value gets, on average, a larger share of the probability, while one with a lower alpha value gets a smaller share. The sum of the alpha values controls how tightly the draws cluster around that average; larger sums give less variable PMFs. Note: the alpha parameters of a Dirichlet distribution are unrelated to the alpha (significance) level used in hypothesis testing; they only share a name.
Dirichlet distribution properties
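Probability density function (PDF), for a PMF θ = (θ1, …, θk) with every θj > 0 and θ1 + … + θk = 1, where a1, …, ak > 0 are the alpha values and A = a1 + … + ak:
$$f(\theta_1, \dots, \theta_k) = \frac{\Gamma(A)}{\prod_{j=1}^{k} \Gamma(a_j)} \prod_{j=1}^{k} \theta_j^{a_j - 1}$$
The mean of θj is:
$$E(\theta_j) = \frac{a_j}{A}$$
The variance of θj is:
$$\operatorname{var}(\theta_j) = \frac{a_j}{A(A + 1)} - \frac{a_j^2}{A^2(A + 1)} = \frac{a_j (A - a_j)}{A^2 (A + 1)}$$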
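The density, mean, and variance can be checked numerically. The Python sketch below assumes the standard form of the Dirichlet density, f(θ) = Γ(A) / ∏Γ(aj) · ∏ θj^(aj − 1) with A = a1 + … + ak, together with mean aj / A and variance aj(A − aj) / (A²(A + 1)); the function names are hypothetical. The density is computed on the log scale with `math.lgamma`, and exact fractions are used for the moments so the results can be compared with hand calculations.

```python
import math
from fractions import Fraction


def dirichlet_pdf(theta, alphas):
    """Density of Dirichlet(alphas) at a point theta on the probability simplex.

    log f = lgamma(A) - sum(lgamma(a_j)) + sum((a_j - 1) * log(theta_j)),
    computed on the log scale for numerical stability.
    """
    if len(theta) != len(alphas):
        raise ValueError("theta and alphas must have the same length")
    if any(t <= 0 for t in theta) or not math.isclose(sum(theta), 1.0):
        raise ValueError("theta must be positive and sum to 1")
    A = sum(alphas)
    log_f = math.lgamma(A) - sum(math.lgamma(a) for a in alphas)
    log_f += sum((a - 1) * math.log(t) for a, t in zip(alphas, theta))
    return math.exp(log_f)


def dirichlet_mean_var(alphas):
    """Exact mean and variance of each component, using Fractions.

    E(theta_j) = a_j / A and var(theta_j) = a_j (A - a_j) / (A^2 (A + 1)).
    """
    a = [Fraction(x) for x in alphas]
    A = sum(a)
    means = [aj / A for aj in a]
    variances = [aj * (A - aj) / (A ** 2 * (A + 1)) for aj in a]
    return means, variances


if __name__ == "__main__":
    means, variances = dirichlet_mean_var([2, 3, 5])
    print(means)  # [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
    print(variances)
    # With all alpha values equal to 1 the density is constant on the simplex.
    print(dirichlet_pdf([0.2, 0.3, 0.5], [1, 1, 1]))
```

For alpha values (2, 3, 5), A = 10, so the means are 1/5, 3/10 and 1/2, and var(θ1) = 2 · 8 / (100 · 11) = 4/275.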
Similarity to other distributions
In addition to its connection to Polya’s urn, the Dirichlet distribution is related to several other distributions:
- The Dirichlet is a multivariate generalization of the beta distribution: the beta models the probability of one of two outcomes, while the Dirichlet extends this to two or more mutually exclusive outcomes. With k = 2 categories, the Dirichlet distribution reduces to the beta distribution.
- The Dirichlet equals the uniform distribution over all possible PMFs (the probability simplex) when all parameters α1 … αk equal 1.
- The Dirichlet distribution is the conjugate prior for the categorical and multinomial distributions [3].
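- A compound variant is the Dirichlet-multinomial distribution, in which the probabilities of a multinomial distribution are themselves drawn from a Dirichlet distribution.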
- The Balding-Nichols distribution, used in population genetics, is a reparameterized beta distribution (the two-category case of the Dirichlet).
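Because the Dirichlet is the conjugate prior for categorical and multinomial data, updating it with observed counts is just addition: each category’s posterior alpha is its prior alpha plus the number of times that category was observed. The Python sketch below illustrates this; the function name `dirichlet_posterior` and the survey responses are hypothetical, reusing the “yes”/“no”/“maybe” example.

```python
from collections import Counter


def dirichlet_posterior(prior_alphas, observations):
    """Conjugate update: Dirichlet prior + categorical data -> Dirichlet posterior.

    prior_alphas: dict mapping each category to its prior alpha value.
    observations: iterable of observed category labels.
    The posterior alpha for each category is prior alpha + observed count.
    """
    counts = Counter(observations)
    unknown = set(counts) - set(prior_alphas)
    if unknown:
        raise ValueError(f"categories missing from the prior: {unknown}")
    return {c: a + counts[c] for c, a in prior_alphas.items()}


if __name__ == "__main__":
    # Hypothetical survey data using the "yes"/"no"/"maybe" categories.
    prior = {"yes": 1.0, "no": 1.0, "maybe": 1.0}  # all alphas equal to 1
    data = ["yes", "yes", "no", "maybe", "yes", "no"]
    posterior = dirichlet_posterior(prior, data)
    total = sum(posterior.values())
    # Posterior mean of each category's probability is alpha_j / A.
    for category, a in posterior.items():
        print(category, a, round(a / total, 3))
```

With all prior alphas equal to 1 and the six responses shown, the posterior parameters are 4, 3 and 2 for “yes”, “no” and “maybe”, giving posterior mean probabilities of 4/9, 3/9 and 2/9.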
References
[1] Polya urn image: Quartl, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons.
[2] Xing, E. Lecture 23: Bayesian Nonparametrics: Dirichlet Processes. 10-708: Probabilistic Graphical Models, Spring 2020.
[3] Gu, L. Dirichlet Distribution, Dirichlet Process and Dirichlet Process Mixture. Retrieved March 30, 2023 from: https://www.cs.cmu.edu/~epxing/Class/10701-08s/recitation/dirichlet.pdf