Probability Distributions List A to Z


What is a probability distribution?

Probability distributions can take on many different shapes, depending on factors such as skew and kurtosis [1].

Probability distributions give the probability of an event occurring, from simple events like tossing a coin or choosing a card, to complex ones such as whether a treatment drug works or not. A basic distribution can be shown in a probability table, while more complex ones, such as the normal distribution, can be graphed with a “curve”. The probabilities within a probability distribution always sum to 100% or, equivalently, 1 as a decimal.

A probability distribution describes all the possible values and likelihoods that a random variable can take within a given range. The range of possible values will be bounded between the minimum and maximum possible values, but precisely where the value falls on the distribution depends on a number of factors. These factors include the mean (average), standard deviation, skewness, and kurtosis of the distribution.

Properties of probability distributions

One of the most common ways to describe a continuous probability distribution, like the normal or t-distribution, is with a probability density function (PDF). These are formulas that tell you the relative likelihood of the variable at any point in the distribution. The idea is that you plug in a value for x to get the density at x; probabilities then correspond to areas under the curve.

The half-logistic distribution PDF.

For discrete distributions such as the binomial distribution, a probability mass function (PMF) is used instead. The idea is the same, except that you’re dealing with discrete data points rather than continuous ones.
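For example, the binomial PMF can be coded up in a few lines. Here’s a quick Python sketch (the coin-toss numbers are just an illustration):

```python
from math import comb

def binomial_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# probability of exactly 3 heads in 5 tosses of a fair coin
print(binomial_pmf(3, 5, 0.5))  # 0.3125
```

As with any PMF, the probabilities over all possible outcomes (k = 0, …, n) sum to 1.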

Summary statistics such as the mean or standard deviation can give you insight into data trends. The mean of a probability distribution is its average value. For a discrete distribution, it’s calculated by weighting each possible value of the random variable by its probability and summing the results. The standard deviation shows how spread out the possible values are around the mean; if it’s high, there’s greater variation among the possible values than if it’s low, and values far from the mean occur more often.
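For a discrete distribution given as a probability table, the mean is the probability-weighted sum of the values, and the variance is the probability-weighted sum of squared deviations. A quick Python sketch (the loaded-die table is invented for illustration):

```python
import math

# hypothetical probability table for a loaded six-sided die
values = [1, 2, 3, 4, 5, 6]
probs = [0.10, 0.10, 0.15, 0.15, 0.20, 0.30]

mean = sum(x * p for x, p in zip(values, probs))
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
std_dev = math.sqrt(variance)

print(round(mean, 4), round(std_dev, 4))  # 4.15 1.6815
```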

Skewness is another important factor in understanding probability distributions; it measures how symmetrical or asymmetrical they are around their mean values. A symmetrical distribution has an equal amount of data on either side of its mean value; an asymmetrical distribution has more data points on one side than the other. Kurtosis measures how heavy a probability distribution’s tails are; higher kurtosis means that extreme values far from the mean occur more often than they do with lower kurtosis.

Understanding probability distributions is essential for any student studying statistics or related fields like economics or finance. Understanding what factors affect each type of probability distribution – such as skewness, kurtosis, mean and standard deviation – can help you make better decisions when analyzing data sets from real-world scenarios like elections or stock market performance analysis. By having an understanding of these concepts, you can gain valuable insights into various phenomena in our world today!

Probability distributions A to Z

This list of several hundred probability distributions was compiled during my research into archaic and unusual probability distributions (or unusual names for well-known distributions). Many of these are not mentioned in traditional textbooks or found on the major websites.

In addition, you’ll find common distributions which are referenced by the more unusual distributions, as well as definitions and help pages for common statistics terms.

Please leave me a comment if there’s a distribution you want to see on the list.

Skip to A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z.
[1] IkamusumeFan, CC BY-SA 4.0, via Wikimedia Commons

Azzalini distribution

Azzalini’s skew-normal class of distributions is a family of distributions that includes the normal distribution as a special case. It has an extra parameter (λ) to regulate skewness [1].

These one-parameter distributions are defined by the probability density function (PDF)

f(z) = 2φ(z)Φ(λz),

where φ is the standard normal PDF and Φ is the standard normal CDF.

Location parameters and scale parameters are an optional addition to the PDF.

While there are many references to the skew normal in the literature, it was Azzalini [2] who gave a systematic treatment of the distribution. Therefore, it is occasionally referred to as the Azzalini distribution (e.g., Johnson et al. [3]). The name “skew normal” is a bit of a misnomer: with the skew parameter λ set to zero, the distribution reduces to the standard normal, which isn’t skewed at all. Setting the skew parameter to small values, close to zero, results in a skew normal distribution with only a tiny bit of skew.

The distribution originates from Azzalini’s note [2] that if X and Y are two independent random variables with individual PDFs that are symmetric about zero, then for any λ, P(X ≤ λY) = ½.

Consequently, 2pY(y)FX(λy) is a PDF. If we take X and Y to be unit normal variables, we get Azzalini’s distribution.

Usefulness of the Azzalini Distribution

The Azzalini distribution has a number of useful properties which approximate the normal distribution, justifying the “skew normal” name. For example, if X has an Azzalini PDF, then X2 follows a chi-squared distribution with one degree of freedom, regardless of the value of the skew parameter λ. In applied statistics, the distribution can be used to analyze skewed data from a unimodal empirical distribution, which occurs frequently in practical problems.
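A quick way to see the chi-squared property numerically: the selection construction gives an exact sampler (draw two unit normals Z0, Z1 and keep Z1 when Z0 ≤ λZ1, otherwise −Z1), and the squared samples should then average to about 1, the mean of a chi-squared variable with one degree of freedom. A Python sketch (λ = 2 is an arbitrary choice):

```python
import random

def skew_normal_samples(lam, n, seed=42):
    # Selection sampler: if (Z0, Z1) are iid unit normals, then taking
    # Z1 when Z0 <= lam*Z1 (and -Z1 otherwise) yields a variable with
    # PDF 2*phi(z)*Phi(lam*z), i.e. Azzalini's skew normal.
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z0, z1 = rng.gauss(0, 1), rng.gauss(0, 1)
        out.append(z1 if z0 <= lam * z1 else -z1)
    return out

samples = skew_normal_samples(lam=2.0, n=100_000)
mean_sq = sum(z * z for z in samples) / len(samples)
print(round(mean_sq, 2))  # close to 1, the chi-squared(1) mean
```

Note that Z² equals Z1² whichever branch is taken, which is exactly why the chi-squared property holds for every λ.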


[1] Azzalini, A. & Valles, D. (1996). The multivariate skew-normal distribution. Biometrika, 83, 4, pp. 715-726.

[2] Azzalini, A. (1985). A class of distributions which includes the normal ones, Scandinavian Journal of Statistics, 12, 171-178.

[3] Johnson, Kotz, and Balakrishnan, (1994), Continuous Univariate Distributions, Volumes I and II, 2nd. Ed., John Wiley and Sons.

Probability distributions list (unusual and/or archaic)

Binomials added distribution

Haight [1] lists the “binomials added distribution” as a single entry without a formula:

binomials added distribution

MR17 refers to Mathematical Reviews, which Haight notes does “in no case offer a review of a paper appearing in coded journals”, the codes indicating “…publications in obscure (from the point of view of [1951]) sources.” The journal [w] referenced is Mitteilungsblatt für Mathematische Statistik, Volume 5. I was unable to locate copies of either of these obscure references. If anyone has access, please leave a comment. Thank you!

The problem of adding binomials was addressed briefly in [2], referencing a formula for a toric ideal:

Carlini and Rapallo note that “the two added binomials do not have a simple statistical counterpart”.

This may indicate that adding certain binomials may be difficult to compute or intractable in some circumstances, which may be why the “binomials added distribution” has vanished into obscurity.


[1] Haight, F. (1958). Index to the Distributions of Mathematical Statistics. National Bureau of Standards Report.

[2] Carlini, E. & Rapallo, F. (2013). Toric ideals with linear components: an algebraic interpretation of clustering the cells of a contingency table. Online:

Champernowne distribution

The Champernowne distribution is a heavy-tailed, three-parameter distribution that converges to a Pareto distribution as x → ∞ [1].

Buch-Larsen et al. [2] proposed the Champernowne distribution as a transformation function with cumulative distribution function (CDF)

T(x) = ((x + c)^α − c^α) / ((x + c)^α + (M + c)^α − 2c^α), x ≥ 0,

with parameters (α, M, c); the parameter M is the median of the distribution.

Buch-Larsen et al.’s probability density function (PDF) is

The distribution was first introduced by Champernowne in 1952 [3] to describe the logarithm of income distribution. Since then, several authors have recommended the Champernowne distribution — with a nonparametric correction — to describe operational risk losses, including Gustafsson [4], Buch-Kromann et al. [5] and Guillen et al. [6].
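A quick numerical sanity check on the Buch-Larsen parametrization (treat the exact form below as a reconstruction, not gospel): with CDF T(x) = ((x + c)^α − c^α) / ((x + c)^α + (M + c)^α − 2c^α), the parameter M should be the median, since at x = M the denominator is exactly twice the numerator.

```python
def champernowne_cdf(x, alpha, M, c):
    # Buch-Larsen et al. parametrization (reconstructed; treat as an assumption)
    num = (x + c) ** alpha - c ** alpha
    den = (x + c) ** alpha + (M + c) ** alpha - 2 * c ** alpha
    return num / den

alpha, M, c = 1.5, 3.0, 0.5   # arbitrary illustrative parameter values
print(champernowne_cdf(M, alpha, M, c))  # 0.5, so M is the median
```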


[1] Buch-Kromann, T. (2009). Comparison of Tail Performance of the Champernowne transformed Kernel Density Estimator, the Generalized Pareto Distribution and the g-and-h distribution. Retrieved April 26, 2023 from:

[2] T. Buch-Larsen, J. P. Nielsen, M. Guillen, and C. Bolance. Kernel density estimation for heavy-tailed distributions using the Champernowne transformation. Statistics, 39(6):503–518, 2005. ISSN 0233-1888.

[3] Champernowne, D. G. (1952). “The graduation of income distributions”. Econometrica, 20(4): 591–614. doi:10.2307/1907644. JSTOR 1907644.

[4] J. Gustafsson. Modelling operational risk with kernel density estimation using the Champernowne transformation. The ICFAI Journal of Risk & Insurance, 3(4):39–75, 2006.

[5] T. Buch-Kromann, M. Englund, J. Gustafsson, J. P. Nielsen, and F. Thuring. Non-parametric estimation of operational risk losses adjusted for under-reporting. Scandinavian Actuarial Journal, 4:293–304, 2007.

[6] M. Guillen, J. Gustafsson, J. P. Nielsen, and P. Pritchard. Using external data in operational risk. The Geneva papers, 32:178–189, 2007.

Coupon collector distribution

The coupon collector’s distribution is a discrete probability distribution that models the number of trials needed to collect all n different types of coupons in an experiment where there are n different types of coupons, and each coupon has an equal chance of being selected. The coupon collector problem is analogous to collecting a full set of stickers or cards, where each sticker or card has an equal chance of being obtained at random. It can be described as [1]

  • There are n different types of coupons.
  • Each purchase comes with one random coupon.
  • Every coupon is equally likely to appear.
  • How many purchases are needed to collect all n coupons?

It’s important to note that the number of purchases required to obtain all coupons/cards is random, so it isn’t possible to determine it exactly in advance. However, we can calculate an average: the expected value of this distribution is n multiplied by the nth harmonic number, where the nth harmonic number is the sum of the inverses of the first n natural numbers.

The variance of the distribution is n² times the sum of the inverse squares of the first n natural numbers, minus n times the nth harmonic number.
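Both formulas are easy to check by simulation. A minimal Python sketch (n = 10 coupons is an arbitrary choice):

```python
import random

def coupon_collector_trial(n, rng):
    # count purchases until all n coupon types have been seen
    seen, purchases = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        purchases += 1
    return purchases

n = 10
harmonic = sum(1 / i for i in range(1, n + 1))   # H_n
expected = n * harmonic                          # n * H_n, about 29.29 for n = 10
rng = random.Random(0)
avg = sum(coupon_collector_trial(n, rng) for _ in range(20_000)) / 20_000
print(round(expected, 2), round(avg, 2))  # simulated mean tracks the theoretical mean
```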

Applications of the coupon collector’s distribution are found in fields such as computer science, in the analysis of hashing algorithms, and in marketing research, to estimate the size of a target market or to estimate the time needed to reach a collection goal.


[1] Coupon collector’s problem. Retrieved May 7, 2023 from:

Dickman distribution

The Dickman distribution first arose in number theory, in the context of friable integers — integers that are free of large prime factors [1]. The distribution arises as follows [2]:

For a positive integer k, let p1(k) be the largest prime divisor of k. If ξn is a uniform random variable on the set {1, 2, . . . , n}, then for any a > 0,

dickman distribution

as n tends to infinity, where D is a continuous, nonnegative function satisfying the equation

aD′(a) + D(a − 1) = 0, a > 1,

with initial condition D(a) = 1 for a ∈ [0, 1].
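Because D = 1 on [0, 1], the delay differential equation can be integrated numerically step by step. The sketch below (a rough trapezoidal scheme; the step size is an arbitrary choice) recovers the known value D(2) = 1 − ln 2 ≈ 0.307:

```python
import math

def dickman(a_max=2.0, h=1e-3):
    # integrate a*D'(a) = -D(a-1) with D = 1 on [0, 1], trapezoidal steps
    n1 = int(round(1.0 / h))            # grid index of a = 1
    n = int(round(a_max / h))
    D = [1.0] * (n1 + 1)                # D(a) = 1 for a in [0, 1]
    for i in range(n1, n):
        a = i * h
        f0 = -D[i - n1] / a             # D'(a)     = -D(a-1)/a
        f1 = -D[i + 1 - n1] / (a + h)   # D'(a + h) = -D(a+h-1)/(a+h)
        D.append(D[i] + h * (f0 + f1) / 2)
    return D

D = dickman()
print(round(D[-1], 4))  # D(2) = 1 - ln 2, about 0.3069
```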


[1] K. Dickman, On the frequency of numbers containing primes of a certain relative magnitude, Ark. Mat. Astr. Fys. 22 (1930)

[2] Grabchak, M. et al. (2022). Around the infinite divisibility of the Dickman distribution and related topics. Retrieved April 26, 2023 from:

Elfving distribution

The Elfving distribution is defined as [1]

elfving distribution

The distribution is named after Finnish statistician and mathematician Gustav Elfving (1908-1984), who described the distribution in 1947 [2].

In his Biometrika paper, Elfving established a distributional result concerning order statistics. Specifically, he investigated the distribution of the sample range when the samples are drawn from a standard normal distribution. Previous work on exact calculations for finite samples was found to be intractable [3]. Elfving tackled the problem from a different direction, determining the asymptotic distribution of the sample range for large samples.

Elfving Distribution Alternatives

Elfving’s distribution has a distinct disadvantage compared with other methods, which may be why it’s little discussed outside of a few historical references. While other methods can be expressed directly in terms of the range, Elfving’s formula involves a non-linear transformation of the range, making it a theoretical challenge [4]. For example, Gumbel’s method [5] leads to the same results as Elfving’s distribution; however, Gumbel’s method requires no knowledge of sample size, the analytical form of the initial distribution, or numerical values of the distribution’s parameters.


[1] Haight, F. (1958). Index to the Distributions of Mathematical Statistics. National Bureau of Standards Report.

[2] Elfving, G. (1947). The asymptotical distribution of range in samples from a normal population. Biometrika 34 111–119.

[3] Nordstrom, K. (1999). The Life and Work of Gustav Elfving. Statistical Science. Vol. 14, No. 2, 174-196.

[4] Cox, D. R. (1948). A Note on the Asymptotic Distribution of Range. Biometrika, 35(3/4), 310–315.

[5] Gumbel, E. (1949). Probability tables for the range. Biometrika, 36: 142-148.

Eulerian distribution

The term “Eulerian distribution” may refer to:

  • A distribution used in permutation statistics to count descents, or
  • A Type III (p/q, p) distribution.

Eulerian Distribution in Permutation Statistics

In permutation statistics, the Eulerian distribution is the classical name for the distribution of the descent statistic. A descent in a permutation α1, α2, …, αn is an index i for which αi > αi+1 [1]. The Eulerian number A(n, k) counts the number of permutations of n elements with exactly k descents [2].
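For small n, the Eulerian numbers can be checked by brute force, counting descents over all permutations. A short Python sketch:

```python
from itertools import permutations

def descents(perm):
    # a descent is an index i with perm[i] > perm[i+1]
    return sum(1 for i in range(len(perm) - 1) if perm[i] > perm[i + 1])

def eulerian_numbers(n):
    # A(n, k) = number of permutations of {1, ..., n} with exactly k descents
    counts = [0] * n
    for p in permutations(range(1, n + 1)):
        counts[descents(p)] += 1
    return counts

print(eulerian_numbers(4))  # [1, 11, 11, 1]
```

The counts are symmetric (A(n, k) = A(n, n − 1 − k)) and sum to n!, as expected.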

Type III (p/q, p) Eulerian Distribution

The term Eulerian distribution may also refer to a Type III (p/q, p) distribution [3]. Kendall’s 1948 Biometrika article, titled “On some modes of population growth leading to R.A. Fisher’s logarithmic series distribution,” concerns relative numbers of animals from different species obtained when random sampling from a heterogeneous population. Fisher assumed that for a certain species, the number of individuals caught in a specific time would be distributed as a Poisson variable with expectation ωt, where ω is the “intrinsic abundance”. The following formula, describing the distribution of ω, is referenced as a Eulerian (or χ2) form:


  • Ω is the mean value of ω.
  • k is a constant parameter.


[1] Mansour, T. & Munagi, A. (2010). Enumeration of partitions by rises, levels, and descent. In Permutation Patterns. Cambridge University Press.

[2] Hibi & Tsuchiya. (2019). Algebraic And Geometric Combinatorics On Lattice Polytopes – Proceedings Of The Summer Workshop On Lattice Polytopes. World Scientific.

[3] Kendall, D. G. (1948). On Some Modes of Population Growth Leading to R. A. Fisher’s Logarithmic Series Distribution. Biometrika, 35(1/2), 6–15.

Gaussian q-distribution

The Gaussian-q family of distributions is a generalization or q-analog (i.e., a q-distribution) of the normal distribution. The Gaussian q-distribution can represent heavy-tailed distributions such as Student’s t distribution or a distribution with bounded support such as Wigner’s semicircle. The distribution has been applied to a diverse set of data from fields such as statistical mechanics, finance, geology, and machine learning [1].

Introduced by Diaz and Teruel [2], it includes the uniform distribution and normal/Gaussian distribution as special cases. The probability density function (PDF) is


In general, the Gaussian-q distribution is bounded and symmetric about zero, with the exception of the limiting case of the normal distribution.


[1] Nahlaa. B. Some properties of q-Gaussian distributions. Retrieved April 22, 2023 from:

[2] Diaz, R. (2008). On the Gaussian q-Distribution. Retrieved April 22, 2023 from:

Gauss-Kuzmin distribution

The Gauss-Kuzmin distribution concerns the frequency with which a positive integer k appears in a continued fraction expansion; this frequency is equal to

gauss kuzmin distribution

for nearly all real numbers. For example, for a general real x, we have [1]:

  • 42% of 1
  • 17% of 2
  • 9.3% of 3…
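These frequencies come from the Gauss-Kuzmin probability mass function P(k) = −log2(1 − 1/(k + 1)²), which is easy to check in Python:

```python
import math

def gauss_kuzmin_pmf(k):
    # P(K = k) = -log2(1 - 1/(k+1)^2)
    return -math.log2(1 - 1 / (k + 1) ** 2)

for k in range(1, 4):
    print(k, round(100 * gauss_kuzmin_pmf(k), 1))  # 41.5%, 17.0%, 9.3%
```

The probabilities sum to 1 over all k, since the product of the terms (1 − 1/(k + 1)²) telescopes.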

Exceptions include [2]

  • Rational numbers (for large n)
  • Quadratic irrationals (quadratic numbers that are also irrational)
  • Bounded continued fractions.


[1] Karpenkov, G. (2013). Geometry of Continued Fractions. Springer.

[2] Duff. S. Statistics of the Gauss-Kuzmin distribution. Retrieved April 22, 2023 from:

Kullback distribution

I love a good mystery. I found a reference to the Kullbach distribution (with an h) while researching unusual probability distributions. Of course, the first thing I did was to perform a Google search for “Kullbach distribution”. I got exactly one result, from Haight [1]

So, what exactly is this mysterious distribution? A delve into the Index to the Distributions of Mathematical Statistics (IDMS) [1] provides a further clue:

The notation GM stands for geometric mean, and Type III is a skewed distribution similar to the binomial distribution [2], which does have the gamma function in the denominator.

The entry “[2]251” refers to page 251 of Kendall’s The Advanced Theory of Statistics, Volume 1 [3]:

But, page 251 of Kendall’s work does not have the “Kullbach distribution” listed, nor is there an entry for “Kullbach” in the back index.

The penny started to drop. This was a typo. A look at a later edition of the Index to the Distributions revealed that the spelling should be Kullback distribution (with a k).

But, what is the “Kullback Distribution?”

Solomon Kullback wrote about relative entropy in 1951 [4], where he described it as

“the mean information for discrimination between H1 and H2 per observation from μ1 …”

The asymmetric “directed divergence” (a distance metric) is now called the Kullback–Leibler divergence.

I did find a reference to the formula listed in IDMS in Kullback’s 1934 paper An Application of Characteristic Functions to the Distribution Problem of Statistics [5], under a section titled “distribution of the geometric mean”, which is obtained from that of the arithmetic mean μ by the transformation μ = log gn.

Kullback’s 1934 formula.

Mystery solved!


[1] Haight, F. (1958). Index to the Distributions of Mathematical Statistics. National Bureau of Standards Report.

[2] Abramowitz, M. and Stegun, I. A. (Eds.). (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th printing. New York: Dover.

[3] Kendall, M. G. (1945). The Advanced Theory of Statistics, Volume 1. London: Charles Griffin and Co.

[4] Kullback, S.; Leibler, R.A. (1951). “On information and sufficiency”. Annals of Mathematical Statistics, 22(1): 79–86. doi:10.1214/aoms/1177729694. JSTOR 2236703. MR 0039968.

[5] Kullback, S. (1934). An Application of Characteristic Functions to the Distribution Problem of Statistics. The Annals of Mathematical Statistics, 5(4), 263–307. p. 277.

Marshall–Olkin bivariate distribution

The Marshall–Olkin bivariate distribution, introduced by Albert W. Marshall and Ingram Olkin [1], is a family of continuous multivariate probability distributions that is an extension of the bivariate family of distributions with an extra shape parameter α.

It is defined by the proper survival function

G(x, y) = αF(x, y) / [1 − (1 − α)F(x, y)], α > 0,

where (X, Y) is a random vector with joint survival function F(x, y).
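As a sketch of how the construction works, the snippet below applies the standard Marshall–Olkin form G(x, y) = αF(x, y) / [1 − (1 − α)F(x, y)] to an independent bivariate exponential survival function (the base model and parameter values are chosen purely for illustration):

```python
import math

def base_survival(x, y):
    # independent unit exponentials: F(x, y) = exp(-x - y)
    return math.exp(-x - y)

def marshall_olkin_survival(x, y, alpha):
    # G(x, y) = alpha*F / (1 - (1 - alpha)*F)
    f = base_survival(x, y)
    return alpha * f / (1 - (1 - alpha) * f)

alpha = 2.0
print(marshall_olkin_survival(0, 0, alpha))  # 1.0: still a proper survival function
print(marshall_olkin_survival(1, 1, alpha) < marshall_olkin_survival(1, 0, alpha))  # True: decreasing
```

Setting α = 1 recovers the base survival function exactly, which is why the family is viewed as adding one extra shape parameter to an existing model.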

Uses of the Marshall–Olkin bivariate distribution

The Marshall–Olkin bivariate distribution gives a wider range of behavior than the bivariate family of distributions. Specifically, the extra parameter α can model real-life situations better than the basic model [2]. In the classical Marshall-Olkin model, two components are subjected to random shocks from three different sources. Many extensions of the basic distribution have been proposed, including:

  • Ryu [3], who extended the basic model to a bivariate absolutely continuous distribution, which does not have the Marshall-Olkin lack of memory property.
  • Aly & Abuelamayem [4], who developed the multivariate inverted Kumaraswamy distribution as a new Marshall–Olkin bivariate distribution for efficient application in several fields. Parameters are found with both maximum likelihood and Bayesian approaches, which the authors state could be applied to all Marshall–Olkin multivariate distributions.

Thus, the Marshall–Olkin bivariate distribution is one member of a wider family of generalized Marshall–Olkin distributions. It is the most popular of all the bivariate lifetime distributions [4].


[1] Marshall, A. W. & Olkin, I. (1997). A new method for adding a parameter to a family of distributions with applications to the exponential and Weibull families. Biometrika, 84, 641–652.

[2] Jose, K.K. (2011). Marshall-Olkin Family of Distributions and their applications in reliability theory, time series modeling and stress-strength analysis. Int. Statistical Inst.: Proc. 58th World Statistical Congress, Dublin (Session CPS005) p.3918

[3] Ryu, K. An extension of Marshall and Olkin bivariate exponential distribution. J. Amer. Statist. Assoc., 88 (1993), pp. 1458-1465

[4] Aly, H. & Abuelamayem, O. (2020). Multivariate Inverted Kumaraswamy Distribution: Derivation and Estimation. Mathematical Problems in Engineering.

Tessier distribution

The Tessier (Teissier) distribution, named after biologist Georges Teissier [2] and studied by Muth [1], has a heavier tail than better-known lifetime distributions such as the gamma distribution, lognormal distribution, and Weibull distribution.

The Tessier distribution has the probability density function (PDF) of [2]

fY(y; θ) = θ(e^(θy) − 1) exp(θy − e^(θy) + 1), y > 0

and cumulative distribution function (CDF) of

FY(y; θ) = 1 − exp(θy − e^(θy) + 1), y > 0.
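As a consistency check, writing the PDF as f(y) = θ(e^(θy) − 1)exp(θy − e^(θy) + 1), it should integrate to 1, and its running integral should match the CDF. A rough numerical sketch (θ = 1 chosen arbitrarily):

```python
import math

def teissier_pdf(y, theta):
    # f(y) = theta*(e^(theta*y) - 1) * exp(theta*y - e^(theta*y) + 1)
    return theta * (math.exp(theta * y) - 1) * math.exp(theta * y - math.exp(theta * y) + 1)

def teissier_cdf(y, theta):
    # F(y) = 1 - exp(theta*y - e^(theta*y) + 1)
    return 1 - math.exp(theta * y - math.exp(theta * y) + 1)

def trapz(f, a, b, n=20_000):
    # composite trapezoidal rule
    h = (b - a) / n
    s = (f(a) + f(b)) / 2
    for i in range(1, n):
        s += f(a + i * h)
    return s * h

theta = 1.0
total = trapz(lambda y: teissier_pdf(y, theta), 0.0, 20.0)
print(round(total, 4))  # approximately 1.0
print(round(trapz(lambda y: teissier_pdf(y, theta), 0.0, 1.0), 4),
      round(teissier_cdf(1.0, theta), 4))  # the two values should agree
```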


[1] Muth, E.J. Reliability models with positive memory derived from the mean residual life function. Theory Appl. Reliab. 1977, 2, 401–436.

[2] Teissier, G. Recherches sur le vieillissement et sur les lois de la mortalité. Ann. Physiol. Physicochim. Biol. 1934, 10, 237–284

Uniantimodal distribution

The term uniantimodal means “concave up” (or cup-shaped) when displayed on a graph. Thus, a uniantimodal distribution is any distribution that takes on a concave up shape; instead of a single mode (peak), it has a single antimode (an interior minimum).

As an example, the Kumaraswamy distribution is a uniantimodal distribution when a, b < 1, as shown in the following image in red:

The Kumaraswamy distribution PDF showing a uniantimodal shape (red).
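The cup shape is easy to verify numerically from the Kumaraswamy PDF, f(x) = ab·x^(a−1)(1 − x^a)^(b−1). With a = b = 0.5 (an arbitrary choice below 1), the density has an interior minimum with higher density at both edges:

```python
def kumaraswamy_pdf(x, a, b):
    # f(x) = a*b*x^(a-1) * (1 - x^a)^(b-1), for 0 < x < 1
    return a * b * x ** (a - 1) * (1 - x ** a) ** (b - 1)

a, b = 0.5, 0.5
xs = [i / 100 for i in range(1, 100)]
ys = [kumaraswamy_pdf(x, a, b) for x in xs]
lowest = ys.index(min(ys))
print(0 < lowest < len(ys) - 1)  # True: the minimum is interior (an antimode)
```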
