
An **extreme value distribution (EVD)** is a probability model for the largest or smallest values in a data set. It is useful in situations like estimating the conditions under which a levee or other infrastructure is likely to fail, or understanding how far a process can stray from its typical behavior.

The terms *Fisher–Tippett distribution* and *extreme value distribution* are often used interchangeably to refer to the same family of distributions: the limiting distributions of the (suitably rescaled) maximum of a sample of independent and identically distributed random variables as the sample size grows large. “Extreme value distribution” is the more common term, especially in the context of extreme value theory, whereas “Fisher–Tippett distribution” appears in some of the statistical literature.

## Extreme value definition

An extreme value is an exceptionally small or large value within a probability distribution. These values are typically located in the upper or lower tails of the distribution, which represent its farthest points.

The definition of “extreme value” may vary slightly depending on the context. Some authors use the term as an alternative name for the minimum and/or maximum value of a data set (i.e., the single smallest and/or largest number in the set), while others use it as a synonym for an outlier. In calculus, the points where a function reaches its maximum and minimum values are called extrema, leading some authors to refer to these points as “extreme values” as well. However, in most instances, people discussing extreme values mean values in the sense of extreme value theory.

## What is an extreme value distribution?

Have you ever wondered how scientists can predict how large or small the values in a data set are likely to get? They can use an EVD, which is designed to describe the likely size of the largest or smallest values so we can plan accordingly. For example, imagine you’re in charge of building a levee to protect against storm surges. By analyzing historical storm data, you can use the extreme value distribution to estimate how likely it is that a surge exceeds a given height, and therefore how likely the levee is to fail. Think of it like a warning sign: a threshold beyond which the levee is in danger of failing.

The idea is that three types of extreme value distributions can model the extremes from *any* set of data, as long as the distribution the data come from:

- is “well-behaved” [2],
- is a continuous probability distribution,
- has an inverse (the cumulative distribution function can be inverted), and
- generates independent and identically distributed (iid) variables.
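
As a rough illustration of the idea above, the sketch below draws iid samples from a standard exponential distribution (which meets the listed conditions), keeps the maximum of each sample, and compares the centered maxima with the standard Gumbel (Type I) CDF. The function names `block_maxima` and `gumbel_cdf`, the sample sizes, and the seed are illustrative assumptions, not part of the original article; the centering by ln(n) is the standard result for an exponential parent.

```python
import math
import random


def block_maxima(sample_size, n_blocks, seed=0):
    """Return the maximum of each of n_blocks iid Exp(1) samples."""
    rng = random.Random(seed)
    return [
        max(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_blocks)
    ]


def gumbel_cdf(x):
    """Standard Gumbel (Type I, maximum form) CDF."""
    return math.exp(-math.exp(-x))


n = 200
maxima = block_maxima(sample_size=n, n_blocks=2000)

# For an Exp(1) parent, (max - ln n) is approximately standard Gumbel.
centered = [m - math.log(n) for m in maxima]

for x in (-1.0, 0.0, 1.0, 2.0):
    empirical = sum(c <= x for c in centered) / len(centered)
    print(f"x={x:+.1f}  empirical={empirical:.3f}  Gumbel={gumbel_cdf(x):.3f}")
```

The two columns should agree to within sampling noise; a heavy-tailed or bounded parent would need a different rescaling and would lead to Type II or Type III instead.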

## Types of extreme value distribution

There are three types of EVDs that are commonly used to model extremes from data sets: Type I, II, and III [1]. Each type works differently but serves the same purpose: to describe how large or small the most extreme values of a system are likely to be, for example the load at which it fails. They make different assumptions about the tails of the underlying distribution and thus differ in their uses.

## Type I EVD

**Type I EVD** (the Gumbel distribution) is skewed towards one side: the form for maxima has a longer right tail, and the form for minima has a longer left tail. This type is well suited for situations where there is no clear trend in the data and where outliers can occur due to randomness or unexpected events. It also works well when estimating how large some process is likely to get.

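The Gumbel distribution has two formulas: one for the minimum and one for the maximum. The probability density function (PDF) for the minimum is [3]

$$f(x) = \frac{1}{\beta}\, e^{(x-\mu)/\beta}\, e^{-e^{(x-\mu)/\beta}},$$

where μ is the location parameter and β is the scale parameter. For the maximum, it is

$$f(x) = \frac{1}{\beta}\, e^{-(x-\mu)/\beta}\, e^{-e^{-(x-\mu)/\beta}}.$$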

Type I is the most common EVD. It is unbounded (not restricted to a range) and defined on the entire range of real numbers. The probability density function has only one shape, which shifts according to the location parameter, μ: as μ increases, the distribution shifts to the right; as μ decreases, it shifts to the left. Let’s say you had a list of minimum pollution levels for the last decade; you could use the Type I EVD (in its minimum form) to model minimum pollution levels for the next year.
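
A minimal sketch of the two Gumbel densities, written in their standard forms with location μ and scale β, is given below. The function names and the simple midpoint integrator are illustrative assumptions; the checks only confirm that each density integrates to about 1 and that the minimum form is the mirror image of the maximum form.

```python
import math


def gumbel_max_pdf(x, mu=0.0, beta=1.0):
    """Gumbel (Type I) density for maxima."""
    z = (x - mu) / beta
    return math.exp(-z - math.exp(-z)) / beta


def gumbel_min_pdf(x, mu=0.0, beta=1.0):
    """Gumbel (Type I) density for minima."""
    z = (x - mu) / beta
    return math.exp(z - math.exp(z)) / beta


def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


# Both densities should integrate to roughly 1 over a wide interval.
area_max = integrate(lambda x: gumbel_max_pdf(x, mu=2.0, beta=1.5), -30, 30)
area_min = integrate(lambda x: gumbel_min_pdf(x, mu=2.0, beta=1.5), -30, 30)
print(f"area (max form) = {area_max:.6f}, area (min form) = {area_min:.6f}")

# The minimum form is the maximum form reflected about zero.
print(gumbel_min_pdf(0.7, mu=2.0), gumbel_max_pdf(-0.7, mu=-2.0))

# The mode of the maximum form sits at x = mu, whatever mu is.
for mu in (-1.0, 0.0, 3.0):
    print(f"mu={mu:+.1f}  density at mu = {gumbel_max_pdf(mu, mu=mu):.4f}")
```

Changing `mu` slides the whole curve along the x-axis without altering its shape, while `beta` stretches or compresses it.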

## Type II EVD

**Type II EVD** (the Fréchet distribution) is right-skewed, with a heavy upper tail that decays like a power of x rather than exponentially. This type is best used when very large values turn up more often than a Type I model would allow, for example when the underlying data are themselves heavy-tailed.

The Fréchet CDF converges slowly to 1, and the distribution has three parameters: a shape parameter α, a scale parameter β, and a location parameter μ. It is defined on the interval (μ, ∞); in other words, it is bounded (restricted) on the lower side. A wide range of phenomena, such as floods, horse racing, human lifespans, maximum rainfalls, and river discharges in hydrology, can be modeled with the Fréchet.
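
To make the slow convergence concrete, the sketch below compares the upper-tail probability 1 − F(x) of a Fréchet distribution with that of a standard Gumbel. It assumes the usual three-parameter Fréchet CDF, F(x) = exp(−((x − μ)/β)^(−α)) for x > μ; the function names and the choice α = 2 are illustrative only.

```python
import math


def frechet_cdf(x, alpha, beta=1.0, mu=0.0):
    """Fréchet (Type II) CDF; zero at and below the lower bound mu."""
    if x <= mu:
        return 0.0
    return math.exp(-(((x - mu) / beta) ** (-alpha)))


def gumbel_cdf(x, mu=0.0, beta=1.0):
    """Gumbel (Type I, maximum form) CDF, for comparison."""
    return math.exp(-math.exp(-(x - mu) / beta))


print(" x    Frechet tail   Gumbel tail")
for x in (2, 5, 10, 50):
    frechet_tail = 1.0 - frechet_cdf(x, alpha=2.0)
    gumbel_tail = 1.0 - gumbel_cdf(x)
    print(f"{x:3d}   {frechet_tail:.3e}     {gumbel_tail:.3e}")
```

The Fréchet tail shrinks roughly like x^(−α), so it stays far above the Gumbel tail at large x; smaller values of α make the tail heavier still.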

## Type III EVD

Finally, **Type III EVD** (the Weibull distribution) is bounded on one side: as a limit for maxima it has a finite upper endpoint, and in the mirrored form used for minima it is bounded below. This type is best suited for estimating a lower limit of some process, since in the form used for minima it describes how the smallest values behave near that bound.

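In its common two-parameter form, with shape parameter α > 0 and scale parameter β > 0, the probability density function is

$$f(x) = \frac{\alpha}{\beta}\left(\frac{x}{\beta}\right)^{\alpha-1} e^{-(x/\beta)^{\alpha}}, \quad x \ge 0.$$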

The Type III distribution is used in product reliability and life data analysis to model failure times. The Weibull is a family of distributions that can take on many shapes, depending on what parameters you pick. Depending on the shape parameter, it can look like an exponential distribution, a right-skewed distribution, or a nearly symmetric one.
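
The sketch below evaluates the standard two-parameter Weibull density for a few shape values to show how the form of the curve changes. The function name `weibull_pdf`, the argument names `shape` and `scale`, and the particular shape values are illustrative assumptions; the only fact relied on is that a shape of 1 reduces the Weibull to an exponential distribution.

```python
import math


def weibull_pdf(x, shape, scale=1.0):
    """Two-parameter Weibull density.

    Returns 0 for x <= 0; the boundary value at exactly x = 0 is
    ignored in this sketch.
    """
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


xs = (0.25, 0.5, 1.0, 1.5, 2.0)
print("shape  " + "  ".join(f"x={x:<5}" for x in xs))
for shape in (0.5, 1.0, 2.0, 3.5):
    row = "  ".join(f"{weibull_pdf(x, shape):7.4f}" for x in xs)
    print(f"{shape:<5}  {row}")

# With shape = 1 the density is exactly the exponential density exp(-x).
print(weibull_pdf(1.5, shape=1.0), math.exp(-1.5))
```

Shapes below 1 give a density that falls steeply from zero, shapes a little above 1 give a right-skewed hump, and larger shapes give a more symmetric hump.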

## Generalized extreme value distribution

The generalized extreme value (GEV) distribution unites the three extreme value distributions: the Gumbel, Fréchet, and Weibull. Tail behavior is crucial in choosing which distribution model to use, but not knowing the parent population’s tail makes it difficult to pick the best one. The GEV combines all three distributions into a single family, so the data, rather than a prior guess about the tail, can determine which type fits.

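The cumulative distribution function (CDF) for the generalized EVD is

$$F(x) = \exp\left\{-\left[1 + \xi\,\frac{x-\mu}{\sigma}\right]^{-1/\xi}\right\},$$

where

- μ = location parameter,
- σ = scale parameter (>0),
- ξ = shape parameter.

1 + ξ(x-μ)/σ must be greater than zero. For ξ = 0 the formula is read as its limit, $F(x) = \exp\{-e^{-(x-\mu)/\sigma}\}$.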

- When the shape parameter ξ is equal to 0, the GEV is equal to EVD Type I.
- When it is greater than 0, the GEV is equal to EVD Type II.
- When ξ is less than 0, the GEV is equal to EVD Type III.
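
The three cases can be sketched in a single function. The code below assumes the standard GEV CDF, F(x) = exp(−[1 + ξ(x − μ)/σ]^(−1/ξ)), with the Gumbel form used when ξ = 0; the function name `gev_cdf` and its handling of points outside the support are illustrative choices rather than anything specified in the article.

```python
import math


def gev_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """Generalized extreme value CDF with location mu, scale sigma, shape xi."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if xi == 0.0:
        # Type I (Gumbel) case, the limit of the general formula as xi -> 0.
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower bound when xi > 0 (Type II),
        # above the upper bound when xi < 0 (Type III).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-(t ** (-1.0 / xi)))


for xi, label in ((0.0, "Type I"), (0.5, "Type II"), (-0.5, "Type III")):
    values = ", ".join(f"{gev_cdf(x, xi=xi):.4f}" for x in (-3.0, 0.0, 1.0, 3.0))
    print(f"xi={xi:+.1f} ({label}): F(-3), F(0), F(1), F(3) = {values}")

# A tiny nonzero shape should be almost indistinguishable from xi = 0.
print(gev_cdf(0.5, xi=1e-6), gev_cdf(0.5, xi=0.0))
```

Note that some software uses the opposite sign convention for the shape parameter, so it is worth checking a library’s documentation before comparing fitted values of ξ.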

The GEV distribution is also called the **Fisher–Tippett distribution**, after Ronald Fisher and L. H. C. Tippett. However, this can cause confusion because the special case of the Gumbel distribution is also called the Fisher–Tippett distribution. To avoid confusion, it’s best to refer to the distribution that encompasses all three types (EVD Types I, II, and III) as *the generalized extreme value distribution*.

In short, extreme value distributions are useful tools for understanding the maximums and minimums of a dataset. Which type applies depends on the tail of the distribution the data points come from; exponential, normal, and log-normal parents, for example, all lead to Type I. That in turn lets you estimate how likely a failure threshold is to be exceeded. Depending on what you’re looking to achieve, you may choose one type over another, as each has its own advantages and limitations based on its assumptions about how your data is distributed. With some knowledge of these three types of EVDs, you will be well prepared to use them in various settings.

## References

[1] No machine-readable author provided. Anarkman~commonswiki assumed (based on copyright claims)., CC BY-SA 3.0 http://creativecommons.org/licenses/by-sa/3.0/, via Wikimedia Commons

[2] Gumbel (1958). Statistics of Extremes.

[3] NIST. Extreme Value Type I Distribution. Retrieved April 9, 2023 from: itl.nist.gov/div898/handbook/eda/section3/eda366g.htm