Zipf distribution / Zeta distribution


The Zipf distribution, also called the zeta distribution, is a discrete probability distribution that follows a power-law relationship. It is named after the American linguist George Kingsley Zipf, who observed this distribution in various natural and social phenomena.

The Zipf distribution is commonly used to model scenarios where a few events occur very frequently, while many other events occur less frequently. The distribution is closely related to other power-law distributions, such as the Pareto distribution and the Yule-Simon distribution, and may be used to model similar phenomena such as word frequency, internet traffic, and population counts.

Properties of the Zipf distribution

X ∼ Zipf(α, n) indicates a random variable X has a Zipf distribution with parameters α and n.

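The probability mass function (PMF) for the Zipf distribution is [1]

$$P(X = k) = \frac{k^{-(\rho+1)}}{\zeta(\rho+1)}, \qquad k = 1, 2, 3, \ldots$$

where

  • ρ is a positive parameter (ρ > 0)
  • ζ is the Riemann zeta function.

The cumulative distribution function (CDF) is

$$F(k) = P(X \le k) = \frac{H_{k,\rho+1}}{\zeta(\rho+1)}$$

where

$$H_{k,r} = \sum_{j=1}^{k} \frac{1}{j^{r}}$$

is a generalized harmonic number.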

The Zipf distribution is a special case of the Zipfian distribution.
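
The PMF and CDF can be sketched in a few lines of Python. The example below assumes the MathWorld parameterization [1], P(X = k) = k^(-(ρ+1)) / ζ(ρ+1) for k = 1, 2, 3, ..., and approximates the zeta function with a partial sum plus an Euler-Maclaurin tail correction. The function names `zeta`, `zipf_pmf`, and `zipf_cdf` and the `terms` argument are illustrative choices for this sketch, not part of any library.

```python
import math


def zeta(s, terms=1000):
    """Approximate the Riemann zeta function for real s > 1.

    Sums the first terms - 1 terms directly, then adds an
    Euler-Maclaurin estimate of the remaining tail.
    """
    if s <= 1:
        raise ValueError("the series converges only for s > 1")
    n = terms
    head = sum(k ** -s for k in range(1, n))
    tail = n ** (1 - s) / (s - 1) + 0.5 * n ** -s + s * n ** (-s - 1) / 12
    return head + tail


def zipf_pmf(k, rho):
    """P(X = k) for the Zipf (zeta) distribution with parameter rho > 0."""
    if k < 1:
        return 0.0
    return k ** -(rho + 1) / zeta(rho + 1)


def zipf_cdf(k, rho):
    """P(X <= k): a generalized harmonic number divided by zeta(rho + 1)."""
    if k < 1:
        return 0.0
    harmonic = sum(j ** -(rho + 1) for j in range(1, int(k) + 1))
    return harmonic / zeta(rho + 1)


if __name__ == "__main__":
    rho = 1.0
    for k in range(1, 6):
        print(k, zipf_pmf(k, rho), zipf_cdf(k, rho))
```

With ρ = 1 the normalizing constant is ζ(2) = π²/6, so P(X = 1) is 6/π², and the CDF approaches 1 as k grows.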

History and alternate names

The Zipf distribution goes by many other names, including:

  • Discrete Pareto Distribution
  • Joos Model
  • Riemann Zeta Distribution

Zipf’s Law, which forms the basis of the Zipf distribution, states that the frequency of an event is inversely proportional to its rank in a frequency-ordered list. In other words, the second most frequent event occurs half as often as the most frequent event, the third most frequent event occurs one-third as often, and so on. Although the law gives rise to the distribution, “Zipf’s Law” is not another name for the distribution itself.
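
As a rough illustration of the rank-frequency rule, the following Python sketch computes relative frequencies proportional to 1/rank. The function name `zipf_law_frequencies` and the decision to normalize over a finite number of ranks are assumptions made for this example.

```python
def zipf_law_frequencies(n_ranks):
    """Relative frequencies proportional to 1/rank for ranks 1..n_ranks."""
    if n_ranks < 1:
        raise ValueError("n_ranks must be at least 1")
    weights = [1.0 / rank for rank in range(1, n_ranks + 1)]
    total = sum(weights)  # harmonic number H_n, used to normalize
    return [w / total for w in weights]


freqs = zipf_law_frequencies(5)
# Frequency of each rank relative to the most frequent item.
ratios = [f / freqs[0] for f in freqs]
print(ratios)
```

The printed ratios are approximately 1, 1/2, 1/3, 1/4, and 1/5, matching the "half as often, one-third as often" pattern.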

George Kingsley Zipf (1902-1950) was an American linguist and philologist who specialized in statistical analyses of language. He earned his Ph.D. from Harvard University and served as a professor of German and Linguistics at the same institution. Zipf is best known for his discovery of Zipf’s Law.

Zipf came across this distribution while studying large samples of written texts in various languages. He observed that the frequency of a word is inversely proportional to its rank in the frequency table. He published his findings in the book Human Behavior and the Principle of Least Effort [2], which presented a wide range of applications of Zipf’s Law.

The Zipf distribution is named after George Kingsley Zipf in recognition of his discovery and extensive work on the subject. It is sometimes called the “zeta distribution” because its PMF involves the Riemann zeta function, which plays a central role in number theory and complex analysis. The zeta function normalizes the distribution, ensuring that the probabilities sum to one.

Zipf distribution uses

Some well-known applications of the Zipf distribution include:

  1. Linguistics: Zipf’s Law has been observed in the distribution of word frequencies in natural languages, where a small number of words are used very frequently, while the majority of words are used rarely.
  2. City populations: The distribution of city populations often follows Zipf’s Law, with a few large cities and many smaller cities [3].
  3. Internet traffic: The distribution of website visits and file downloads on the internet can also exhibit Zipfian patterns [4].
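
One informal way to check data such as word counts for a Zipfian pattern is to rank the counts and fit a straight line to log(count) against log(rank). A slope near -1 is consistent with Zipf’s Law. The sketch below is a crude diagnostic under that assumption, not a formal statistical test. The helper names `rank_frequency` and `loglog_slope` and the sample counts are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(text):
    """Return word counts sorted from most to least frequent."""
    counts = Counter(text.lower().split())
    return sorted(counts.values(), reverse=True)


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank)."""
    if len(counts) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up counts that follow count = 1200 / rank exactly.
ideal_counts = [1200, 600, 400, 300, 240]
print(loglog_slope(ideal_counts))
```

For the made-up counts above the fitted slope is close to -1. Real word, city, or traffic data will scatter around the line, and the slope depends on how many ranks are included.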

References

[1] Weisstein, Eric W. “Zipf Distribution.” From MathWorld–A Wolfram Web Resource. https://mathworld.wolfram.com/ZipfDistribution.html

[2] Zipf, G. K. (1949). Human behavior and the principle of least effort. Addison-Wesley Press.

[3] Zipf’s Law for Cities – A Simple Explanation for Urban Populations

[4] Adamic, L. & Huberman, B. Zipf’s law and the internet. Glottometrics 3, 2002, 143-150.
