What Is a Factorial Distribution?
The factorial distribution is a distribution for which successive frequencies are factorial quantities. The term is also used for a distribution that arises when a set of variables are independent.
1. The factorial distribution as factorial qualities
Irwin defines the factorial distribution (also called the inverse factorial series distribution) as a distribution that occurs when successive frequencies are factorial quantities, with the form
where x = 1, x[r] = x(x + 1) … (x + r – 1) denotes the ascending factorial (note: x does not appear in the general factorial distribution, because it can be used to model any factorial distribution, regardless of the value of x).
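The ascending factorial x[r] defined above can be computed directly. The following is a minimal sketch; the function name ascending_factorial is chosen here for illustration and does not come from Irwin's paper.

```python
def ascending_factorial(x, r):
    """Return x[r] = x * (x + 1) * ... * (x + r - 1), a product of r factors."""
    if r < 0:
        raise ValueError("r must be a non-negative integer")
    result = 1  # the empty product (r == 0) is 1
    for k in range(r):
        result *= x + k
    return result


# With x = 1 the ascending factorial reduces to the ordinary factorial r!.
print(ascending_factorial(1, 4))  # 1 * 2 * 3 * 4 = 24
print(ascending_factorial(3, 2))  # 3 * 4 = 12
```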
2. A factorial distribution as independent events
A factorial distribution is one of the simplest probability distributions, because the variables don’t interact at all. It can be used to determine the probability of multiple events occurring at once or consecutively, and it can be written in many different ways.
This type of distribution happens when a set of variables are independent events. This means that the variables don’t interact at all; given two events x and y, the probability of x doesn’t change when you factor in y. For example, if event x is a coin toss and event y is choosing a card from a deck, those events don’t interact and so are independent.
Therefore, the probability of x given that y has happened, P(x | y), will be the same as the probability of x, written as P(x). This type of distribution allows us to calculate probability based on certain factors without having to consider other variables or factors.
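The coin-and-card example can be checked by brute-force counting. The sketch below assumes a fair coin and a standard 52-card deck, and uses "the card is an ace" as event y; the helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product

coin_faces = ["heads", "tails"]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# Every (coin face, card) pair is equally likely: 2 * 52 = 104 outcomes.
outcomes = list(product(coin_faces, deck))


def is_heads(outcome):
    return outcome[0] == "heads"


def is_ace(outcome):
    return outcome[1][0] == "A"


def probability(event, given=None):
    """Count outcomes where `event` holds, restricted to those where `given` holds."""
    pool = outcomes if given is None else [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in pool if event(o)), len(pool))


p_heads = probability(is_heads)
p_ace = probability(is_ace)
p_heads_given_ace = probability(is_heads, given=is_ace)
p_heads_and_ace = probability(lambda o: is_heads(o) and is_ace(o))

print(p_heads, p_heads_given_ace)        # 1/2 1/2
print(p_heads_and_ace, p_heads * p_ace)  # 1/26 1/26
```

Knowing that the card is an ace leaves the probability of heads unchanged, and the joint probability equals the product of the two individual probabilities.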
The factorial distribution can be written in many ways [4, 5]:
- p(x, y) = p(x) p(y)
- p(x, y, z) = p(x) p(y) p(z)
- p(x1, x2, x3, x4) = p(x1) p(x2) p(x3) p(x4)
- P(x) = ΣP(x | y) * P(y)
Note that none of these expressions includes a factorial (!) symbol. The factorial distribution doesn’t contain any factorials per se; it is so named because successive frequencies are factorial quantities. Factorials (!) are products of the whole numbers from 1 up to the number of interest. For example, 3! (read “three factorial”) equals 3 * 2 * 1 = 6.
The equation P(x) = ΣP(x | y) * P(y) is the law of total probability: the probability of event x is found by summing, over every possible value of y, the probability of x given y weighted by the probability of y. When x and y are independent, P(x | y) = P(x) for every y, and because the values of P(y) sum to 1, the sum reduces to P(x). The product rule is what gives joint probabilities. For example, if two particular people each independently have a 0.5 chance of getting sick from eating contaminated food, the probability that both get sick is 0.5 * 0.5 = 0.25.
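The equation P(x) = ΣP(x | y) * P(y) can be evaluated numerically. The sketch below uses hypothetical probabilities (not taken from any source) and compares a case where x depends on y with a case where x and y are independent.

```python
from fractions import Fraction


def total_probability(p_x_given_y, p_y):
    """Return P(x) = sum over y of P(x | y) * P(y)."""
    return sum(p_x_given_y[y] * p_y[y] for y in p_y)


# Hypothetical numbers, chosen only for illustration.
p_y = {"contaminated": Fraction(3, 10), "clean": Fraction(7, 10)}

# Dependent case: the chance of getting sick (x) changes with y.
dependent = {"contaminated": Fraction(1, 2), "clean": Fraction(1, 10)}
# Independent case: P(x | y) is the same for every value of y.
independent = {"contaminated": Fraction(1, 2), "clean": Fraction(1, 2)}

p_sick_dependent = total_probability(dependent, p_y)
p_sick_independent = total_probability(independent, p_y)
print(p_sick_dependent)    # 11/50
print(p_sick_independent)  # 1/2
```

In the independent case the weighted sum returns P(x | y) itself, which is another way of seeing that P(x | y) = P(x).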
A more general way of writing the factorial distribution for three or more variables is
P(x1, x2, …, xn) = P(x1) · P(x2) · … · P(xn) = Πi P(xi).
The Π (uppercase pi) symbol is the product operator, which is used for multiplication in the same way that the uppercase sigma (Σ) symbol is used for summation.
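In Python, the Σ and Π operators correspond to the built-in sum and to math.prod (available from Python 3.8). The sketch below uses hypothetical marginal probabilities; joint_probability is an illustrative name.

```python
import math


def joint_probability(marginals):
    """Return the product of P(xi) over all i, assuming the xi are independent."""
    for p in marginals:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return math.prod(marginals)


marginals = [0.5, 0.25, 0.2]         # hypothetical P(x1), P(x2), P(x3)
print(sum(marginals))                # Σ adds the terms
print(joint_probability(marginals))  # Π multiplies the terms
```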
Calculating a factorial distribution
Calculating a factorial distribution requires some basic knowledge of probability theory, in particular what it means for events to be independent: they do not affect each other’s probabilities. Once you have this concept down, you can calculate the joint probability for any number of independent events. To do so:
- Start by writing out the separate probabilities for each individual event (P(x) and P(y)).
- Then multiply those numbers together: P(x, y) = P(x) * P(y). With more than two events, multiply all of the individual probabilities.
- No further adjustment is needed. The product is your final answer: the likelihood that both events will occur simultaneously or sequentially. As a check, summing the joint probabilities over every value of y should give back P(x).
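For two independent variables, writing out the individual probabilities and multiplying them gives a full joint table. The sketch below reuses the coin toss and card draw from earlier, assuming a fair coin and a standard deck (so the chance of an ace is 1/13); the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Write out the separate probabilities for each variable.
p_x = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}
p_y = {"ace": Fraction(1, 13), "other": Fraction(12, 13)}

# Multiply them to get every joint probability P(x, y) = P(x) * P(y).
joint = {(x, y): p_x[x] * p_y[y] for x, y in product(p_x, p_y)}

# Sanity checks: the table sums to 1, and summing over y recovers P(x).
table_total = sum(joint.values())
recovered_p_x = {x: sum(joint[(x, y)] for y in p_y) for x in p_x}

print(joint[("heads", "ace")])  # 1/26
print(table_total)              # 1
```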
Application example: wake-sleep algorithm
One application is the wake-sleep algorithm in machine learning, which works with a stack of layers that represent data. When the distribution over a layer is factorial, the probability of a whole vector is the product of its individual terms. For example, let's say that you have three hidden units in a layer, with the following probabilities of being in state 1:
0.3; 0.6; 0.8.
The probability that these units have a state 1, 1, 1 if the distribution is factorial is
p(1, 1, 1) = 0.3 * 0.6 * 0.8 = 0.144.
Similarly, the probability that these units have a state 1, 0, 1 is
p(1, 0, 1) = 0.3 * (1 – 0.6) * 0.8 = 0.096.
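The same calculation works for any binary state vector: a unit contributes p if it is in state 1 and 1 – p if it is in state 0. The following is a minimal sketch of that rule only, not of the wake-sleep algorithm itself; state_probability is an illustrative name.

```python
from itertools import product
from math import isclose, prod


def state_probability(on_probs, state):
    """Probability of a binary state vector under a factorial distribution."""
    if len(on_probs) != len(state):
        raise ValueError("on_probs and state must have the same length")
    # Each unit contributes p when it is on (1) and 1 - p when it is off (0).
    return prod(p if s == 1 else 1 - p for p, s in zip(on_probs, state))


on_probs = [0.3, 0.6, 0.8]  # probabilities from the example above

p_111 = state_probability(on_probs, (1, 1, 1))
p_101 = state_probability(on_probs, (1, 0, 1))

# The probabilities of all 2**3 = 8 possible states must sum to 1.
total = sum(state_probability(on_probs, s) for s in product([0, 1], repeat=3))

print(isclose(p_111, 0.144), isclose(p_101, 0.096), isclose(total, 1.0))
```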
 Image: Kamil.ryszkowski, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons
[1] Irwin, J. (1963). The place of mathematics in medical and biological statistics. Journal of the Royal Statistical Society. Series A, 126, 1-45.
[2] Dodge, Y. (Ed.) (2003). The Oxford Dictionary of Statistical Terms. Oxford University Press.
[3] Hinton, G. (2013). Lecture 1: Introduction to Machine Learning and Graphical Models. Retrieved December 28, 2017 from: https://www.cs.toronto.edu/~hinton/csc2535/notes/lec1new.pdf
[4] Olshausen, B. (2004). A Probability Primer. Retrieved December 27, 2017 from: http://redwood.berkeley.edu/bruno/npb163/probability.pdf
[5] Hinton, G. et al. Neural Networks for Machine Learning.