Prior Distribution

A prior distribution, a key part of Bayesian inference, represents your belief about the true value of a parameter before you observe any data; in essence, it is your “best guess.” Specifying a prior means assigning a probability distribution to the parameter that expresses your degree of belief about its possible values [1].

Once you have gathered some observations, you can update the prior distribution with new evidence to get the posterior distribution, which is used to make future inferences and decisions involving the parameter in question. The new evidence is summarized with a likelihood function, so:

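Posterior Distribution ∝ Prior Distribution × Likelihood Function

That is, the posterior is proportional to the product of the prior and the likelihood; dividing by a normalizing constant makes it a proper probability distribution.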
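As a minimal sketch of this update, the following Python snippet approximates a posterior on a grid of candidate parameter values. The coin-flip data (7 heads in 10 flips), the flat prior, and the function name `grid_posterior` are assumptions chosen for illustration; they do not come from the references.

```python
# Grid approximation of the posterior for a coin's heads probability p.
# Hypothetical data and a flat prior are assumed for illustration.

def grid_posterior(heads, flips, prior, n_grid=101):
    """Return grid points and normalized posterior weights on that grid."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    # Unnormalized posterior: prior density times the binomial likelihood kernel.
    unnorm = [prior(p) * p**heads * (1 - p) ** (flips - heads) for p in grid]
    total = sum(unnorm)
    # Dividing by the total plays the role of the normalizing constant.
    return grid, [u / total for u in unnorm]

grid, post = grid_posterior(heads=7, flips=10, prior=lambda p: 1.0)
posterior_mean = sum(p * w for p, w in zip(grid, post))
print(round(posterior_mean, 3))
```

With a flat prior the exact posterior here is a Beta(8, 4) distribution, so the grid estimate of the mean should land close to 8/12.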

Creating a prior distribution

When practically possible, prior and posterior distributions are given in terms of known densities, such as the normal distribution, binomial distribution or gamma distribution [2].
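One reason known densities are convenient is that some prior and likelihood pairs give a posterior in the same family, so the update reduces to arithmetic on the prior's parameters. The sketch below assumes a gamma prior (shape and rate parameterization) on the mean of Poisson-distributed counts; the numbers and the helper name `gamma_poisson_update` are hypothetical.

```python
# Conjugate update: Gamma(shape, rate) prior on a Poisson mean.
# The prior parameters and the counts below are made up for illustration.

def gamma_poisson_update(shape, rate, counts):
    """Return the posterior (shape, rate) after observing Poisson counts."""
    return shape + sum(counts), rate + len(counts)

prior_shape, prior_rate = 2.0, 1.0   # assumed prior, with mean 2.0
counts = [3, 5, 4, 6]                # hypothetical observations

post_shape, post_rate = gamma_poisson_update(prior_shape, prior_rate, counts)
posterior_mean = post_shape / post_rate
print(post_shape, post_rate, posterior_mean)
```

The posterior mean falls between the prior mean and the sample mean of the counts, which is the usual behavior of a conjugate update.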

Figure: Density function for Jeffreys prior.

To create a prior distribution, you have several options. For example, you can make an informed estimate and define a probability distribution with descriptive parameters such as the mean or standard deviation. If you lack such information, you can opt for an uninformative (noninformative) prior, which is chosen to influence the posterior as little as possible. Common choices include a flat (uniform) prior and the Jeffreys prior; the latter is derived from the Fisher information of the model, not selected at random. The parameters that define the prior distribution are called hyperparameters. They are set independently of the observed data and reflect your beliefs before observing any data [3].
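As a concrete case, the Jeffreys prior for the success probability p of a Bernoulli (coin-flip) model is the Beta(1/2, 1/2) distribution, whose density is 1 / (π √(p(1 − p))). The sketch below evaluates that density; the choice of the Bernoulli model and the function name are assumptions made for illustration.

```python
import math

def jeffreys_bernoulli_density(p):
    """Density of the Jeffreys prior, Beta(1/2, 1/2), for a Bernoulli parameter p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return 1.0 / (math.pi * math.sqrt(p * (1.0 - p)))

# The density is U-shaped: lowest at p = 0.5 and rising toward 0 and 1.
for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(p, round(jeffreys_bernoulli_density(p), 4))
```

Note that this prior is not flat: it places more weight near 0 and 1, which is what makes it invariant to reparameterization of p.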

Even if the initial prior is a poor guess, incorporating observations yields a posterior distribution that improves on it. Notably, an uninformative prior has much less influence on the posterior distribution than an informative prior does. Uninformative priors work well in many situations, such as estimating variance parameters with a noninformative uniform prior distribution.
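The difference in influence can be illustrated with a small, hypothetical comparison. The sketch assumes coin-flip data and two beta priors: a flat Beta(1, 1) and a fairly strong Beta(20, 20) centered on 0.5. Neither prior nor the data come from the references.

```python
# Compare how a flat prior and an informative prior pull the posterior mean.
# Beta(alpha, beta) prior + binomial data gives a Beta posterior with mean below.

def posterior_mean(alpha, beta, heads, tails):
    return (alpha + heads) / (alpha + beta + heads + tails)

for heads, tails in [(7, 3), (700, 300)]:      # hypothetical data sets
    sample_fraction = heads / (heads + tails)
    flat = posterior_mean(1, 1, heads, tails)           # uninformative prior
    informative = posterior_mean(20, 20, heads, tails)  # assumed strong prior
    print(heads + tails, sample_fraction, round(flat, 4), round(informative, 4))
```

With few observations the informative prior pulls the estimate noticeably toward 0.5, while the flat prior stays near the observed fraction; with many observations both posteriors are dominated by the data.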

Uninformative priors aren’t always necessary, because many scientific fields have precise information about model parameters from previous studies. In medicine, for example, earlier studies provide good estimates of the parameter that represents the liver as a fraction of body mass [4]. Thus, whether priors are uninformative or highly informative will largely depend on your field of study.

While specifying a prior distribution is relatively straightforward, turning it into a posterior distribution can be a challenge, because the normalizing constant of the posterior often has no closed form. One option is to use the Metropolis-Hastings algorithm, which draws samples that approximate the posterior distribution (e.g., as a histogram) using only the prior distribution and the likelihood of the observed data.
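A minimal random-walk Metropolis-Hastings sketch is shown below. It assumes a flat prior on a coin's heads probability and hypothetical data (7 heads, 3 tails); the step size, burn-in length, and function names are illustrative choices, not recommendations from the references.

```python
import math
import random

def log_unnormalized_posterior(p, heads=7, tails=3):
    """Log of prior * likelihood for a flat prior on (0, 1); data are hypothetical."""
    if not 0.0 < p < 1.0:
        return float("-inf")          # zero prior density outside (0, 1)
    return heads * math.log(p) + tails * math.log(1.0 - p)

def metropolis_hastings(log_target, start, n_samples, step=0.2, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current, current_lp = start, log_target(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)       # a rejected proposal repeats the current value
    return samples

samples = metropolis_hastings(log_unnormalized_posterior, start=0.5, n_samples=20000)
kept = samples[2000:]                 # discard an assumed burn-in period
estimate = sum(kept) / len(kept)
print(round(estimate, 3))
```

Only ratios of the target are needed, so the normalizing constant never has to be computed. For this toy problem the exact posterior is Beta(8, 4), so the sample mean can be checked against 8/12.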

Is there a true prior?

Whether or not a “true” prior distribution exists is up for debate. The traditional viewpoint is that the prior distribution represents a state of knowledge or a subjective state of mind, so by definition there cannot be a true prior. A prior distribution can also be thought of as an expression of beliefs; because different people can have different beliefs, they will have different “true” priors. However, some statisticians disagree with this viewpoint, even going so far as to say that a true prior exists and has a frequentist interpretation [5].

According to Gelman [5], thinking in terms of a “true prior,” instead of treating the prior as a subjective choice, brings two benefits:

  1. It establishes a link with frequentist statistics, which offers valuable insights into understanding statistical methods through their average properties. This connection allows us to leverage Bayesian methods effectively. While it is unreasonable to expect a procedure to yield the correct answer given the true unknown parameter value, aiming for accurate results by averaging over problems to which the model will be fitted is entirely reasonable.
  2. The connection to hierarchical models is noteworthy. In many scenarios, we can view a parameter of interest as part of a group or batch, as exemplified by the recent discussions on modeling multiple potential paths concurrently. In such cases, the true prior corresponds to the distribution of all the underlying effects in consideration.


[1] Chapter 5. Bayesian Statistics


[3] Riggelsen, C. (2008). Approximation Methods for Efficient Learning of Bayesian Networks. IOS Press.

[4] Gelman, A. (2002). Prior distribution. Wiley. Retrieved July 16, 2023 from:

[5] Gelman, A. (2016). What is the “true prior distribution”? A hard-nosed answer. Retrieved July 16, 2023 from:
