Tweedie distribution


Data with a cluster at zero may be a good candidate for a fit to a Tweedie distribution.

What is the Tweedie distribution?

Have you ever seen a histogram with a spike at zero? A spike like this, caused by a cluster of data points that are exactly zero, reflects what is called a "point mass": a positive probability concentrated at a single value. Data like these can often be modeled by fitting them to a Tweedie distribution.

The Tweedie distribution [1] belongs to the family of exponential dispersion models described below. It has many useful applications, particularly in the insurance industry, medical and genomic testing, or anywhere else the data are a mixture of exact zeros and positive values.
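For 1 < p < 2, a Tweedie variable can be built by adding up a Poisson-distributed number of gamma-distributed amounts; when the Poisson count is zero, the total is exactly zero, which is where the spike comes from. The minimal Python sketch below assumes the common parameterization with mean μ, dispersion φ and power p (defined in the next section) and computes the size of that point mass, P(Y = 0) = exp(-μ^(2-p) / (φ(2-p))). The function name is hypothetical.

```python
import math

def tweedie_prob_zero(mu: float, phi: float, p: float) -> float:
    """P(Y = 0) for a Tweedie(mu, phi, p) variable, assuming mu > 0 and phi > 0.

    For 1 <= p < 2 the zero mass is exp(-lam), where lam is the Poisson rate of
    the compound Poisson-gamma representation. For p >= 2 the distribution is
    continuous on (0, inf), so there is no point mass at zero.
    """
    if p < 1:
        raise ValueError("this sketch only handles p >= 1")
    if p >= 2:
        return 0.0
    lam = mu ** (2 - p) / (phi * (2 - p))  # expected number of gamma "pieces"
    return math.exp(-lam)

# Share of exact zeros implied for mean 2 and dispersion 1 at several powers.
for p in (1.2, 1.5, 1.8, 2.0):
    print(p, round(tweedie_prob_zero(2.0, 1.0, p), 4))
```

Moving p toward 2 makes the Poisson rate larger, so exact zeros become rarer; at p = 2 (gamma) they vanish.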

About the Tweedie Distribution

The Tweedie distributions are a subset of the exponential dispersion models (EDMs): two-parameter distributions from the linear exponential family, described by a mean μ and a dispersion (scale) parameter φ.

This family of distributions has the following characteristics:

The mean is E(Y) = μ.
The variance is Var(Y) = φμ^p, a power function of the mean.

The exponent p in the variance, called the power parameter, acts as an additional shape parameter; it is sometimes written in terms of a parameter α:
p = (α - 2) / (α - 1), or equivalently α = (p - 2) / (p - 1).

No Tweedie distribution exists for 0 < p < 1; the valid values are p ≤ 0 and p ≥ 1.
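Because the map between α and p has the same form in both directions, one expression converts either way. The Python sketch below (function names are hypothetical) applies it in both directions and rejects values that would put p in the excluded interval 0 < p < 1. It assumes the usual limiting conventions: p = 1 corresponds to α = -∞ (the limit as p decreases to 1), and α = 1 corresponds to p = ∞.

```python
import math

def alpha_from_p(p: float) -> float:
    """alpha = (p - 2) / (p - 1); p = 1 is taken as the limit alpha = -inf."""
    if 0 < p < 1:
        raise ValueError("no Tweedie distribution exists for 0 < p < 1")
    if p == 1:
        return -math.inf
    return (p - 2) / (p - 1)

def p_from_alpha(alpha: float) -> float:
    """p = (alpha - 2) / (alpha - 1); alpha = 1 is taken as the limit p = inf."""
    if math.isinf(alpha):
        return 1.0
    if alpha == 1:
        return math.inf
    p = (alpha - 2) / (alpha - 1)
    if 0 < p < 1:
        raise ValueError("this alpha gives 0 < p < 1, where no Tweedie distribution exists")
    return p

# Round trips for the normal (p = 0), compound Poisson-gamma (p = 1.5),
# gamma (p = 2) and inverse Gaussian (p = 3) cases.
for p in (0.0, 1.5, 2.0, 3.0):
    a = alpha_from_p(p)
    print(p, a, p_from_alpha(a))
```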

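Special cases of the Tweedie distribution include:

p = 0: the normal distribution
p = 1 (with φ = 1): the Poisson distribution
1 < p < 2: the compound Poisson-gamma distribution, which has a point mass at zero
p = 2: the gamma distribution
p = 3: the inverse Gaussian distribution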

Fitting Data to a Tweedie Distribution

The probability density function (PDF) of the Tweedie distribution generally cannot be written in closed form. It can, however, be expressed as an infinite series and evaluated numerically. Because the distribution reduces to familiar distributions for certain values of p, you can use their PDFs directly in those cases. For example, if p = 3, use the PDF of the inverse Gaussian.
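For the special cases, the Tweedie parameters map onto familiar distributions in a way that keeps the mean at μ and the variance at φμ^p. The Python sketch below assumes that standard matching: a normal with variance φ (p = 0), a gamma with shape 1/φ and scale φμ (p = 2), and an inverse Gaussian with mean μ and shape 1/φ (p = 3). The function name is hypothetical; other values of p would need the series or another numerical method.

```python
import math

def tweedie_special_pdf(y: float, mu: float, phi: float, p: float) -> float:
    """Tweedie(mu, phi, p) density for p in {0, 2, 3} via the matching named distribution."""
    if p == 0:
        # Normal with mean mu and variance phi.
        return math.exp(-(y - mu) ** 2 / (2 * phi)) / math.sqrt(2 * math.pi * phi)
    if y <= 0:
        return 0.0  # gamma and inverse Gaussian live on (0, inf)
    if p == 2:
        # Gamma with shape k = 1/phi and scale theta = phi*mu: mean mu, variance phi*mu^2.
        k, theta = 1 / phi, phi * mu
        return math.exp((k - 1) * math.log(y) - y / theta - math.lgamma(k) - k * math.log(theta))
    if p == 3:
        # Inverse Gaussian with mean mu and shape lam = 1/phi: variance phi*mu^3.
        lam = 1 / phi
        return math.sqrt(lam / (2 * math.pi * y ** 3)) * math.exp(
            -lam * (y - mu) ** 2 / (2 * mu ** 2 * y))
    raise ValueError("this sketch has no closed form for this p")

# With phi = 1, the p = 2 case is an exponential distribution with mean mu.
print(tweedie_special_pdf(1.5, 2.0, 1.0, 2), math.exp(-1.5 / 2.0) / 2.0)
```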

The most basic way to fit data to a Tweedie distribution is maximum likelihood estimation (MLE). MLE estimates the model's parameters, namely the mean μ, the dispersion parameter φ and the power parameter p, by choosing the values that make the observed data most probable: it maximizes the likelihood, the product of the densities of all observations (with the probability P(Y = 0) used for each zero). Because p determines which member of the family you are using (e.g., Poisson-like or gamma-like), it is often fixed in advance or chosen by comparing the maximized likelihood across a grid of candidate values (a profile likelihood); μ and φ are then estimated for that p.
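The profile-likelihood approach can be sketched in Python with NumPy for the compound Poisson-gamma range 1 < p < 2. This is a minimal sketch under several assumptions: the density is evaluated by summing the Poisson-gamma series directly, truncated at a heuristic bound (not an optimized published algorithm); μ is set to the sample mean, which is its maximum likelihood estimate for independent observations from an exponential dispersion model; and φ and p are chosen by a coarse grid search. The function names and the simulated data are illustrative.

```python
import math
import numpy as np

def cpg_params(mu, phi, p):
    """Poisson rate, gamma shape and gamma scale of a Tweedie(mu, phi, p), 1 < p < 2."""
    lam = mu ** (2 - p) / (phi * (2 - p))
    alpha = (2 - p) / (p - 1)
    scale = phi * (p - 1) * mu ** (p - 1)
    return lam, alpha, scale

def simulate_tweedie(mu, phi, p, size, rng):
    lam, alpha, scale = cpg_params(mu, phi, p)
    n = rng.poisson(lam, size)
    # A sum of n iid Gamma(alpha, scale) draws is Gamma(n * alpha, scale); n == 0 gives 0.
    draws = rng.gamma(np.where(n > 0, n * alpha, 1.0), scale)
    return np.where(n > 0, draws, 0.0)

def tweedie_loglik(y, mu, phi, p):
    lam, alpha, scale = cpg_params(mu, phi, p)
    y_pos = y[y > 0]
    ll = -lam * np.count_nonzero(y == 0)  # each zero adds log P(Y = 0) = -lam
    if y_pos.size == 0:
        return float(ll)
    # Heuristic truncation well past the peak of the series (an assumption).
    n_max = int(3 * (lam + y_pos.max() / (alpha * scale))) + 50
    n = np.arange(1, n_max + 1)
    a = n * alpha
    lg = np.array([math.lgamma(k + 1.0) + math.lgamma(s) for k, s in zip(n, a)])
    coef = -lam + n * math.log(lam) - lg - a * math.log(scale)  # parts free of y
    # log of Poisson(n; lam) * Gamma(y; n * alpha, scale) for every (y, n) pair.
    terms = coef + (a - 1) * np.log(y_pos)[:, None] - y_pos[:, None] / scale
    peak = terms.max(axis=1, keepdims=True)
    log_f = peak[:, 0] + np.log(np.exp(terms - peak).sum(axis=1))  # log-sum-exp over n
    return float(ll + log_f.sum())

def fit_tweedie(y, p_grid, phi_grid):
    mu_hat = float(np.mean(y))  # MLE of mu for an iid EDM sample
    best = max((tweedie_loglik(y, mu_hat, phi, p), phi, p)
               for p in p_grid for phi in phi_grid)
    return mu_hat, float(best[1]), best[2]

rng = np.random.default_rng(0)
y = simulate_tweedie(mu=2.0, phi=1.0, p=1.5, size=500, rng=rng)
p_grid = [1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8]
phi_grid = np.geomspace(0.2, 5.0, 80)
mu_hat, phi_hat, p_hat = fit_tweedie(y, p_grid, phi_grid)
print(f"mu = {mu_hat:.3f}, phi = {phi_hat:.3f}, p = {p_hat}")
```

A grid search keeps the logic easy to follow; a numerical optimizer over φ (and p) would be faster and more precise.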

Analyzing Results

Once you have fitted a Tweedie model to your data set, there are several ways to analyze the results. You can calculate summary statistics such as the mean, median, mode and range; compute confidence intervals; perform hypothesis tests; create predictive models; generate frequency tables; or create graphs such as box plots or histograms. These tools can help you gain deeper insight into your data set and draw meaningful conclusions from it.
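One simple check after fitting is to compare a few sample statistics with the values the fitted model implies. This Python sketch assumes 1 < p < 2 and uses the mean μ, variance φμ^p and zero probability exp(-μ^(2-p) / (φ(2-p))) of a Tweedie(μ, φ, p) variable; the function name, the data and the "fitted" parameter values in the example are made up for illustration.

```python
import math
import statistics

def tweedie_fit_check(y, mu, phi, p):
    """Pair each observed statistic with its model-implied value (assumes 1 < p < 2)."""
    lam = mu ** (2 - p) / (phi * (2 - p))  # Poisson rate of the compound representation
    return {
        "mean": (statistics.mean(y), mu),
        "variance": (statistics.variance(y), phi * mu ** p),
        "zero_share": (sum(1 for v in y if v == 0) / len(y), math.exp(-lam)),
    }

# Hypothetical data and hypothetical fitted parameters.
y = [0.0, 0.0, 1.3, 2.7, 0.4, 3.9, 0.0, 1.8, 2.2, 5.1]
for name, (observed, implied) in tweedie_fit_check(y, mu=1.74, phi=1.2, p=1.5).items():
    print(f"{name:>10}: observed {observed:.3f}, model {implied:.3f}")
```

Large gaps, especially in the share of zeros, suggest that a different p (or a different model) may fit better.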


References

[1] Tweedie, M. C. K. (1984). An index which distinguishes between some important exponential families. In Statistics: Applications and New Directions. Proceedings of the Indian Statistical Institute Golden Jubilee International Conference. (Eds. J. K. Ghosh and J. Roy), pp. 579-604. Calcutta: Indian Statistical Institute.
