Skewed Distribution

A skewed distribution has one tail that is longer than the other. Skewed distributions have more outliers or extreme values on one side and are sometimes called asymmetrical distributions as they don’t show any symmetry.

central tendency and skewed distribution graph
Distributions can be skewed left (negative skew) or skewed right (positive skew) [1].

The location of the long tail defines the skew of a distribution:

  • A negatively skewed distribution has a long tail in the negative direction of a number line. It’s also called a left-skewed distribution because its long tail is on the left. In these distributions, probabilities grow slowly then taper off quickly. Thus, it is a distribution with only a few extremely small values.
  • A positively skewed distribution has a long tail in the positive direction of a number line. It’s also called a right-skewed distribution because it has a long right tail. In these distributions, the probabilities grow quickly and taper off slowly for high values. It has only a few extremely large values.

In comparison, a symmetrical distribution has two halves that mirror each other. For example, the bell-shaped normal distribution is symmetric.

shapes of probability distributions
A bell shaped probability distribution [2].

Many types of probability distributions are skewed.

Skewed distribution on histograms and boxplots

You can identify skewed distributions on histograms by looking at the distribution’s shape. In a histogram, a skewed distribution will have a long tail on one side.

  • Positive (right) skewed: A histogram with positive skewness has a long tail on the right.
  • Negative (left) skewed: A histogram with negative skewness has a long tail on the left.
skewed histograms
A positive (right) skewed histogram and a negative (left) skewed histogram [3].

In a boxplot, a skewed distribution has a median that is not in the center of the box. Boxplots can be horizontal or vertical, so it’s generally easier to describe them as positively or negatively skewed rather than left or right skewed: in a vertical boxplot (as in the image below), the whiskers extend up and down, not left and right.

  • Positive (right) skewed: A boxplot with positive skewness has a median line that is pushed towards lower values (the long tail extends toward higher values).
  • Negative (left) skewed: A boxplot with negative skewness has a median line that is pushed towards higher values (the long tail extends toward lower values).
skewed boxplots
A skewed boxplot has a median (shown in red) that is off-center [4].
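
The off-center median can also be checked numerically. The following Python sketch is illustrative only: the function name box_skew and the sample data are assumptions made for this example. It compares the distance from the median to each quartile, using statistics.quantiles from the standard library.

```python
from statistics import quantiles


def box_skew(values):
    """Guess the skew direction from where the median sits inside the box.

    Hypothetical helper: compares the lower half of the box (median - Q1)
    with the upper half (Q3 - median).
    """
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    lower_half = med - q1
    upper_half = q3 - med
    if upper_half > lower_half:
        return "positive"   # median pushed toward lower values
    if lower_half > upper_half:
        return "negative"   # median pushed toward higher values
    return "none detected"


# Made-up data: values bunch up at the low end and stretch out at the high end.
stretched_high = [1, 2, 3, 4, 6, 9, 14, 22, 35]
# Mirror image of the same data: the long tail now points toward low values.
stretched_low = [36 - x for x in stretched_high]

print(box_skew(stretched_high))
print(box_skew(stretched_low))
```

This check only looks at the box itself. A real boxplot also shows whiskers and outliers, which may tell a fuller story than the quartiles alone.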

Skewed Distributions and the Mean, Median, and Mode

mean median mode in a skewed distribution

In a normal distribution, the mean, median, and mode are all equal. However, in a skewed distribution, the mean, median, and mode are usually different. The mean is especially sensitive to skewed data, so its exact location is hard to predict, except relative to the median and mode.

  • Positively skewed distributions have longer tails on the right side of the distribution. The few very large values in that tail pull the mean upward, so most values are less than the mean. The mean of a positively skewed distribution is usually greater than the median and the mode.
  • Negatively skewed distributions have longer tails on the left side of the distribution. The few very small values in that tail pull the mean downward, so most values are greater than the mean. The mean of a negatively skewed distribution is usually less than the median and the mode.

These facts result in an important rule of thumb that you can use to find out if a particular distribution is positively or negatively skewed:

  • If the mean is greater than the mode or greater than the median (or both), the distribution is positively skewed.
  • If the mean is less than the mode or less than the median (or both), the distribution is negatively skewed.
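
As a rough illustration of this rule of thumb, the Python sketch below compares the mean with the median only. The function name skew_direction and both data sets are made up for the example, and like any rule of thumb it can be wrong for unusual distributions.

```python
from statistics import mean, median


def skew_direction(values):
    """Apply the mean-versus-median rule of thumb (hypothetical helper)."""
    m = mean(values)
    med = median(values)
    if m > med:
        return "positive"
    if m < med:
        return "negative"
    return "none detected"


# Made-up data: most values are small, one is very large.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]
# Mirror image: most values are large, one is very small.
left_skewed = [41 - x for x in right_skewed]

print(mean(right_skewed), median(right_skewed), skew_direction(right_skewed))
print(mean(left_skewed), median(left_skewed), skew_direction(left_skewed))
```

In the first data set the single value of 40 drags the mean above the median; in the mirrored set the single value of 1 drags it below.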

Some distributions are more skewed than others. Although this doesn’t change the rule of thumb, it does affect the usefulness of the mean as a measure of centrality. With a slightly skewed distribution, the mean may not be affected much, but with a heavily skewed distribution, the mean becomes a very poor measure of central tendency, as the following image shows:

A slightly skewed distribution has a mean that is reflective of the distribution’s center. In comparison, a highly skewed distribution (dashed lines) has a mean that is far from center [5].

If a distribution is heavily skewed, the median is more accurate as a measure of the center of the distribution [6].

Skewed distributions in real life

While the normal distribution is ubiquitous in elementary statistics classes, you’re much more likely to come across skewed distributions in real life.

Here are some examples of right-skewed distributions in real life:

  • Wealth distribution: In many societies, a small number of people hold substantial wealth, while the majority have much less. This results in a right-skewed distribution with a long right tail.
  • Income distribution: This follows a pattern similar to wealth distribution, with a few people earning significantly more than the rest.
  • House price distribution: In most markets, a small number of houses are highly priced, while the majority are less so, resulting in a right-skewed distribution.
  • Salary distribution: Within most organizations, a handful of employees earn much more than their colleagues, creating a right-skewed salary distribution.
  • Test score distribution: On a difficult test, few people score As while the majority score lower, which often leads to a right-skewed distribution.

How are skewed distributions analyzed?

There are several ways you can analyze skewed data:

  • One approach is to use the median as the measure of central tendency, as it is more robust than the mean. The median is the middle value in a distribution when the values are arranged from smallest to largest. This quality makes it resistant to outliers, which are data points significantly distant from the rest of the data.
  • Another approach is to describe the spread of a skewed distribution with the interquartile range (IQR). The IQR is a measure of variability that is largely unaffected by outliers because it only takes into account the middle 50% of the data. It’s calculated as the difference between the 75th percentile and the 25th percentile.
  • Transformations can also be applied to analyze skewed data. Transformations are mathematical operations that alter the scale or shape of a distribution. Common transformations for skewed data include:
    • Logarithmic transformation: Often applied to data skewed to the right.
    • Exponential transformation: Typically applied to data skewed to the left.
    • Box-Cox transformation: A more general family of power transformations that can be applied to data skewed in either direction, provided all values are positive.

The best method to analyze skewed data will depend on the specific dataset and the objectives of the analysis. In some cases, using the median or the IQR may be enough. In others, a transformation might be required.
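
To make the robustness of the median and the IQR concrete, here is a minimal Python sketch. The salary figures and the helper name iqr are assumptions made for illustration, and the quartiles use the "inclusive" method of statistics.quantiles, which is only one of several common quartile conventions.

```python
from statistics import mean, median, quantiles


def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical salaries, in thousands.
salaries = [30, 32, 35, 36, 38, 40, 42, 45, 48]
# Same data, but the top salary is replaced by an extreme value.
with_extreme = salaries[:-1] + [480]

for label, data in (("original", salaries), ("with extreme", with_extreme)):
    print(label, "mean:", mean(data), "median:", median(data), "IQR:", iqr(data))
```

Replacing one value with an extreme one more than doubles the mean, while the median and the IQR do not move at all in this example.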

Skewed distributions can be quantified by a measure called skewness:

  • A value of 0 means that the distribution has no skew, as is the case for a symmetric distribution.
  • A value greater than 0 means that the distribution is positively skewed.
  • A value less than 0 means that the distribution is negatively skewed.
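
The text above does not give a formula for skewness, so the sketch below assumes one common definition: the third central moment divided by the cube of the standard deviation, computed without any sample-size correction. Statistical packages may apply a correction, so their values can differ slightly. The data set is made up, and the example also shows how a logarithmic transformation can reduce right skew.

```python
import math


def skewness(values):
    """Moment-based skewness (assumed definition, no sample-size correction)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mu) ** 3 for x in values) / n   # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Made-up right-skewed data: each value is double the previous one.
data = [1, 2, 4, 8, 16, 32, 64]
# Taking logarithms spaces the values evenly, which removes the skew here.
logged = [math.log(x) for x in data]

print("raw data:", skewness(data))
print("after log transform:", skewness(logged))
```

The raw data gives a value greater than 0, and the log-transformed data gives a value of essentially 0 (up to floating-point rounding), matching the interpretation in the list above.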

References

[1] (Godot) at en.wikipedia., CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons

[2] LawrenceSeminarioRomero, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

[3] Visnut, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

[4] Ever.chae, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

[5] Cmglee, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons

[6] Central Tendency & Variability
