
**Variance** measures the spread between numbers in a data set. It helps us determine how far each number in the set is from the mean (average), and from every other number in the set. It is calculated by taking the **average of the squared differences from the mean**.

## How to calculate the variance

## 1. By hand

To calculate the variance for a population:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

where:

- σ² is the population variance
- ∑ is summation notation (i.e., add them all up!)
- X is a data point
- μ is the population mean
- N is the population size.

To work the formula:

1. Find the mean (average).
2. Subtract the mean from each X value in the dataset and square the result. Squaring is done to ensure that negative numbers become positive. The focus is on the distance from the mean, not the sign.
3. Average the squared differences. In other words, add up all the results from step 2 and divide by the population size N.

**Example**: find the population variance for 28, 29, 30, 31, 32.

1. Find the mean: (28 + 29 + 30 + 31 + 32) / 5 = 30.
2. Subtract the mean from each x-value:
   - 28 – 30 = -2
   - 29 – 30 = -1
   - 30 – 30 = 0
   - 31 – 30 = 1
   - 32 – 30 = 2
3. Square the values from Step 2:
   - (-2)² = 4
   - (-1)² = 1
   - 0² = 0
   - 1² = 1
   - 2² = 4
4. Sum the numbers from Step 3: 4 + 1 + 0 + 1 + 4 = 10.
5. Divide by the number of items in your data set: 10 / 5 = **2**.
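The hand calculation above can be sketched in Python. This is a minimal illustration, not a definitive implementation; the function name `population_variance` is an assumption made for this example.

```python
def population_variance(data):
    """Population variance: the average squared difference from the mean."""
    n = len(data)
    mean = sum(data) / n                              # find the mean
    squared_diffs = [(x - mean) ** 2 for x in data]   # subtract the mean, then square
    return sum(squared_diffs) / n                     # sum, then divide by N


values = [28, 29, 30, 31, 32]
result = population_variance(values)
print(result)  # 2.0, matching the hand calculation
```

Python's standard library provides `statistics.pvariance` for the same calculation, which is usually the better choice outside of a teaching example.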

To calculate the variance for a sample:

$$s^2 = \frac{\sum (X - \bar{x})^2}{n - 1}$$

where:

- s² is the sample variance
- ∑ is the sum (i.e., add them all up!)
- X is a data point
- x̄ is the sample mean
- n is the sample size.

To work the formula:

1. Find the mean (average).
2. Subtract the mean from each value in the dataset and square the result.
3. Add up all the results from step 2 and divide by n – 1, which is one less than the sample size.
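As a rough sketch, the sample variance steps look like this in Python. The function name `sample_variance` is hypothetical, and the data reuses the numbers from the population example, this time treated as a sample.

```python
def sample_variance(data):
    """Sample variance: sum of squared differences from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    mean = sum(data) / n
    squared_diffs = [(x - mean) ** 2 for x in data]
    return sum(squared_diffs) / (n - 1)   # divide by n - 1, not n


sample = [28, 29, 30, 31, 32]
s2 = sample_variance(sample)
print(s2)  # 2.5
```

The sum of squared differences is still 10, but dividing by 4 instead of 5 gives 2.5 rather than 2.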

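**Why n – 1?** Dividing by *n* – 1 instead of *n* is a correction factor (known as Bessel's correction). The differences are measured from the sample mean rather than the unknown population mean, and a sample's own mean always sits at least as close to its data points as the population mean does. As a result, dividing by *n* would tend to underestimate the population variance.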
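One way to see the effect of the correction is to list every possible sample from a tiny population and average the two versions of the estimate. The sketch below assumes samples of size 2 drawn with replacement from the population 28, 29, 30, 31, 32; the helper name `sum_sq_dev` is made up for this illustration.

```python
from itertools import product

population = [28, 29, 30, 31, 32]
mu = sum(population) / len(population)
pop_var = sum((x - mu) ** 2 for x in population) / len(population)


def sum_sq_dev(sample):
    """Sum of squared differences from the sample's own mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)


n = 2
samples = list(product(population, repeat=n))  # all 25 ordered samples, with replacement

# Average each estimator over every possible sample.
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(pop_var)            # 2.0
print(avg_div_n)          # 1.0, too small on average
print(avg_div_n_minus_1)  # 2.0, matches the population variance on average
```

Under these assumptions, dividing by *n* gives estimates that average only half the true variance, while dividing by *n* – 1 averages out to the true value.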

## 2. With the TI-83

There isn’t a variance option on the TI-83. To find the variance, find the standard deviation first, and then square the result.

**Example**: Find the variance for the heights of 12 tall buildings (in feet): 800, 720, 655, 655, 625, 600, 590, 529, 513, 502, 502, 502.

- Enter the above data into a list. Press the STAT button, then press ENTER. Type in the numbers, pressing ENTER after each entry.
- Press STAT.
- Use the right arrow button to select Calc.
- Press ENTER to highlight 1-Var Stats.
- Press ENTER again to display a list of statistics.
- Press VARS 5 to see the available statistics variables.
- Press 3 to select “Sx” which corresponds to the standard deviation.
- Press x² and then ENTER to show the result, which is 9326.628788.

Lost? Check out this short video by Prof. Essa on YouTube.
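If you have Python available, the calculator result can be cross-checked with the standard library's `statistics` module. This sketch follows the same route as the TI-83 steps: find the sample standard deviation (the calculator's "Sx"), then square it.

```python
import statistics

heights = [800, 720, 655, 655, 625, 600, 590, 529, 513, 502, 502, 502]

sx = statistics.stdev(heights)   # sample standard deviation, like "Sx" on the calculator
variance = sx ** 2               # square it, as in the final TI-83 step

print(round(variance, 6))
print(round(statistics.variance(heights), 6))  # direct sample variance, for comparison
```

Both printed values should agree with the calculator's 9326.628788 to the precision shown.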

## 3. With an online variance calculator

The following calculator can calculate the variance for a sample or population:

**Tip**: if you’re just beginning to study statistics, keep the default options: **Data type** = continuous and **Type** = sample. You’ll need the other options when calculating proportions, or in the rare case you need to calculate population variance.

## What is the difference between standard deviation and variance?

Standard deviation is the square root of variance. While variance gives you a general idea of the spread of the data, it is measured in squared units. The standard deviation is more concrete because it is expressed in the same units as the data, so it can be read as a typical distance from the mean. For example, if you had data from a normal distribution with a mean of 50 and a standard deviation of 10, then about 68% of the distribution would be between 50 – 10 = 40 and 50 + 10 = 60 [1]. We know this because of the empirical rule (aka the 68-95-99.7 rule). The variance does not give you this information directly, which is why the standard deviation is usually preferred over the variance for describing spread.
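The 68% figure can be checked with a short sketch using `statistics.NormalDist` from Python's standard library (available from Python 3.8). The variable names are chosen for this example only.

```python
from statistics import NormalDist

mean, sd = 50, 10
variance = sd ** 2   # 100, in squared units

dist = NormalDist(mu=mean, sigma=sd)
# Share of the distribution within one standard deviation of the mean (40 to 60).
within_one_sd = dist.cdf(mean + sd) - dist.cdf(mean - sd)

print(variance)
print(round(within_one_sd, 4))
```

The share comes out at roughly 0.68. The interval is built from the standard deviation (10), not the variance (100), which illustrates why the standard deviation is the easier quantity to interpret.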

## Is variance mean deviation?

No, variance is not the same as mean deviation.

- Mean deviation is expressed in the same units as the data. The formula is MD = ∑|x – μ| / n.
- Variance is measured in squared units. The formula is σ² = ∑(x – μ)² / n.

Because mean deviation is expressed in the same units as the data, it is easier to interpret, although it isn’t generally considered as reliable a measure of variability. Mean deviation is sometimes used alongside variance to give a more comprehensive picture of how a data set is distributed.
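To make the comparison concrete, here is a small sketch that applies both formulas to the data set 28, 29, 30, 31, 32, treated as a population.

```python
data = [28, 29, 30, 31, 32]
n = len(data)
mu = sum(data) / n

mean_deviation = sum(abs(x - mu) for x in data) / n   # same units as the data
variance = sum((x - mu) ** 2 for x in data) / n       # squared units

print(mean_deviation)  # 1.2
print(variance)        # 2.0
```

The two measures use the same differences from the mean. The only change is whether each difference is made positive by taking its absolute value or by squaring it.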

## What is the difference between variance and variation?

The terms “variance” and “variation” are often used interchangeably, but they mean different things:

- **Variance** tells us how spread out a dataset is from the mean.
- **Variation** is a broader term to describe the amount of difference or change present in a dataset. It can be quantified using the variance, but also the standard deviation or range.

Variance is a specific form of variation, while variation is a more general concept that includes any type of difference or change in a dataset.

## What does “high variance” mean?

A high variance indicates that the data points in a set are widely spread out, with large differences between the individual data points and the mean. Put simply, the data points are not concentrated around the mean.
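As an illustration, the following sketch compares two made-up data sets that share the same mean but differ in spread. It uses `statistics.pvariance`, so each set is treated as a full population.

```python
from statistics import mean, pvariance

tight = [29, 30, 31]   # values clustered around the mean
wide = [10, 30, 50]    # same mean, values far from it

print(mean(tight), pvariance(tight))
print(mean(wide), pvariance(wide))
```

Both sets have a mean of 30, but the second has a much larger variance because its values sit far from that mean.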

## Variance advantages and disadvantages

**Advantages:**

- Relatively easy to understand.
- Treats all deviations from the mean as the same regardless of their direction.
- Versatile measure that can be used in a variety of settings.

**Disadvantages:**

- Not as interpretable as the standard deviation.
- Measured in squared units, which can make it difficult to compare different data sets.
- Can be affected by the number of data points, which can make it difficult to compare data sets with different sizes.

The variance is sensitive to outliers, which can be useful for identifying unusual data points. But this can also be a disadvantage: because the differences are squared, a few extreme values can dominate the result.
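The sensitivity to outliers is easy to demonstrate. In this sketch, the value 100 is an arbitrary outlier added to the data set 28, 29, 30, 31, 32 for illustration.

```python
from statistics import pvariance

values = [28, 29, 30, 31, 32]
with_outlier = values + [100]   # one extreme value added

base_var = pvariance(values)
outlier_var = pvariance(with_outlier)

print(base_var)
print(outlier_var)
```

A single extreme value raises the variance from 2 to several hundred, because its large distance from the mean is squared.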

## References

[1] Maricopa Community College. Chapter 5: Measures of Dispersion.