Sample mean definition
In statistics, the sample mean is an average of a set of data — data that is sampled from a larger population. This measure of central tendency can be used to calculate the standard deviation and variance of a data set. The sample mean can also be applied to determine population averages.
The main difference between a sample mean and a “regular” mean is that the sample mean is calculated from a sample rather than from the whole population. A sample represents a small portion of a larger whole. For instance, if you work for a polling company and want to determine how much individuals spend on food annually, it would be impractical to survey every person in the United States. Instead, you can select a smaller fraction of that population (perhaps 250 people) to serve as the sample. The term “mean” is another name for the average. In this case, the sample mean would be the average amount that these 250 people spend on food each year.
The sample mean is valuable because it enables estimation of the broader population’s behavior without having to survey everyone. If, for example, the sample mean for annual food spending was $6000, and the sample was chosen at random, it is likely that a similar figure would be obtained if every person in the population were surveyed. Thus, the sample mean serves as a time- and cost-effective method for approximating population trends.
The sample mean symbol is x̄, pronounced “x bar”.
How to calculate the sample mean
To find the sample mean, add up the values of all the items in a sample set and then divide that sum by the number of items in the sample set. For example, if you have a set of data with 10 items, you would add those items together and then divide by 10. The result would be your sample mean. This is the same math that you would use to find an “average”.
The sample mean formula is:
x̄ = ( Σ xᵢ ) / n
- x̄ = sample mean
- Σ = summation or “add up”
- xᵢ = the x-values or data points
- n = number of items in the sample.
Example: Calculate the sample mean of 12, 13, 14, 16, 17, 40, 43, 55, 56, 67, 78, 78, 79, 80, 81, 90, 99, 101, 102, 304, 306, 400, 401, 403, 404, 405.
- Sum the numbers (add them up): 12 + 13 + 14 + 16 + 17 + 40 + 43 + 55 + 56 + 67 + 78 + 78 + 79 + 80 + 81 + 90 + 99 + 101 + 102 + 304 + 306 + 400 + 401 + 403 + 404 + 405 = 3744.
- Count the number of items in your data set. In this specific data set, there are 26 items.
- Divide the number obtained in Step 1 by the number found in Step 2. 3744/26 = 144.
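The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method; the function name `sample_mean` and the variable names are assumptions chosen for this example.

```python
def sample_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("sample_mean requires at least one data point")
    return sum(values) / len(values)


data = [12, 13, 14, 16, 17, 40, 43, 55, 56, 67, 78, 78, 79, 80, 81,
        90, 99, 101, 102, 304, 306, 400, 401, 403, 404, 405]

total = sum(data)          # Step 1: add the values up (3744)
count = len(data)          # Step 2: count the items (26)
x_bar = sample_mean(data)  # Step 3: divide the sum by the count (144.0)

print(total, count, x_bar)
```

Python's standard library also offers `statistics.mean`, which performs the same calculation.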
The sample mean is only an estimate of the true population mean. To get a more accurate estimate, you would need to take a larger sample size. However, the larger your sample size, the more time and resources it will take to collect that data. Therefore, statisticians must strike a balance between accuracy and feasibility when determining sample sizes. In statistics, feasibility refers to the likelihood of successfully carrying out a study or experiment. It evaluates the probability of a study’s success by taking into account various factors, such as resource availability, time and effort required, and ethical considerations.
Variance and standard deviation of a sample
Variance measures how spread out the items in a data set are around their mean. The sample mean therefore feeds directly into further calculations, such as finding the variance of the data sample.
Example: calculate the variance of data points 71, 89, 93, 95, 88, 78, and 95:
- Step 1: Add up all of the numbers: 71 + 78 + 88 + 89 + 93 + 95 + 95 = 609. (Dividing this total by the 7 items gives the sample mean: 609 / 7 = 87.)
- Step 2: Square the total, and then divide by the number of items in the data set: 609 x 609 = 370881, and 370881 / 7 = 52983.
- Step 3: Take the set of original numbers from Step 1, square them individually, then add them all up: (71 x 71) + (78 x 78) + (88 x 88) + (89 x 89) + (93 x 93) + (95 x 95) + (95 x 95) = 53489.
- Step 4: Subtract the amount in Step 2 from the amount in Step 3: 53489 – 52983 = 506.
- Step 5: Subtract 1 from the number of items in the data set: 7 – 1 = 6.
- Step 6: Divide the number in Step 4 by the number in Step 5: 506 / 6 ≈ 84.33. This is the variance.
- Step 7: Take the square root of the number from Step 6 (the variance): √(84.33) ≈ 9.18. This is the standard deviation.
The larger the variance / standard deviation, the more spread out from the mean the data is.
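The shortcut calculation used in the example above (sum of squares minus the squared total divided by n, all divided by n – 1) can be sketched in Python as follows. The function name `sample_variance` is an assumption made for illustration; the standard library's `statistics.variance` should give the same result.

```python
import math
import statistics


def sample_variance(values):
    """Sample variance via the shortcut formula:
    (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    total = sum(values)                          # 609 for the example data
    sum_of_squares = sum(x * x for x in values)  # 53489 for the example data
    return (sum_of_squares - total * total / n) / (n - 1)


scores = [71, 89, 93, 95, 88, 78, 95]
variance = sample_variance(scores)
std_dev = math.sqrt(variance)

print(round(variance, 2), round(std_dev, 2))
# Cross-check against the standard library implementation.
print(round(statistics.variance(scores), 2))
```

Dividing by n – 1 rather than n is what makes this the sample variance; `statistics.pvariance` divides by n instead and is meant for a full population.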
Variance of the sampling distribution of the mean
The sampling distribution of the sample mean is the probability distribution of all possible sample means. For example, assume you have 1,000 people and you select samples of 5 people at a time, calculating the average height of each sample. If you continually take samples (i.e., repeat the sampling a thousand times), the collection of sample means will eventually:
- Have a mean that approaches the population mean (μ)
- Have a distribution that tends to resemble a normal distribution curve.
The variance of this probability distribution describes how widely the sample means are dispersed around the population mean. A larger sample size makes the sample mean a more precise estimate of the population mean. In other words, as the sample size (N) increases, the variance of the sampling distribution decreases and approaches zero.
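A small simulation can make this behavior concrete. The sketch below is only an illustration under stated assumptions: the population is a made-up list of 1,000 heights generated with arbitrary parameters, and the helper name `simulate_sample_means` is hypothetical. It draws repeated samples of several sizes and reports the mean and variance of the resulting sample means.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, repetitions, rng):
    """Draw `repetitions` random samples (without replacement within each
    sample) and return the list of their means."""
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(repetitions)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical population: 1,000 heights in cm (made-up parameters).
population = [rng.gauss(170, 10) for _ in range(1000)]
population_mean = statistics.fmean(population)

results = {}
for n in (5, 20, 80):
    means = simulate_sample_means(population, n, 2000, rng)
    # Store (mean of the sample means, variance of the sample means).
    results[n] = (statistics.fmean(means), statistics.variance(means))
    print(n, round(results[n][0], 2), round(results[n][1], 2))

print("population mean:", round(population_mean, 2))
```

In a run like this, the mean of the sample means should stay close to the population mean for every sample size, while the variance of the sample means should shrink as the sample size grows.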
The formula to calculate the variance of the sampling distribution of the mean is: σ²_M = σ² / N, where:
- σ²_M = variance of the sampling distribution of the sample mean
- σ² = population variance
- N = sample size.
Example: If a random sample of size 19 is drawn from a population distribution with a standard deviation (σ) of 20, what is the variance of the sampling distribution of the sample mean?
- Determine the variance. Variance is the square of the standard deviation, so: σ² = 20² = 400.
- Divide the variance by the number of items in the sample. This sample contains 19 items, so: 400 / 19 = 21.05.
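The same two steps can be written as a short Python function. This is a minimal sketch; the function name `variance_of_sampling_distribution` and its parameter names are assumptions chosen for this example.

```python
def variance_of_sampling_distribution(population_sd, sample_size):
    """Variance of the sampling distribution of the mean: sigma^2 / N."""
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    population_variance = population_sd ** 2   # Step 1: 20^2 = 400
    return population_variance / sample_size   # Step 2: 400 / 19


result = variance_of_sampling_distribution(population_sd=20, sample_size=19)
print(round(result, 2))  # 21.05
```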
Standard error of the sample mean
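The standard error of the mean is the standard deviation of the sampling distribution of the sample mean. It measures how much the sample mean is expected to vary from one sample to the next. The distinction between standard deviation and standard error is that the standard deviation describes the spread of individual data points, while the standard error describes the spread of sample means. Because the population standard deviation is usually unknown, the standard error is estimated from the sample. To compute the standard error for the sample mean, use the formula: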
SE = s / √(n)
- SE is the standard error,
- s represents the standard deviation for the sample, and
- n is the number of items in the sample.
Example: Find the standard error for these heights (in cm): 170.5, 161, 160, 170, 150.5.
- Determine the mean (average) of the data set: (170.5 + 161 + 160 + 170 + 150.5) / 5 = 162.4.
- Calculate the deviation from the mean by subtracting the mean obtained in Step 1 from each value: 170.5 – 162.4 = 8.1; 161 – 162.4 = -1.4; 160 – 162.4 = -2.4; 170 – 162.4 = 7.6; 150.5 – 162.4 = -11.9.
- Square the numbers calculated in Step 2: 8.1² = 65.61; (-1.4)² = 1.96; (-2.4)² = 5.76; 7.6² = 57.76; (-11.9)² = 141.61.
- Sum the values calculated in Step 3: 65.61 + 1.96 + 5.76 + 57.76 + 141.61 = 272.7
- Divide the value found in Step 4 by your sample size minus 1. With five items in the sample, n – 1 = 4: 272.7 / 4 = 68.175.
- Calculate the square root of the value found in Step 5. This is your standard deviation. √(68.175) = 8.257
- Divide the value calculated in Step 6 by the square root of the sample size (in this example, the sample size is 5): 8.257 / √(5) = 8.257 / 2.236 = 3.693
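The steps above can be condensed into a short Python sketch. The function name `standard_error` is an assumption for illustration; `statistics.stdev` is the standard library's sample standard deviation (it divides by n – 1, as in Step 5).

```python
import math
import statistics


def standard_error(values):
    """Estimated standard error of the mean: s / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("standard error needs at least two data points")
    return statistics.stdev(values) / math.sqrt(n)


heights = [170.5, 161, 160, 170, 150.5]  # heights in cm

mean = statistics.fmean(heights)  # Step 1: about 162.4
s = statistics.stdev(heights)     # Steps 2 to 6: about 8.257
se = standard_error(heights)      # Step 7: about 3.693

print(round(mean, 1), round(s, 3), round(se, 3))
```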
Uses for Sample Means
There are many different uses for a sample mean. As mentioned before, this measure of central tendency feeds into the calculation of standard deviation and variance. It also supports scientific discovery and the prediction of future events or trends. For example:
- Education: Educational institutions apply the sample mean to estimate the average test score for a specific grade level. This information helps identify high-performing schools and allocate resources to those that require assistance.
- Finance: Banks use the sample mean to estimate the average credit score for specific loan types. This information guides lending decisions and evaluates the risk of default.
- Healthcare: Hospitals leverage the sample mean to determine the average length of stay for particular surgeries. This data enhances hospital efficiency and ensures patients receive proper care.
- Insurance: Insurance companies utilize the sample mean to estimate the average cost of claims for specific insurance types. This data helps establish premiums that balance fairness for both the insurer and the policyholder.
- Manufacturing: Manufacturers employ the sample mean to gauge the average quality of a product. This information aids in identifying potential issues within the manufacturing process and ensuring the product meets customer expectations.
- Retail: Retailers use the sample mean to approximate the average price of products. This data helps set competitive prices while ensuring profitability for the retailer.
Another common use for the sample mean is estimating population averages. Imagine you are trying to determine how much money the average person spends on groceries per month. You could survey 100 people and ask them how much they spend on groceries in a typical month. Then you would take all 100 responses and calculate the sample mean. This would give you an estimate of how much money the average person spends on groceries each month. However, it’s important to keep in mind that this number is only an estimate, since you did not survey every single person in existence!
The sample mean is a valuable tool for making informed decisions across various fields. By understanding how to calculate and interpret the sample mean, you can enhance the quality of your work and make a positive impact in different sectors.