Sample standard deviation
A sample standard deviation is the standard deviation of a sample rather than that of a population. Standard deviation is a statistical measure of variability that indicates, roughly, the typical amount by which the values in a data set deviate from their mean. A higher standard deviation indicates values that tend to be further from the mean, while a lower standard deviation indicates values that tend to be closer to the mean.
Sample vs. population
In the context of statistics, a population is an entire group of objects or observations. A statistical population does not have to be some group of people; it can consist of heights, weights, test scores, temperatures, and so on.
While a population represents an entire group of objects or observations, a sample is any smaller collection of said objects or observations taken from a population. Sampling is often used in statistical experiments because in many cases, it may not be practical or even possible to collect data for an entire population. For example, it may not be practical to collect weight data for all the students attending a large university. However, data can be collected from a sample of the students and statistical measures (including standard deviation) can be used to make inferences about the rest of the population based on the sample.
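The relationship between a population and a sample can be illustrated with a short simulation. The following is a minimal sketch, not real data: the population of 20,000 student weights, the mean of 70 kg, the spread of 12 kg, and the sample size of 200 are all hypothetical values chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 20,000 synthetic student weights in kg (not real data).
population = [random.gauss(70, 12) for _ in range(20_000)]

# A simple random sample of 200 students, drawn without replacement.
sample = random.sample(population, 200)

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)  # population standard deviation (divides by N)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)    # sample standard deviation (divides by n - 1)

print(f"population: mean={pop_mean:.1f}, sd={pop_sd:.1f}")
print(f"sample:     mean={sample_mean:.1f}, sd={sample_sd:.1f}")
```

The sample statistics should land close to the population values, which is what makes inference from a sample possible.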
Sample standard deviation formula
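The sample standard deviation formula is

$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$$

where $x_i$ is the $i$th element of the sample, $\bar{x}$ is the sample mean, $n$ is the sample size, and $\sum_{i=1}^{n}(x_i - \bar{x})^2$ is the sum of squares (SS).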
The sum of squares is the sum of the squared deviation scores and is worth noting because it is a component of a number of other statistical measures, not just standard deviation. A higher sum of squares value indicates a larger degree of variability while a lower value indicates that the data varies less relative to the mean.
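The definitions above translate directly into code. The following is a minimal sketch in Python; the function names `sum_of_squares` and `sample_std_dev` and the small data set are illustrative assumptions, not part of the original text. The result is compared against `statistics.stdev` from the Python standard library, which also computes the sample (n - 1) standard deviation.

```python
import math
import statistics


def sum_of_squares(sample):
    """Sum of the squared deviations of each value from the sample mean."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample)


def sample_std_dev(sample):
    """Sample standard deviation: sqrt(SS / (n - 1))."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return math.sqrt(sum_of_squares(sample) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # small illustrative data set, not from the text

print(sum_of_squares(data))    # SS
print(sample_std_dev(data))    # hand-rolled sample standard deviation
print(statistics.stdev(data))  # standard-library result for comparison
```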
Since data sets in experiments are typically large, statistical measures such as standard deviation are commonly computed using a calculator or computer. Just to demonstrate the use of the formula, a worked example is provided below.
Example
Find the standard deviation given a sample of the gasoline mileage (in miles per gallon) of new, 6-cylinder automobiles produced in a given year:
36, 32, 31.3, 30.5, 28.4, 27, 26.2, 24, 21, 18.6
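The sample mean is:

$$\bar{x} = \frac{36 + 32 + 31.3 + 30.5 + 28.4 + 27 + 26.2 + 24 + 21 + 18.6}{10} = \frac{275}{10} = 27.5$$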
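The sum of squares is:

$$\begin{aligned}
SS &= \sum_{i=1}^{n}(x_i - \bar{x})^2 \\
&= (36 - 27.5)^2 + (32 - 27.5)^2 + (31.3 - 27.5)^2 \\
&\quad + (30.5 - 27.5)^2 + (28.4 - 27.5)^2 + (27 - 27.5)^2 \\
&\quad + (26.2 - 27.5)^2 + (24 - 27.5)^2 + (21 - 27.5)^2 \\
&\quad + (18.6 - 27.5)^2 \\
&= 252.4
\end{aligned}$$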
The sample standard deviation is:

$$s = \sqrt{\frac{SS}{n - 1}} = \sqrt{\frac{252.4}{10 - 1}} \approx 5.3$$
Thus, the sample standard deviation is 5.3 miles per gallon.
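The arithmetic in this example can be checked with a few lines of Python. This is only a verification sketch; the variable names are arbitrary.

```python
import math

# Gasoline mileage sample (miles per gallon) from the example.
mileage = [36, 32, 31.3, 30.5, 28.4, 27, 26.2, 24, 21, 18.6]

n = len(mileage)
mean = sum(mileage) / n                     # sample mean
ss = sum((x - mean) ** 2 for x in mileage)  # sum of squares
s = math.sqrt(ss / (n - 1))                 # sample standard deviation

print(round(mean, 1), round(ss, 1), round(s, 1))  # 27.5 252.4 5.3
```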
The empirical rule
The empirical rule (also referred to as the 68-95-99.7 rule) states that for data that follows a normal distribution, almost all observed data will fall within 3 standard deviations of the mean. More specifically:
- Approximately 68% of observed data falls within 1 standard deviation of the mean (denoted μ ± σ).
- Approximately 95% of observed data falls within 2 standard deviations of the mean (denoted μ ± 2σ).
- Approximately 99.7% of observed data falls within 3 standard deviations of the mean (denoted μ ± 3σ).
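The three percentages can be checked empirically by simulating normally distributed data. The sketch below assumes a standard normal distribution (mean 0, standard deviation 1) and 100,000 simulated values; both choices are arbitrary, and the helper name `share_within` is hypothetical.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Simulated draws from a normal distribution (synthetic data).
values = [random.gauss(0, 1) for _ in range(100_000)]
mean = statistics.mean(values)
sd = statistics.stdev(values)


def share_within(k):
    """Fraction of values within k standard deviations of the mean."""
    low, high = mean - k * sd, mean + k * sd
    return sum(low <= v <= high for v in values) / len(values)


shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

The printed shares should come out close to 68%, 95%, and 99.7%, with small differences due to random variation.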
Based on the gasoline mileage example, and assuming the mileages are approximately normally distributed, the empirical rule can be applied using the sample mean (27.5 mpg) and the sample standard deviation (5.3 mpg) to estimate that, for 6-cylinder automobiles of that year:
- Approximately 68% of the gas mileages will fall between 22.2 and 32.8 mpg.
- Approximately 95% of the gas mileages will fall between 16.9 and 38.1 mpg.
- Approximately 99.7% of the gas mileages will fall between 11.6 and 43.4 mpg.
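These ranges are simply the mean plus or minus 1, 2, and 3 standard deviations. A minimal sketch of the calculation, using the rounded values 27.5 and 5.3 from the mileage example (the variable names are arbitrary):

```python
mean, s = 27.5, 5.3  # sample mean and sample standard deviation, in mpg

ranges = []
for k, pct in [(1, 68), (2, 95), (3, 99.7)]:
    low, high = mean - k * s, mean + k * s  # mean ± k standard deviations
    ranges.append((round(low, 1), round(high, 1)))
    print(f"about {pct}%: {low:.1f} to {high:.1f} mpg")
```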