In my teaching, I always strive to encourage students to look at statistics from a practical point of view. In a recent class period covering the normal distribution, I indicated that data values more than 3 standard deviations away from the mean are rare. The chance of seeing such a data point is about 3 in 1,000. After class, a student came up to me and said that she understood the lecture except for what I said about 3 out of 1,000. She did not know what to make of it.
The empirical rule, which can be thought of as a shorthand for the normal distribution, says that about 99.7% of the data are within 3 standard deviations of the mean. That means that only about 0.3% of the data are more than 3 standard deviations away from the mean.
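For readers who want to check where the 99.7% comes from, here is a minimal sketch using only Python's standard library. It relies on the fact that, for a normal distribution, the fraction of data within k standard deviations of the mean equals erf(k/√2). The function name `fraction_within` is my own choice for illustration.

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


# The three statements of the empirical rule, with a few more decimal places.
for k in (1, 2, 3):
    inside = fraction_within(k)
    print(f"within {k} SD: {inside:.4%}   outside: {1 - inside:.4%}")
```

The value for k = 3 is slightly above 99.7%, so the 0.3% used in this post is a rounded figure.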
If some event has only a 0.3% chance of happening, the chance is 0.3 out of 100. Suppose that we are talking about people and the data are measurements of height (in inches). So only 0.3 people out of 100 have heights more than 3 standard deviations away from the mean. Since we cannot have 0.3 people, it is better to say 3 people out of 1,000 are either 3 or more standard deviations taller than the mean or 3 or more standard deviations shorter than the mean.
The ratio can be expanded further. We can say 30 people out of 10,000 are either 3 or more standard deviations taller than the mean or 3 or more standard deviations shorter than the mean. Adding two more zeros, we have: 3,000 people out of 1,000,000 (one million) are either 3 or more standard deviations taller than the mean or 3 or more standard deviations shorter than the mean.
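The scaling above is just multiplication by the same fraction. A small sketch, assuming the rounded 0.3% figure and using exact fractions so that no floating-point noise creeps in (the names `RARE_FRACTION` and `expected_count` are mine, for illustration only):

```python
from fractions import Fraction

# 0.3%, the rounded empirical-rule figure for "more than 3 SD from the mean".
RARE_FRACTION = Fraction(3, 1000)


def expected_count(group_size: int) -> Fraction:
    """Expected number of people more than 3 SD from the mean in a group."""
    return RARE_FRACTION * group_size


for n in (100, 1_000, 10_000, 1_000_000):
    print(f"{float(expected_count(n)):g} out of {n:,}")
```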
So out of one million people of the same gender and of similar age (say, young adult males aged 20 to 29), only about 3,000 are either very tall or very short. I would say seeing such people is a rare event. This reasoning works because height measurements (and other biological measurements) from a group of people of the same gender (and of similar age) tend to follow a bell-shaped distribution.
To make it even easier to see, let’s say the heights of young adult males follow a normal distribution with mean = 69 inches and standard deviation = 3 inches. Approximately 3,000 out of one million young adult males are either 9 or more inches taller than 69 inches (over 6 feet 6 inches) or 9 or more inches shorter than 69 inches (less than 5 feet). Since the bell curve is symmetrical, about 1,500 out of one million are taller than 6 feet 6 inches.
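The same numbers can be checked against an exact normal model. This is a sketch using `statistics.NormalDist` from the standard library (Python 3.8 or later), with the mean of 69 inches and standard deviation of 3 inches assumed above; the variable names are my own.

```python
from statistics import NormalDist

# Assumed model from the text: mean 69 inches, standard deviation 3 inches.
heights = NormalDist(mu=69, sigma=3)

upper_cutoff = 69 + 3 * 3  # 78 inches, i.e. 6 feet 6 inches
lower_cutoff = 69 - 3 * 3  # 60 inches, i.e. 5 feet

taller = 1 - heights.cdf(upper_cutoff)  # fraction taller than 6 ft 6 in
shorter = heights.cdf(lower_cutoff)     # fraction shorter than 5 ft

print(f"taller than {upper_cutoff} in:  {taller * 1_000_000:,.0f} per million")
print(f"shorter than {lower_cutoff} in: {shorter * 1_000_000:,.0f} per million")
```

The exact tail comes out a little under the 1,500 per million quoted above (roughly 1,350), because the empirical rule rounds 99.73% down to 99.7%. For a quick gauge, the rounded figure is close enough.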
To look at this visually, the following is a bell curve describing the heights of the young adult males. Note that the bell curve ranges from about 55 inches to 80 inches. But most of the area under the bell curve is from 60 to about 78 inches.
There are about 21.5 million young adult males in the U.S. (according to the U.S. Census Bureau website). So the estimated number of young adult males taller than 6 feet 6 inches is 32,250 (= 1,500 per million times 21.5 million). So we can say that practically all young adult males are shorter than 6 feet 6 inches (statistically speaking). If all U.S. young adult males taller than 6 feet 6 inches were to attend the same baseball game at Dodger Stadium (which seats about 56,000), there would still be about 23,000 empty seats!
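The back-of-the-envelope arithmetic can be written out as follows. The population figure and the 1,500-per-million rate come from the text; the stadium capacity of 56,000 is my assumption, chosen because it is the commonly cited seating figure for Dodger Stadium and is consistent with the empty-seat count mentioned here.

```python
YOUNG_ADULT_MALES = 21_500_000  # from the text (U.S. Census Bureau figure)
TALL_PER_MILLION = 1_500        # empirical-rule estimate for one tail
STADIUM_CAPACITY = 56_000       # assumption: approximate Dodger Stadium seating

# Integer arithmetic keeps the estimate exact.
estimated_tall = YOUNG_ADULT_MALES * TALL_PER_MILLION // 1_000_000
empty_seats = STADIUM_CAPACITY - estimated_tall

print(f"estimated taller than 6 ft 6 in: {estimated_tall:,}")
print(f"empty seats remaining:           {empty_seats:,}")
```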
In my experience, many students have no problem reciting the empirical rule (reciting the three sentences about 1, 2, and 3 standard deviations). Some of them have a hard time applying it, especially using it as a quick gauge of the significance of data. I think understanding it within a practical context should make it easier to do so. I gave essentially the same explanation to my student, and she found it helpful.