## A normal bell curve made with humans

A group of students are stacked according to their heights. The result resembles a normal bell curve.

Figure 1 – Normal Bell Curve of Heights of Students (found here)

Who are the students in the picture? The image is from the statistics and actuarial science department of Simon Fraser University. These students are likely from this university (either from the statistics department or from other areas). Beyond the assumption of being students in a Canadian university, we do not know much else about the students.

There are 46 students in the picture. All students with the same height stand in the same column, with the first student in each column holding a sign stating the height. The signs are in ascending order from left to right. This formation of the students according to height resembles a normal bell curve. Most of the students cluster in the middle, with very few students at either the left tail (short students) or the right tail (tall students). This is a clever demonstration of the normal bell curve. It is relatively easy to create: basically, rank the students by height. It is also an excellent opportunity to discuss the properties of the normal bell curve.

Interestingly, male and female students are mixed together. It appears that female students are mostly standing to the left of the center and male students to the right. The discussion would be even more meaningful if the students were grouped by gender (in two separate photos). In any case, we work with the photo as is. The following is the same stacking of the students, with the students represented by dots.

Figure 2 – stacking of dots to represent the same bell curve

Figure 2 (a dot plot) shows the shape of the curve much more clearly. The curve is not perfectly symmetric, though the overall shape resembles a bell curve. The curve is not “smooth” either, as there are dips in some places (e.g. at 5′ 3″, 5′ 5″ and 5′ 10″). Perhaps the “rough” curve is due to the mixing of two populations (males and females), or perhaps it is due to the small sample size. The curve appears sufficiently close to a normal bell curve to warrant a closer look.

The dot plot in Figure 2 gives the sample data. Let’s perform calculations on the sample data. In particular, we focus on calculations that can confirm whether the curve is a normal bell curve (or at least whether we should reject the notion that it is a normal bell curve). If there is reason to believe that the heights of university students are normally distributed (or approximately normally distributed), the normal model can be useful for making estimates. We give a few examples to demonstrate how this is done.

First, calculate the mean and the median of the height measurements. The mean is 66.78 inches and the median is 67 inches. The mean is simply the sum of all the measurements divided by the total number of data points. The median is the measurement that is in the middle when the measurements are ranked in ascending order, which is the case in Figure 2. The mean and median are very close, an indication that the height measurements form a roughly symmetric distribution.

Another check is to see whether the height measurements follow the empirical rule (also known as the 68-95-99.7 rule). Here’s the rule.

• About 68% of the data will fall within plus or minus one standard deviation of the mean.
• About 95% of the data will fall within plus or minus two standard deviations of the mean.
• About 99.7% of the data will fall within plus or minus three standard deviations of the mean.

To verify these three points, we need to calculate the standard deviation of the height sample data. The sample standard deviation of the height measurements is 3.37 inches. Here are the three intervals of interest.

• (66.78 – 1 x 3.37, 66.78 + 1 x 3.37)=(63.41, 70.15)
• (66.78 – 2 x 3.37, 66.78 + 2 x 3.37)=(60.04, 73.52)
• (66.78 – 3 x 3.37, 66.78 + 3 x 3.37)=(56.67, 76.89)
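The three intervals above can be reproduced with a few lines of Python. This is only a minimal sketch: it takes the mean 66.78 and standard deviation 3.37 from the text as given, and the names `MEAN`, `SD` and `interval` are chosen here for illustration.

```python
# Summary statistics quoted in the text (in inches).
MEAN = 66.78
SD = 3.37


def interval(k, mean=MEAN, sd=SD):
    """Return (mean - k*sd, mean + k*sd), rounded to two decimal places."""
    return (round(mean - k * sd, 2), round(mean + k * sd, 2))


# One interval for each part of the 68-95-99.7 rule.
intervals = {k: interval(k) for k in (1, 2, 3)}

for k, (low, high) in intervals.items():
    print(f"within {k} standard deviation(s): ({low}, {high})")
```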

The question is, how many of the 46 height measurements fall into each interval? For example, how many of the measurements are between 63.41 and 70.15 inches? Is the percentage close to 68? Here are the counts.

• 30 measurements in (63.41, 70.15). Percentage = 30/46 = 65.2%
• 45 measurements in (60.04, 73.52). Percentage = 45/46 = 97.8%
• 46 measurements in (56.67, 76.89). Percentage = 46/46 = 100%

These percentages do not match the 68-95-99.7 rule exactly, but they are close. Of course, the percentages from the data do not have to match the 68-95-99.7 rule exactly in order to conclude that the distribution is approximately normal. Because the actual percentages of 65.2-97.8-100 are close enough, there is no reason to reject the idea that the height measurements of university students follow a normal bell curve.
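To see how close the observed percentages are, they can be placed next to the proportions that an exact normal distribution would give. The sketch below uses the counts reported in the text (30, 45 and 46 out of 46) and Python's standard `statistics.NormalDist`; the helper name `theoretical_share` is an assumption made for this illustration.

```python
from statistics import NormalDist

N = 46                           # number of students in the photo
counts = {1: 30, 2: 45, 3: 46}   # measurements within k standard deviations (from the text)

std_normal = NormalDist()        # standard normal: mean 0, standard deviation 1


def theoretical_share(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


observed = {k: 100 * c / N for k, c in counts.items()}
expected = {k: 100 * theoretical_share(k) for k in counts}

for k in counts:
    print(f"k={k}: observed {observed[k]:.1f}%, normal model {expected[k]:.1f}%")
```

With only 46 data points, a single student moving in or out of an interval shifts the observed percentage by about 2.2 points, so differences of this size are not surprising.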

Once we know that the height measurements of people in a certain population (university students in this case) are shaped like a normal bell curve, we can use the bell curve to estimate the proportion of the population that falls into a given interval. This is one advantage of working with a normal distribution.

Based on the above discussion, we conclude that the height measurements of university students follow an approximately normal distribution with mean 66.78 inches and standard deviation 3.37 inches. With this information, we can make estimates.

For example, what is the proportion (or percentage) of university students whose heights are less than or equal to 70 inches?

The key to answering the question is to convert the height measurement of 70 inches into a standardized score (or z-score), which in this case is 0.96. The z-score is obtained by this calculation: z=(70 – 66.78)/3.37 = 0.96 (rounded to two decimal places). Then we can look up the z-score 0.96 in a normal table to obtain the probability 0.8315. We conclude that about 83.15% of university students are 70 inches or shorter. To look up 0.8315, use a normal table that is similar to the one found here. Looking up the normal probability using a calculator will yield a slightly different answer, because a calculator does not round the z-score to two decimal places.
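The table lookup can be mimicked in software. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library, assuming the mean and standard deviation quoted in the text. It computes the probability twice: once with the z-score rounded to two places (as a table requires) and once without rounding (as a calculator would do).

```python
from statistics import NormalDist

MEAN, SD = 66.78, 3.37    # values quoted in the text (inches)

# Standardize 70 inches.
z = (70 - MEAN) / SD
z_rounded = round(z, 2)   # a printed normal table only lists z to two decimal places

# Table-style answer: area under the standard normal curve to the left of the rounded z.
p_table = NormalDist().cdf(z_rounded)

# Calculator-style answer: no rounding of the z-score.
p_exact = NormalDist(MEAN, SD).cdf(70)

print(f"z = {z:.4f}, rounded to {z_rounded}")
print(f"table-style probability:      {p_table:.4f}")
print(f"calculator-style probability: {p_exact:.4f}")
```

The two answers differ only in the third decimal place, which is the small discrepancy between a table and a calculator.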

Another example. What is the probability that a randomly selected university student is between 64 inches (5 feet 4 inches) and 70 inches (5 feet 10 inches) tall?

First, turn the measurements of 64 inches and 70 inches into standardized scores (z-scores). The z-score of 70 inches is 0.96 (as calculated above). The z-score of 64 inches is (64 – 66.78)/3.37 = -0.82 (rounded to two decimal places). The probability to the left of the z-score -0.82 is 1 – 0.7939 = 0.2061. Note that the table in the link has no negative z-scores. So look up the z-score of 0.82, which gives 0.7939. Subtracting 0.7939 from 1 gives 0.2061, because the normal curve is symmetric about zero. Then the answer to the question is 0.8315 – 0.2061 = 0.6254. This tells us that about 62.54% of the time, a randomly selected university student is between 64 and 70 inches tall.
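The same steps can be sketched in Python, again assuming the mean 66.78 and standard deviation 3.37 from the text. The code deliberately imitates a table that lists only positive z-scores, so the left-tail area for the negative z-score is obtained through the symmetry of the normal curve; the variable names are illustrative.

```python
from statistics import NormalDist

MEAN, SD = 66.78, 3.37          # values quoted in the text (inches)
std_normal = NormalDist()       # standard normal distribution

# z-scores rounded to two decimal places, as one would do for a table lookup.
z_low = round((64 - MEAN) / SD, 2)
z_high = round((70 - MEAN) / SD, 2)

# A table without negative z-scores: use P(Z <= -a) = 1 - P(Z <= a).
p_low = 1 - std_normal.cdf(-z_low)
p_high = std_normal.cdf(z_high)

# Probability of a height between 64 and 70 inches.
p_between = p_high - p_low

print(f"z-scores: {z_low}, {z_high}")
print(f"P(height <= 64) = {p_low:.4f}")
print(f"P(height <= 70) = {p_high:.4f}")
print(f"P(64 <= height <= 70) = {p_between:.4f}")
```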

One more example, what is the proportion of university students taller than 6 feet (72 inches)?

The z-score of 72 inches is (72 – 66.78)/3.37 = 1.55 (rounded to two decimal places). Looking up the table, the probability is 0.9394. This is the probability of a height less than 72 inches. So the answer is 1 – 0.9394 = 0.0606. About 6% of university students are 6 feet tall or taller. If the size of the student body is 10,000, then there are about 600 students who are 6 feet or taller.
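An upper-tail probability like this one is the complement of the table value. A minimal Python sketch, assuming the same mean and standard deviation and the hypothetical student body of 10,000 mentioned above:

```python
from statistics import NormalDist

MEAN, SD = 66.78, 3.37       # values quoted in the text (inches)
STUDENT_BODY = 10_000        # hypothetical enrollment used in the text

# z-score of 72 inches, rounded as for a table lookup.
z = round((72 - MEAN) / SD, 2)

# Area to the left of z, then the complement for the upper tail.
p_shorter = NormalDist().cdf(z)
p_taller = 1 - p_shorter

expected_tall_students = STUDENT_BODY * p_taller

print(f"z = {z}")
print(f"P(height < 72)  = {p_shorter:.4f}")
print(f"P(height >= 72) = {p_taller:.4f}")
print(f"expected number of students 6 feet or taller: about {expected_tall_students:.0f}")
```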


To calculate the mean, the median and the standard deviation of the height measurements, we should use a statistical calculator (or software) if possible. If the goal is to compute from basic principles, the mean is of course the sum of the data points divided by the total number of data points. In this example, sum all 46 height measurements and then divide the sum by 46.

To find the median height measurement, rank the measurements from smallest to the largest. The measurements shown in Figure 2 are already ranked. Locate the 23rd measurement and the 24th measurement. The median would be the average of these two measurements. Since they are the same (67 inches), the median height measurement is 67 inches.
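The mean and median calculations can be sketched from basic principles as follows. The list `sample_heights` is a small made-up list used only for illustration; it is NOT the 46 measurements in Figure 2, which would have to be read off the dot plot. The function names are also chosen here for illustration.

```python
def mean_from_scratch(values):
    """Sum of the data points divided by the number of data points."""
    return sum(values) / len(values)


def median_from_scratch(values):
    """Middle value of the ranked data; average of the two middle values if n is even."""
    ranked = sorted(values)
    n = len(ranked)
    if n % 2 == 1:
        return ranked[n // 2]
    # For even n, the two middle positions are n/2 and n/2 + 1 (counting from 1),
    # e.g. the 23rd and 24th measurements when n = 46.
    return (ranked[n // 2 - 1] + ranked[n // 2]) / 2


# Hypothetical heights in inches, for illustration only (not the data in Figure 2).
sample_heights = [62, 64, 65, 67, 67, 68, 70, 71]

mean_height = mean_from_scratch(sample_heights)
median_height = median_from_scratch(sample_heights)

print(f"mean = {mean_height}, median = {median_height}")
```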

To find the standard deviation of the height measurements, use a calculator with statistical functions. To calculate without using a statistical calculator, first calculate the sample variance. Then take the square root of the sample variance. The following is how the variance is calculated (this is best done in a spreadsheet or other software).

$\displaystyle \begin{aligned} s^2&=\frac{1}{46-1} \biggl((60-66.78)^2+(61-66.78)^2 \\&\ \ + (62-66.78)^2+\cdots+(72-66.78)^2+(73-66.78)^2+(73-66.78)^2 \biggr) \\&=\frac{1}{45} \times 511.8264 \\&=11.37392 \end{aligned}$

The idea is to take the difference between each measurement and the mean 66.78 and then square the difference. Do this for each of the 46 measurements. Sum these 46 squared differences. Then divide the sum by 45 (one less than 46). Dividing by one less than the sample size makes the sample variance an unbiased estimator of the population variance.

The sample standard deviation is the square root of 11.37392. Thus the standard deviation of the height measurements is $\sqrt{11.37392}=3.372524277$ inches.
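The variance recipe (squared differences from the mean, summed, divided by one less than the sample size) can be sketched in Python. As before, `sample_heights` is a made-up list for illustration only and is NOT the data in Figure 2. The last two lines simply redo the arithmetic on the sum of squares 511.8264 quoted above.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]
    return sum(squared_diffs) / (n - 1)


# Hypothetical heights in inches, for illustration only (not the data in Figure 2).
sample_heights = [62, 64, 65, 67, 67, 68, 70, 71]

s2 = sample_variance(sample_heights)
s = math.sqrt(s2)
print(f"hypothetical sample: variance = {s2:.4f}, standard deviation = {s:.4f}")

# The standard library gives the same result (it also divides by n - 1).
print(f"statistics.variance agrees: {statistics.variance(sample_heights):.4f}")

# Arithmetic check of the figures quoted in the text.
variance_from_text = 511.8264 / 45
sd_from_text = math.sqrt(variance_from_text)
print(f"from the text: variance = {variance_from_text:.5f}, sd = {sd_from_text:.4f}")
```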

$\copyright$ 2017 Dan Ma
