Many parents try to find the best high school for their children, and many variables are usually involved in the choice. One criterion is the success of a school’s graduates, and one measure of that success may be financial well-being.
Let’s say a parent wants to use the mean wealth (income and investments) of the graduates of a high school as a measure of success. The mean here is what many people think of as the average (obtained by adding up all the data values and dividing by the number of data values).
Then Woodrow Wilson High School in Washington, D.C. and The Lakeside School in Seattle, Washington will surely stand out. The mean wealth of the graduates of Woodrow Wilson High School is estimated to be more than $1.5 million. This average includes all the recent graduates who are still in college, the graduates who are still working, the ones who have retired, and the graduates from the 1930s who are now dead. Impressive!
Well, the mean wealth of the graduates of The Lakeside School is even more impressive, estimated to be over $3 million! The figures of $1.5 million and $3 million are conservative estimates. The actual averages could very well be several times higher (see note at the end of the post).
It might be a good idea to look for an explanation before preparing to move to either Seattle or D.C. The high mean wealth mentioned here can be explained by three household names: Warren Buffett (a graduate of Woodrow Wilson High School and a legendary investor) and Bill Gates and Paul Allen (graduates of The Lakeside School and co-founders of Microsoft).
According to Forbes, Bill Gates is the second richest person in the world (net worth $53 billion), Warren Buffett is the third richest (net worth $47 billion), and Paul Allen is the 37th richest (net worth $13.5 billion).
The statistical story here is that we need to choose numerical summaries carefully. The enormous wealth of Warren Buffett, Bill Gates and Paul Allen makes the mean unrepresentative of the wealth of most graduates (as indicated above, the mean is calculated by summing the data values and then dividing by the number of data points). When there are outliers in the data (e.g. the wealth of Bill Gates, Paul Allen and Warren Buffett) or when the data distribution is very skewed, the mean is not a good numerical summary to use if the goal is to give a picture of what a typical data value looks like. The median is a better representative of the center of a skewed distribution.
Numerical summaries that are not strongly affected by extreme data values are said to be resistant. The examples of Woodrow Wilson High School and The Lakeside School show that the mean is not a resistant measure: the extreme wealth of Bill Gates, Paul Allen and Warren Buffett severely inflates the mean wealth. The median, by contrast, is far less affected by extreme data values. Thus if one wishes to use wealth as a measure of the success of graduates, the median is more realistic and more representative of the wealth of graduates.
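To make the idea of resistance concrete, here is a minimal Python sketch. The nine "typical" net worth figures are made up purely for illustration and are not data from either school. The only real number is the $53 billion net worth quoted above for Bill Gates, which is added as a single outlier.

```python
from statistics import mean, median

# Hypothetical net worths (in dollars) of nine graduates.
# These values are invented for illustration only.
typical = [40_000, 65_000, 80_000, 120_000, 150_000,
           200_000, 250_000, 300_000, 450_000]

# Add one extreme value: the $53 billion figure quoted in the post.
with_outlier = typical + [53_000_000_000]

mean_before, median_before = mean(typical), median(typical)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"Without outlier: mean = {mean_before:,.0f}, median = {median_before:,.0f}")
print(f"With outlier:    mean = {mean_after:,.0f}, median = {median_after:,.0f}")
```

In this made-up data set the single extreme value pushes the mean into the billions, while the median moves only from one ordinary value to a point between two neighboring ordinary values.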
Choosing a high school or university by studying the successes of its graduates. Picking a career field. Finding an apartment in a new city. These are just a few examples where one needs to know representative values in a set of decision variables. How do we choose a numerical summary that gives meaningful information? In the case of the mean and the median, which one should we use? The answer lies in the data behind the decision variables. If there are outliers or if the data distribution is skewed, a resistant measure such as the median will be more representative. If the data distribution is reasonably symmetric without outliers, use the mean.
For more information on resistant measures, see “When Bill Gates Walks into a Bar.”
For the figures of $1.5 million and $3 million, we assume that the total wealth of all graduates of Woodrow Wilson High School is $47 billion (just the wealth of Warren Buffett) and that the total wealth of all graduates of The Lakeside School is $66.5 billion (53 + 13.5, the combined wealth of Bill Gates and Paul Allen). This is the same as assuming that all the other graduates have zero wealth, so the estimates are likely lower bounds on the actual averages. We also assume that The Lakeside School produced 200 graduates in each year of its 97-year history, and that Woodrow Wilson High School produced 400 graduates in each year of its 76-year history. Thus we have the following back-of-the-envelope calculations:
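The arithmetic can be sketched in a few lines of Python. The code uses only the assumptions stated in this note: the net worth figures, the number of years and the graduates per year. The variable names are just illustrative.

```python
# Assumed total wealth of all graduates (in dollars), per the note:
# everyone except the named billionaires is treated as having zero wealth.
lakeside_total = 53e9 + 13.5e9   # Bill Gates + Paul Allen
wilson_total = 47e9              # Warren Buffett

# Assumed number of graduates: years of history x graduates per year.
lakeside_grads = 97 * 200        # 19,400 graduates
wilson_grads = 76 * 400          # 30,400 graduates

# Mean wealth = total wealth / number of graduates.
lakeside_mean = lakeside_total / lakeside_grads
wilson_mean = wilson_total / wilson_grads

print(f"The Lakeside School:        ${lakeside_mean:,.0f}")
print(f"Woodrow Wilson High School: ${wilson_mean:,.0f}")
```

Under these assumptions the means come out to roughly $3.4 million for The Lakeside School and roughly $1.5 million for Woodrow Wilson High School, which matches the figures quoted in the post.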