Mitt Romney is currently a candidate for the 2012 Republican Party nomination for U.S. President. Bowing to pressure from another Republican presidential candidate, he recently released his past tax returns. The release of these tax returns opened a window on the personal finances of a very rich presidential candidate. Immediately after the release on January 24, much of the discussion in the media centered on the fact that Romney paid an effective tax rate of about 15%, which is much less than the rates paid by many ordinary Americans. Our discussion here is about neither tax rates nor politics. For the author of this blog, Romney’s tax return provides a rich opportunity to talk about statistics. Mitt Romney is an excellent example for opening up a discussion on income distribution and several related statistical concepts.

**Mitt Romney**

Mitt Romney’s tax return in 2010 consisted of over 200 pages. The 2010 tax return can be found here (the PDF file can be found here). The following is a screen grab from the first page.

**Figure 1**

Note that the total adjusted gross income was over $21 million. The taxable interest income alone was $3.2 million. Most of Romney’s income was from capital gains (about $12.5 million). It is clear that Romney is an ultra-rich individual. How wealthy? For example, where is Romney placed on the income scale relative to other Americans? To get a perspective, let’s look at some income data from the US Census Bureau. The following is a histogram constructed using a frequency table from the Current Population Survey. The source data are from this table.

**Figure 2**

The horizontal axis in Figure 2 is divided into intervals of $10,000, running from $0 to $250,000 and above. According to the Current Population Survey, about 7.8% of all American households had income under $10,000 in 2010 (almost 1 out of 13 households). About 12.1% of all households had income between $10,000 and $20,000 (about 1 in 8 households). Only 2.1% of the households had income over $250,000 in 2010. Obviously Romney belongs to this category. The graphic in Figure 2 shows that Romney is in the top 2% of all American households.

Of course, being in the top 2% is not the entire story. It is a long way from $250,000 (a quarter of a million) to $21 million! Clearly Romney is in the top range of the top class indicated in Figure 2. Romney is actually in the top 1% of the income distribution. According to the Wall Street Journal, Romney is well above the threshold for the top 1% category. According to this online report from the Wall Street Journal, Romney is in the top 0.0025%! According to one calculation (mentioned in the Wall Street Journal piece), there are at least 4,000 families in the category of being in the top 0.0025%. Could it be that the families in this category number in the thousands?

The figure below shows that the sum of the percentages from the first 5 bars in the histogram equals 50.5%. This confirms the well known figure that the median household income in the United States is around $50,000.

**Figure 3**

The histograms in both Figure 2 and Figure 3 are clear visual demonstrations that the income distribution is skewed (i.e. most of the households make modest incomes). Most of the households are located in the lower income range. The first 5 intervals alone contain about 50% of the households. The sum of the percentages of the first 10 vertical bars ($0 to $99,999) is about 80%. So making a six-figure income lands you in the top 20% of the households. Both histograms are classic examples of a skewed right distribution. The vertical bars on the left are tall, and the bars taper off gradually at first but later drop rather precipitously.
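The idea of adding up bar percentages until the running total reaches 50% can be sketched in a few lines of Python. This is only an illustration: the function name `median_bin` is hypothetical, and the shares below are made-up numbers for eight bins of $25,000 each, not the Census Bureau figures behind the histograms.

```python
from itertools import accumulate

def median_bin(percentages, bin_width):
    """Return the (lower, upper) bounds of the bin in which the
    cumulative percentage first reaches 50%."""
    for i, cumulative in enumerate(accumulate(percentages)):
        if cumulative >= 50:
            return i * bin_width, (i + 1) * bin_width
    raise ValueError("percentages add up to less than 50")

# Made-up shares (in percent) for eight $25,000-wide bins.
# These are NOT the Current Population Survey numbers.
shares = [25, 26, 17, 12, 8, 5, 4, 3]

low, high = median_bin(shares, 25_000)
print(f"The median falls between ${low:,} and ${high:,}")

# Share of households below $100,000 (the first four bins).
below_100k = sum(shares[:4])
print(f"{below_100k}% of households are below $100,000")
```

With these made-up shares the running total passes 50% in the second bin, so the median lies somewhere between $25,000 and $50,000. The same running-total reasoning applied to the first 5 bars of the real histogram is what places the median near $50,000.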

The last two vertical bars (in green) are aggregations of all the vertical bars in the $250,000+ range had we continued to draw the histograms using $10,000 increments. Another clear visual sign that this is a skewed distribution is that the left tail (the length of the horizontal axis to the left of the median) differs greatly from the right tail (the length of the horizontal axis to the right of the median). When the right tail is much longer than the left tail, it is called a skewed right distribution (see the figure below).

**Figure 4**

On the other hand, when the left tail is longer, it will be called a skewed left distribution.

Another indication that the income distribution is skewed is that the mean income and the median income are far apart. According to the source data, the mean household income in 2010 was $67,530, much higher than the median of about $50,000 (see the figure below).

**Figure 5**

Whenever the mean is much higher than the median, the distribution is usually skewed right (as in Figure 5). On the other hand, when the opposite is true (the median is much higher than the mean), most of the time it is a skewed left distribution.
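To see this rule of thumb in action, here is a small Python sketch. The income values are made up for illustration, and the helper `likely_skew` is a hypothetical name. It only compares the mean with the median, which is a heuristic rather than a formal test of skewness.

```python
from statistics import mean, median

def likely_skew(data):
    """Guess the direction of skew by comparing the mean with the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "skewed right"
    if m < med:
        return "skewed left"
    return "roughly symmetric"

# Made-up household incomes in thousands of dollars, with a long right tail.
right_skewed = [12, 18, 25, 31, 38, 44, 50, 57, 66, 80, 120, 400]

# Reflect each value within the range to get the mirror image,
# which turns the long right tail into a long left tail.
low, high = min(right_skewed), max(right_skewed)
left_skewed = [low + high - x for x in right_skewed]

for name, data in [("original", right_skewed), ("mirror image", left_skewed)]:
    print(name, round(mean(data), 1), median(data), likely_skew(data))
```

In the original list the few large values pull the mean well above the median. In the mirror image the few small values pull the mean below the median.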

A related statistical concept is that of resistant measures. The median is a resistant measure because it is not affected significantly by extreme data values (in this case extremely high income and wealth). On the other hand, the mean is not resistant. As a result, in a skewed distribution, the median is a better indication of an average. This is why income is usually reported using the median (e.g. in newspaper articles).
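A quick way to see resistance is to add one extreme value to a small data set and watch what happens to each measure. The sketch below uses ten made-up household incomes and then adds a single income of $21 million, roughly the size of the adjusted gross income discussed earlier.

```python
from statistics import mean, median

# Made-up annual incomes (in dollars) for ten households.
incomes = [18_000, 27_000, 35_000, 42_000, 48_000,
           52_000, 61_000, 75_000, 90_000, 120_000]

# The same households plus one extreme income of $21 million.
with_outlier = incomes + [21_000_000]

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"mean:   {mean_before:>12,.0f} -> {mean_after:>12,.0f}")
print(f"median: {median_before:>12,.0f} -> {median_after:>12,.0f}")
```

The single extreme value moves the mean from $56,800 to nearly $2 million, while the median only shifts from $50,000 to $52,000.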

For a more detailed discussion of resistant measures, see When Bill Gates Walks into a Bar and Choosing a High School.