We use an example of presidential elections to illustrate the reasoning process of statistical inference.

The current holder of a political office is called an incumbent. Does being an incumbent give an edge for reelection? The answer seems to be yes. For example, in the United States House of Representatives, the percentage of incumbents winning reelection has routinely exceeded 80% for more than 50 years (sometimes exceeding 90%). Since 1936, there have been 13 US presidential elections involving an incumbent, and incumbents won 10 of them (see the table below).

Clearly, more than half of these presidential elections were won by incumbents. Note that the three wins for the challenger occurred in times of economic turmoil or in the wake of political scandal. So presidential incumbents seem to have an edge, except possibly in times of economic or political turmoil.

The power of incumbency in elections at the presidential and congressional levels seems like an overwhelming force (a reelection rate of over 80% in the US House and 10 wins for the last 13 presidential incumbents). Incumbents in elections at other levels likely enjoy a similar advantage. So our aim here is not to establish that incumbents have an advantage. Rather, our goal is to illustrate the reasoning process of statistical inference using the example of presidential elections.

The statistical question we want to ask is: does this observed result really provide evidence of a real advantage for incumbents in US presidential elections? Is the high number of incumbent wins due to a real incumbent advantage or just due to chance?

One good way to think about this question is through a "what if". What if there is no incumbency advantage? Then a challenger is just as likely to win as an incumbent. Under this "what if", each election is like a toss of a fair coin. This "what if" is called a null hypothesis.

Assuming each election is a coin toss, incumbents should win about half of the 13 elections, that is, 6.5 on average (in practice, 6 or 7). Does the observed difference between 10 and 6.5 indicate a real difference, or is it just due to random chance (the incumbents simply being lucky)?

Assuming each election is a coin toss, how likely is it to have 10 or more wins for incumbents in 13 presidential elections involving incumbents? How likely is it to have 10 or more heads in tossing a fair coin 13 times?

Based on our intuitive understanding of coin tossing, getting 10 or more heads out of 13 tosses of a fair coin does not seem likely (getting 6 or 7 heads would be a more likely outcome). So we should reject this “what if” notion that every presidential election is like a coin toss, rather than believing that the incumbents were just very lucky in the last 13 presidential elections involving incumbents.
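To make the intuition concrete, the short Python sketch below (not part of the original post, which uses no code) lists the probability of each possible number of heads in 13 tosses of a fair coin, which is the null-hypothesis model. The name `null_pmf` is our own, chosen for illustration. The table shows that 6 and 7 heads are the most likely outcomes (each about 0.2095), while exactly 10 heads has probability of only about 0.0349.

```python
from math import comb

N_TOSSES = 13  # number of presidential elections involving an incumbent since 1936


def null_pmf(k: int, n: int = N_TOSSES) -> float:
    """Probability of exactly k heads in n tosses of a fair coin."""
    # There are comb(n, k) equally likely sequences with k heads, out of 2**n in total.
    return comb(n, k) / 2 ** n


# Print the full null distribution of the number of heads (incumbent wins).
for k in range(N_TOSSES + 1):
    print(f"{k:2d} heads: {null_pmf(k):.4f}")
```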

If you feel you do not have a good handle on the probability of getting 10 or more heads in 13 tosses of a fair coin, we can use a simulation. We toss a fair coin 13 times and count the number of heads. We repeated this process 10,000 times (in an Excel spreadsheet). The following figure summarizes the results.

Note that in the 10,000 simulated repetitions (each consisting of 13 coin tosses), only 465 repetitions have 10 or more heads (350 + 100 + 15 = 465). So getting 10 or more heads in 13 coin tosses is unlikely: in our simulation, it happened only 465 times out of 10,000. If we performed another 10,000 repetitions, we would expect a similarly small proportion of repetitions with 10 or more heads.
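The post ran its simulation in Excel; the sketch below is a Python stand-in for that spreadsheet, written under our own assumptions (the function name `simulate`, its parameters, and the seeds 0 to 4 are all illustrative choices). It repeats 13 fair-coin tosses 10,000 times, counts the repetitions with 10 or more heads, and runs several independent batches to illustrate the point that another 10,000 repetitions gives a similarly small proportion. Because the tosses are random, the counts will not match the post's 465 exactly, but each batch should give a proportion in the neighborhood of 0.046.

```python
import random


def simulate(n_reps: int = 10_000, n_tosses: int = 13, threshold: int = 10, seed=None) -> int:
    """Count repetitions (each of n_tosses fair-coin tosses) with at least `threshold` heads."""
    rng = random.Random(seed)  # independent generator so results are reproducible per seed
    hits = 0
    for _ in range(n_reps):
        # Each toss is heads with probability 0.5; True counts as 1 in the sum.
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        if heads >= threshold:
            hits += 1
    return hits


# Several independent batches of 10,000 repetitions each.
for seed in range(5):
    hits = simulate(seed=seed)
    print(f"batch {seed}: {hits} of 10,000 -> {hits / 10_000:.4f}")
```

Using a separate `random.Random` instance per batch keeps each batch reproducible without touching Python's global random state.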

Our simulated probability of obtaining 10 or more heads in 13 coin tosses is 465/10,000 = 0.0465. This probability can also be computed exactly using the binomial distribution, which gives 0.0461 (not far from the simulated result).
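The exact value comes from adding the binomial probabilities for 10, 11, 12 and 13 heads: P(X ≥ 10) = (286 + 78 + 13 + 1) / 2^13 = 378/8192 ≈ 0.0461. The Python sketch below is one way to carry out that computation; the helper name `upper_tail` is our own. It uses `fractions.Fraction` so the result is exact before converting to a decimal.

```python
from fractions import Fraction
from math import comb


def upper_tail(n: int, k: int) -> Fraction:
    """Exact P(X >= k) for X ~ Binomial(n, 1/2), i.e. k or more heads in n fair tosses."""
    favourable = sum(comb(n, j) for j in range(k, n + 1))  # sequences with >= k heads
    return Fraction(favourable, 2 ** n)


p = upper_tail(13, 10)
print(p, float(p))  # 189/4096 0.046142578125
```

The fraction prints in lowest terms, so 378/8192 appears as 189/4096.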

Under the null hypothesis that incumbents have no advantage over challengers, we can use a simple model of tosses of a fair coin. Then we look at the observed result of 10 heads in 13 coin tosses (10 wins for incumbents in the last 13 presidential elections involving incumbents). We ask: how likely is this observed result if elections are like coin tosses? Based on a simulation, we see that the observed result is not likely (10 or more heads happened 465 times out of 10,000). An exact computation gives the probability as 0.0461. So we reject the null hypothesis that incumbents have no advantage, rather than believing that the incumbents were simply lucky.

The reasoning process for a test of significance problem can be summarized in the following steps.

- You have some observed data (e.g. data from experiments or observational studies). The observed data appear to contradict a conventional wisdom (or a neutral position). We want to know whether the difference between the observed data and the conventional wisdom is a real difference or just due to chance. In this example, the observed data are the 10 incumbent wins in the last 13 presidential elections involving incumbents. The neutral position is that incumbents have no advantage over the challengers. We want to know whether the high number of incumbent wins is due to a real incumbent advantage or just due to chance.
- The starting point of the reasoning process is a "what if". What if the observed difference is due to chance (i.e. there is really no effect, and the neutral position holds)? So we assume the neutral position is valid. We call this the null hypothesis.
- We then evaluate the observed data, asking: how likely would this result, or one even more extreme, be if the null hypothesis were true? This probability is called a p-value. In this example, the observed data are 10 incumbent wins in the last 13 presidential elections involving incumbents. The p-value is the probability of seeing 10 or more incumbent wins if incumbents in fact have no real advantage.
- We estimate the p-value. If the p-value is small, we reject the neutral position (null hypothesis), rather than believing that the observed difference is due to random chance. In this example, we estimate the p-value by a simulation exercise (it can also be computed directly). Because the p-value is so small, we reject the notion that there is no incumbent advantage rather than believing that the high number of incumbent wins is just due to incumbents being lucky.
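The four steps above can be packaged into a small Python function, offered here as a sketch rather than as the post's own method. The function name `binomial_test_upper` is hypothetical, and the cutoff `alpha = 0.05` is our assumption: the post only says the p-value should be "small" and does not fix a threshold. The test is one-sided because the question is whether incumbents win more often than a fair coin would predict.

```python
from math import comb


def binomial_test_upper(successes: int, trials: int, alpha: float = 0.05) -> tuple[float, bool]:
    """One-sided test of H0 'each election is a fair coin toss' (p = 1/2).

    Returns (p_value, reject_h0). alpha = 0.05 is an assumed, conventional cutoff.
    """
    # Step 3: probability under H0 of a result at least as extreme as the observed one.
    p_value = sum(comb(trials, j) for j in range(successes, trials + 1)) / 2 ** trials
    # Step 4: a small p-value leads us to reject H0 rather than attribute the result to luck.
    return p_value, p_value < alpha


p, reject = binomial_test_upper(10, 13)
print(f"p-value = {p:.4f}, reject H0: {reject}")
```

For the observed 10 wins in 13 elections this gives a p-value of about 0.0461 and rejects the null hypothesis. For a hypothetical 8 wins in 13 elections, the p-value would be 2380/8192 ≈ 0.2905, and the null hypothesis would not be rejected.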

Any statistical inference problem that involves testing a hypothesis works the same way as the presidential election example described here. The details used to calculate the p-value may change, but the reasoning process remains the same. We realize that the reasoning process for the presidential election example may still not come naturally to some students. One reason may be that our intuition is less reliable when working with p-values in statistical inference problems that involve normal distributions, binomial distributions and other probability models. So it is critical that students in an introductory statistics class get a good grounding in working with these probability models.

See this previous post for a more intuitive example of statistical inference.