
The Polls Were Fine in 2016, and You Should Trust the Predictions in 2020

A Stats Nerd Explainer Written to a Friend

 

SECTION ONE

The Polls Were Not Wrong in 2016


I have been pushing back hard on the idea that the polls and the models were wrong in 2016. I think people just did not pay enough attention to what the polls were actually saying, and instead took the headline numbers as solid gold. If the headline numbers are what people were relying on, then yes, the polls were dead wrong. But relying on a headline number is not something any statistician would do; media organizations do it to simplify stats for people without stats training.


Most of the polls that predicted Clinton would win were popular vote polls. They ended up being pretty dang close to the actual popular vote totals. All of the Electoral College polls showed Clinton with a much smaller chance to win; my model gave her a 61% chance.


For some people 61% sounds like a lot, but it isn't. Here is an experiment. Go take 5 ping pong balls, write 'Clinton' on three and 'Trump' on two. Put them in a brown paper bag, and pull one out. The odds for this experiment are 60/40 in favor of a Clinton ping pong ball. But, if someone pulls out a Trump ball, they wouldn't likely think "this brown paper bag must be broken." For some reason, people thought that about the polls.
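A quick way to try the paper bag experiment without ping pong balls is to simulate it. The sketch below is only an illustration, not the model discussed here; the function name, the seed, and the number of draws are made up.

```python
import random

def draw_from_bag(clinton_balls=3, trump_balls=2, draws=100_000, seed=2016):
    """Pull one ball (putting it back each time) and return the share of Trump pulls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    bag = ["Clinton"] * clinton_balls + ["Trump"] * trump_balls
    trump_draws = sum(1 for _ in range(draws) if rng.choice(bag) == "Trump")
    return trump_draws / draws

trump_share = draw_from_bag()
print(f"Trump ball pulled in {trump_share:.1%} of tries")
```

Roughly four tries in ten come up Trump, and nothing about the bag is broken.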


Even now, with my model giving Biden an 83% chance, it would be roughly five Biden balls and one Trump ball. Much more likely to pull out a Biden ball, but nobody should think the paper bag is broken if that one Trump ball does get picked out. We should be thinking the same thing about the polls.


So, I have not changed anything from my 2016 model, and just yesterday my department at the University of Kentucky hosted Charlie Cook--Mr. Cook Political Report himself--and he would not change anything either. I think FiveThirtyEight has changed some stuff, but not a lot, and I know Real Clear Politics has not really changed either.


So, what happened in 2016? It really comes down to understanding how polls work. The polling numbers we see are just point estimates. However, statisticians can never be 100% certain of anything. Usually statisticians use a 95% level of confidence to measure things. This does not produce a point estimate; it produces a confidence interval.


***Side note for any stats nerd reading. What I am saying is not true for Bayesian intervals, but I am not aware of any polling organization that uses Bayesian estimation, so everything I say applies to frequentist estimation.***


Polling organizations give you an estimate, and then a 'Margin of Error'. In statistics, a margin of error doesn't exist as its own thing. It is just the distance from the midpoint of the confidence interval to the upper and lower bounds of the interval. However, most people are not aware that there is actually nothing in statistics that says the midpoint of the confidence interval is the best guess.


What a confidence interval says is, more or less, "we can be 95% sure the true value--in this case the share of votes--will be somewhere in this confidence interval". That's a bit of a simplification, but it's fine for polling. The midpoint is reported because, on average, it will be the shortest distance from the true value.
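For the stats nerds, here is roughly where an interval and its margin of error come from. This is a minimal sketch assuming a simple random sample and the textbook normal approximation for a proportion; real pollsters weight their samples, which changes the numbers. The 45% share and the sample size of 1,000 are made-up inputs, and the function name is hypothetical.

```python
import math

def poll_interval(share, sample_size, z=1.96):
    """Return (lower, upper, margin) for a polled vote share.

    Textbook normal approximation; z=1.96 corresponds to 95% confidence.
    """
    margin = z * math.sqrt(share * (1 - share) / sample_size)
    return share - margin, share + margin, margin

lower, upper, margin = poll_interval(share=0.45, sample_size=1000)
print(f"{lower:.1%} to {upper:.1%} (margin of error +/- {margin:.1%})")
```

With these inputs the interval comes out to roughly 42% to 48%, with the reported 45% sitting at the midpoint.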


Again, a quick example: if there are 100 polls, all showing a confidence interval from 42-48 then the real value can be any of those numbers. So, why report 45? Because the true outcome could be 47, 44, 42, or 46. In 100 polls it will likely be all of those values several times. On average 45 will be the closest to the true values. So, if in one poll the true value is 42, then yes, 44 is a closer guess. But if in another poll 47 is the true value, 45 is closer than 44. You can see with 100 polls that the average closest guess is going to be 45.


But this does not mean that 45 is the "best" guess. Every number within a confidence interval is a plausible outcome.


So, anytime confidence intervals overlap, a statistician cannot be 95% certain of anything. If the Clinton confidence interval is 44-49, and the Trump confidence interval is 40-45, the overlap from those confidence intervals means the poll cannot say with 95% certainty that Clinton will win. If you decrease your certainty level to 90%, the confidence intervals might not overlap, but then you double the likelihood, from 5% to 10%, that the poll is just plain wrong.
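The overlap check is simple enough to write down. A minimal sketch, using the Clinton and Trump intervals from the example; the function name and the second, non-overlapping pair of intervals are made up for contrast. It mirrors the rule of thumb described here, not a formal significance test.

```python
def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

clinton = (44, 49)
trump = (40, 45)
print(intervals_overlap(clinton, trump))  # True: both intervals include 44 to 45

# Hypothetical intervals with a clear gap between them (made-up numbers)
print(intervals_overlap((50, 55), (41, 46)))  # False
```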


In most states in 2016 the confidence intervals for Clinton and Trump overlapped. Because Clinton's intervals were higher overall, the likelihood that she would win was greater, but certainly nothing close to rock solid because the intervals overlapped.


If I am remembering correctly, there was good polling close to the election that showed the confidence intervals overlapping in every state except Wisconsin. That would mean that Clinton could lose in PA, MI, FL, and any other close state, and the polls were not actually wrong. This is why my model still gave Trump a 39% chance to win.


The polls were really only wrong in WI. But, remember, they are using a 95% confidence level, so we have to be willing to accept that they will just plain be wrong 5% of the time.


The models all give Biden a much larger chance to win because the confidence intervals don't overlap. In some polls they do, but for the most part, Biden's lead is large enough that there is no overlap, which means that a statistician can say with 95% confidence that Biden's vote share will be larger than Trump's. This is not something that could be said in 2016.


 

SECTION TWO

Why the Models Are More Confident Now Than in 2016


There are lots of 'real world' reasons to explain why Biden's lead is so much larger and more solid than Clinton's was, which is why the models are more confident this time around.


First, far fewer voters have told pollsters they are undecided in 2020 than in 2016. In most of the swing states in 2016 the percentage of undecided voters was larger than the percentage lead Clinton held. For example, when she was up by 6 points, there were often 7-9% of voters undecided. That is not the case this year.


Clinton clearly did not do enough to persuade undecided voters. This is less of an issue for Biden because so many have already made up their minds. On top of that, we are seeing trends that show Biden is winning undecided voters as they make up their minds. In 2016 undecideds broke about 65-35 for Trump. Currently they seem to be breaking about 70-30 for Biden.
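To see how much a pool of undecided voters can eat into a lead, here is a back-of-the-envelope sketch. It uses the numbers from this section (a 6 point lead, 7-9% undecided, a 65-35 break toward the trailing candidate) and assumes every undecided voter ends up picking one of the two candidates, which is a simplification. The function name is made up.

```python
def lead_after_undecideds(lead, undecided, share_to_trailer):
    """Leader's margin in points once the undecided voters are split.

    lead: leader's current margin, in points
    undecided: percent of voters still undecided
    share_to_trailer: fraction of undecideds breaking for the trailing candidate
    """
    to_trailer = undecided * share_to_trailer
    to_leader = undecided * (1 - share_to_trailer)
    return lead + to_leader - to_trailer

for undecided in (7, 8, 9):
    final = lead_after_undecideds(lead=6, undecided=undecided, share_to_trailer=0.65)
    print(f"{undecided}% undecided: lead goes from 6.0 to {final:.1f}")
```

Under these assumptions the break alone trims the lead by about two to three points, before any polling error is counted.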


On top of that, Biden is polling over 50% nationally, and over 50% in a lot of swing states Trump won. Clinton was never over 50%, and her confidence interval often did not even include 50%; rather, it was below it entirely. Some of Biden’s confidence intervals are entirely above 50%.


The second reason is that the polling in this race has been stable...like historical anomaly stable. I’ve never seen anything like it in polling going back 60ish years (when polls really started to be done reliably). No matter what happens in the real world, Biden’s polling lead has stayed incredibly stable at +7-9%. The largest deviations from that are when Biden jumped up to +11% for a few weeks in early summer, and when Trump jumped up to -6% in late summer. But, both of these were very temporary, and could have reflected variability in polling rather than actual voter shifts (this is really irrelevant for the race, either reason is fine). Either way, the race quickly reverted to Biden +7-9%. Nothing seems to be able to move voters either way. That consistency decreases the likelihood of a surprise shift.


This lead may actually be growing. Since the debate, Biden’s polling lead has averaged roughly 13%. The early polls had small sample sizes, so I didn't put much stock in them, but some from the last week or so are solid polls, and they still show a huge lead. These polls have pushed Biden's polling average lead from 7-9% up to 10-11%. It may regress back to the +7-9% average, but we can say for sure the average is not shrinking. If anything it’s growing.


So, what is happening to cause this? Several things. The first is just that people still discount just how much Clinton was disliked and distrusted. People really didn't believe anything she said. This was also true for Trump, but this particularly hurt Clinton in the rust belt, where voters viewed NAFTA as having destroyed their livelihoods. Her flip flops on issues like the Trans-Pacific Partnership, which she had once called the “gold standard” of trade deals, just reminded voters of this. Biden does not have these issues. His likability and trustability numbers are WAY higher.


***Side note: A lot of people argue this animosity toward Clinton was because of sexism or misogyny. I am not qualified to make that judgement. It may be true, but these types of cultural preferences are not an area where I would be considered an expert. So, unless I could see data showing women who run for president are always less liked and trusted, I could not confirm or dispute the sexism factor (Clinton has been the only major party female nominee).***


So, Biden does not suffer from the same a priori impediments that Clinton did, and Trump’s margin for error (not statistically, but reality-y) is even smaller. I’ll give you an example. In 2016 Trump won Wisconsin while getting fewer votes than Romney got in 2012. Romney lost the state by 7 percentage points. If the explanation for 300,000 fewer voters coming out to vote for a Democrat is simply that they didn’t like or trust Clinton, then Trump would have to increase his base to win. Every datapoint I have seen suggests his base has actually shrunk.


His base has shrunk because he has not delivered on his promises. Mexico didn’t pay a dime for the wall. USMCA is just NAFTA with a minimum wage in Mexico. Manufacturing and coal mining are still really hurting. Tariffs have cost the average family close to $2,000. And, the trade deficit has grown (at least until the pandemic, I'd have to check to see what's happened since then).


All this is even before anything to do with Coronavirus is taken into account. The numbers on Trump's handling of the virus are shockingly horrible. Even voters over 65 are starting to turn away from him because they feel like he does not care if they die. I never thought I'd see such a huge shift among older voters. It's unheard of for seniors to have such a massive deviation from trend, at least as far as I'm aware. Trump won seniors by about 10 points in 2016. He might lose them by 25 points this year.


It also seems like his cruelty toward anyone who disagrees with him has simply worn people out. I know his base loves--LOVES--owning the libs. But the data suggests most people are tired of it. Add on to that the mobilized and energized minority bloc, which no matter how much Trump wants to say otherwise is still overwhelmingly against him, and his chances get really slim. Add on to this the polling that shows even military members (I've seen active duty and veteran polls) are fed up with him, and don't trust him. He might only get mid 40ish% of military votes, which would be really low for a Republican. All the McCain bashing, gold star bashing, Russia bounties, and suckers and losers, whether true or not, seem to have caused him to lose 20ish% of his support among the armed services community.


So, to sum it all up, his winning recipe in 2016 was that he only had to perform above his confidence interval in WI, where so many voters were undecided. To win in 2020 he would have to perform above his confidence interval in about 6 states and then hope Biden does not perform well in NC, TX, GA, IA, or OH.


To give you an idea of just how tough Trump's map is: he could hang on to FL, PA, NC, TX, GA, OH, and IA, and if he loses MI, WI, and AZ the race would be a tie. Then if Omaha, Nebraska votes for Biden (Nebraska and Maine give electoral college votes by congressional district, not winner takes all for the state), Biden wins.
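The arithmetic behind that tie can be checked in a few lines. The electoral vote counts below are not in the text above; they are the 2010-census apportionment figures and the 306 electors pledged to Trump in 2016, and the sketch assumes every other state and district votes as it did in 2016.

```python
# Electoral votes under the 2010-census apportionment (used in 2016 and 2020).
TOTAL_EV = 538
TRUMP_2016_EV = 306  # electors pledged to Trump in 2016, before faithless electors

flips_to_biden = {"MI": 16, "WI": 10, "AZ": 11}

trump_ev = TRUMP_2016_EV - sum(flips_to_biden.values())
biden_ev = TOTAL_EV - trump_ev
print(f"After MI, WI, and AZ flip: Biden {biden_ev}, Trump {trump_ev}")

# Nebraska awards one electoral vote per congressional district; flip one of them.
NEBRASKA_DISTRICT_EV = 1
trump_ev -= NEBRASKA_DISTRICT_EV
biden_ev += NEBRASKA_DISTRICT_EV
print(f"After one Nebraska district flips: Biden {biden_ev}, Trump {trump_ev}")
```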


Right now, Trump looks like he will lose PA, NC, and FL, and the rest of those are nailbiter close for Trump, which was entirely unexpected (**there isn't a ton of polling in these other states because it was assumed Trump would win, so they could be leaning more toward Trump, and it's hidden by not having enough polling data**).


To put it less confusingly, my model shows that Biden is as likely to get over 400 electoral votes (1/6ish) as Trump is to win at all (1/6ish). The most likely outcome in my model is Biden getting 350-360 electoral votes.
