Given polling’s failures with Trump, Brexit, and the FARC vote, what can we do to improve its technology?
“Sooner or later,” Politico opined in 2015, “the models are going to miss in an American presidential election and data journalism as a whole is going to suffer.” This prescient warning was, it appears, largely discounted ahead of the 2016 presidential election, in which the Republican candidate, Donald Trump, narrowly defeated Democrat Hillary Clinton and is now the president-elect heading into 2017.
It’s trite to say it, but the only poll that really matters in the US is Election Day itself.
Most polls wrongly anticipated a Clinton win, many with a large Electoral College sweep to match. Not all polls predicted that sweep, of course, nor did a number of academic models that, with varying methodologies, came closer to predicting the actual outcome, at least in terms of raw numbers. Quite a few polls cautiously predicted a close contest, which is what we got, but again with Clinton winning in the end. A handful predicted a Trump win, and were therefore correct … except that they predicted he’d win the popular vote, which he lost. Based on the Cook Political Report’s most recent count, Clinton actually did win the popular vote, by 1.4 million ballots cast at the time of this writing. (This does not matter for the final result, though, since Trump won the Electoral College.)
The traditional way of conducting polls, by landline phone interviews, is in decline. Adapting the process to the digital age has proven expensive and difficult: Simply put, polling accuracy suffers as fewer people participate and the randomized sample ends up not being so random anymore. But can online polls, which in the case of Brexit came closer to the result than many of the traditional polling methods and oddsmakers, rectify this?
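To see why falling participation makes a “random” sample stop being random, here is a minimal simulation. The response rates are hypothetical, chosen only to illustrate the mechanism of differential nonresponse, not drawn from any real survey:

```python
import random

random.seed(42)

# Hypothetical electorate: exactly 50% support candidate A, 50% candidate B.
population = ["A"] * 50_000 + ["B"] * 50_000

# Assumed (illustrative) response rates: A supporters answer 30% of calls,
# B supporters only 10% -- this asymmetry is the source of the bias.
response_rate = {"A": 0.30, "B": 0.10}

# Dial a perfectly random subset, but keep only those who pick up.
dialed = random.sample(population, 5_000)
respondents = [v for v in dialed if random.random() < response_rate[v]]

share_a = respondents.count("A") / len(respondents)
print(f"True support for A:   50.0%")
print(f"Polled support for A: {share_a:.1%}")  # close to 75%, not 50%
```

Even though the dial list itself was drawn at random, the respondents are not: the group that answers more often dominates the sample, roughly in proportion to the ratio of response rates (0.30 / (0.30 + 0.10) = 75%).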
The voters who hang up the phone
Undoubtedly, polling is an exercise in futility with a lot of voters. As political scientist Allan Lichtman, whose own electoral model correctly anticipated a Trump win without recourse to polling data, told CBS, polls are snapshots in time and screen for likely voters. Quanta Magazine suggests that there was some systemic error across most polling models, since even when aggregated they predicted an outcome that didn’t happen, and it will take months if not years to figure out what the common denominator really was.
People may have lied about supporting Trump before and after the vote. Before the election, a McClatchy investigation found that while there was not a silent majority of “undercover voters,” per the Trump campaign’s portrait, there was concern that such voters did exist and were being overlooked. After it, a BuzzFeed report on women voters noted that many may have just pressed the button and gone on their way without ever discussing their choice.
These voters, then, “internalized the fact that voting for Trump is not something you do in public,” so they are not on any radars going in or out of the booth.
That said, all the talk of Republicans staying home rather than voting for Trump proved wrong, because most who did vote voted along party lines. The Republican National Committee (RNC) was actually on point with its own internal models on turnout for Trump. He was projected to get fewer votes in total than, or almost the same number as, his predecessors, John McCain (2008) and Mitt Romney (2012).
Clinton underperformed compared to President Obama in his 2008 and 2012 campaigns. The “enthusiasm gap” social media analysts found favoring Trump actually turned out to be impactful, and only two polls, run by USC Dornsife-Los Angeles Times, accounted for that; they consistently had him winning as a result.
Trump did better than McCain and Romney in total, and much better in certain states that he needed to win. Clinton, though she won the popular vote, received several million fewer votes overall than Barack Obama did against his opponents. Because the Democratic Party didn’t get its projected vote out, Trump’s slightly higher-than-expected numbers were enough to sweep the Electoral College for him in Florida, Iowa, North Carolina, Ohio, Pennsylvania, Wisconsin, and (maybe) Michigan.
One of the major driving forces of data success — and a place where Clinton’s op fell down — was in turnout projections 8/
— Phil Mattingly (@Phil_Mattingly) November 9, 2016
Clinton, in contrast, was only able to overcome this problem in two of her must-win states: Virginia and Nevada. According to The Washington Post, “Had turnout been higher, those polls [with Clinton winning] would likely have been spot on, since the actual electorate would have better matched the electorate pollsters expected to see.” Although Clinton’s vote count will rise as more ballots are counted in states that were already called, these votes do not matter because they come from states she was never in danger of losing (California) or never had a real chance of taking (Utah).
Why this was the case is partly a technology question and partly a social psychology question, because one influences the outcome of the other.
Voters may even distort how they voted, or if they voted or intend to vote at all, to “make up” for their actual choices. That doesn’t change the fact, of course, but it makes them feel like they were more conflicted or just a step away from making the “right” choice.
As expected, a lot of talk around polling to follow tonight. Two part problem: who was polled and who didn’t tell the truth
— Matt Oczkowski (@MattOczkowski) November 9, 2016
The voters who don’t answer the phone
This was also the case recently in Colombia, where outside observers, pollsters, media outlets, and the government all thought a referendum on a peace and reconciliation deal with the FARC would pass by a fair margin. Polling ahead of the vote showed that 88% of respondents had already settled on their choice and 62% of respondents were voting “Yes.”
The final result? 50.2% against the deal to 49.8% for it.
With only 10% undecided per the final polls on the eve of the vote, where did this discrepancy come from? There is no one answer. Reflecting on their predictions, pollsters believe that many voters may have lied or simply opted not to participate, skewing the sample. Many people who opposed the deal did so because they felt it was more than the FARC deserved, but did not want to say as much when “Yes” was the dominant narrative and opposing it seemed to risk reigniting the decades-long conflict.
A slim plurality was upset enough with the deal to hold onto their “No” vote, but worried enough about the national mood to mumble something non-committal about backing the “Yes” camp to avoid a debate. Only a vocal minority actually felt comfortable saying “No” loud and proud. But just hearing them was enough for voters to think that, well, at least someone was saying what they thought all along, so maybe, just maybe, they were right to be skeptical.
It was, in a sense, permission to dissent, without most people having to risk doing so publicly since someone else was saying as much. How, then, can polls measure these quiet voters, the “shy Tories” in Brexit, the silent Trump supporters? They can’t, not without mind reading.
So, in keeping quiet or lying, people reserve the right to vote no, but in a way that doesn’t show up in the polls. In Colombia, reaching people was further complicated by the country’s history of government spying, by demographics, and by the ICT infrastructure over which people were contacted for their views. That last point holds in every country with a modern wireless network these days: cellphone surveys are both more expensive and harder to coordinate than the landline calls through which pollsters traditionally got hold of their samples. According to Nature, the response rate pollsters get from US mobile users is less than 10%, because people simply ignore the calls.
Adjusting to this new reality was, is, and will be a perennial challenge in the forecasting fields.
Online polling is not quite there yet. But it could be
Brexit was another big, recent upset for most traditional polls, which predicted the “Remain” in the EU camp would beat the “Leave” side. People frequently misrepresented their views because the tenor of both sides’ arguments discouraged public discussion, and methodology issues again came into play with respect to phone versus online surveys.
Given how quickly people’s opinions change, and how partisan people can be online, online surveys can also be inaccurate predictors. But they were closer to the actual outcome in the UK. One example is the experimental PollFish smartphone survey platform, whose surveys “work like an ad network, but instead of ads, they display surveys inside apps users have already downloaded,” handing out in-app points or gifts to users.
Online polling does not mean a single-question sidebar on a news site asking who should be the next president, of course. That is not a real poll, or even a true survey, given how easily it can be hijacked by online trolls. Actual online polling has standards and methods to estimate biases and select demographic data. Could this be the future, then? Maybe, but not for 2020’s US presidential election or any other big contest coming up in the near future.
A lot more time and effort are needed to see how and where online polls can overcome current problems without causing new ones.
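One of the standard methods alluded to above is post-stratification: reweighting a skewed sample so its demographic mix matches known population shares. The numbers below are invented for illustration (an online sample that over-represents younger respondents), not taken from any actual poll:

```python
# Hypothetical raw online sample, skewed young.
# age group -> (number of respondents, fraction answering "Yes")
responses = {
    "18-34": (600, 0.40),
    "35+":   (400, 0.60),
}

# Assumed census shares for the same age groups.
census_share = {"18-34": 0.30, "35+": 0.70}

# Unweighted estimate: just pool every respondent.
total = sum(n for n, _ in responses.values())
raw = sum(n * p for n, p in responses.values()) / total

# Post-stratified estimate: weight each group's answer by its true
# population share instead of its share of the sample.
weighted = sum(census_share[g] * p for g, (_, p) in responses.items())

print(f"Raw 'Yes' share:      {raw:.1%}")       # 48.0%
print(f"Weighted 'Yes' share: {weighted:.1%}")  # 54.0%
```

Because the over-sampled young group leans “No” in this toy example, the raw pooled number understates “Yes” support by six points; reweighting to the census shares corrects for the skewed recruitment, though only along the demographic dimensions the pollster actually measured.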
The cons of online polling are numerous but surmountable. Like exit polling already does, online polls underrepresent minorities, something that has yet to be fixed. Also, though people are conditioned to take surveys, they tend to expect some kind of tangible reward at the end, and some pollsters won’t offer one because that would add a whole other level of sampling bias into the equation. Most online polls have huge margins of error because they are usually based on only a few demographic data points, such as gender, age, and race. And, depending on where the polling is done, lack of internet access would shut out a lot of people. (One solution to this could be text message polling via mobile, perhaps with the data used not counted against a plan.)
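The margin-of-error problem is easy to quantify with the textbook formula for a simple random sample, MOE = z · sqrt(p(1 − p)/n). A quick sketch shows why slicing a poll into small demographic cells blows up the uncertainty:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A 1,000-person poll looks precise overall...
print(f"n=1000: +/- {margin_of_error(1000):.1%}")  # about 3.1%
# ...but a demographic cell of 100 respondents is three times noisier.
print(f"n=100:  +/- {margin_of_error(100):.1%}")   # about 9.8%
```

A nearly 10-point margin on a subgroup estimate is useless for calling a race decided by a point or two, which is why polls built on only a few coarse demographic cells carry such wide error bands.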
On the pros side of the online polling argument: given how much detail people put online and are willing to give out on social media, pollsters’ access to that information (when accurate) would let them model surveys in far greater detail than anything available before. The US online polls with the lowest margins of error actually select for political attitudes. Also, people are used to taking online surveys and don’t regard them as onerous or time-consuming as a phone call, which could improve participation. They also cost less, as much as 50% less than a phone survey, and because the “pollster” is a bot and not a human, people may be more willing to state how they really feel. They also won’t feel as pressed for time, given that they can finish the survey at their own pace.