Polling People Is Problematic
By Adam Grossman
“Pollsters are turrible.” Given the results of the Presidential election, many people today share Charles Barkley’s stance on analytics. More specifically, people are asking how experts in predictive analytics and mathematical modeling could have been so wrong about the outcome. This seems to be a big problem for big data.
The first counter to this conventional wisdom is that Hillary Clinton will actually receive around 1.5 million more votes than Donald Trump. Before the election, national polls had Clinton leading Trump by about 3 percentage points. In reality, Clinton finished with about a 1.5-point advantage over Trump in the popular vote. While the polls were not perfectly accurate, they did correctly predict which candidate would receive the most votes. The rejoinder is that the candidate who wins a majority of votes in the Electoral College wins the Presidential election, and Trump clearly received a majority of votes there.
However, the “blame” is largely being placed on the wrong people. Data scientists and data journalists are not necessarily the ones at fault. If their methods were completely wrong, how could people like Nate Silver have correctly predicted the outcome in every single state in the 2012 election?
The problem comes not from data analysis but from data collection. More specifically, polling firms likely encountered two problems that are very common with surveys. First, it can be very difficult to find a truly representative sample of a population. Second, people do not always act the way they say they will on a survey.
The first problem for political polling is that modern technology actually makes it harder to obtain a truly representative sample of voters. Pollsters have a very hard time reaching people on mobile phones, which means that millennials and other younger voters are often more difficult to target. It is also more expensive to conduct polls in multiple languages, so people whose first language is not English could be under-sampled by polling firms. Online polls are used, but they often have difficulty preventing people from voting multiple times or creating fake accounts to vote for their preferred candidate.
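The under-sampling problem can be made concrete with a small calculation. The sketch below is purely illustrative: the `Group` class, the two groups, and every share, response rate, and support level are hypothetical numbers chosen for the example, not figures from any 2016 poll. It shows how a group that is harder to reach ends up under-represented among completed interviews, pulling the raw poll result toward the preferences of the easier-to-reach group.

```python
# Hypothetical two-group electorate: every number here is illustrative,
# not taken from any real 2016 poll.
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    population_share: float  # fraction of the electorate in this group
    response_rate: float     # fraction of contacted members who complete the poll
    support_a: float         # fraction of the group that supports candidate A


def true_share(groups: list[Group]) -> float:
    """Candidate A's actual share of the whole electorate."""
    return sum(g.population_share * g.support_a for g in groups)


def unweighted_poll_share(groups: list[Group]) -> float:
    """Candidate A's share among completed interviews, with no reweighting.

    Each group appears in the sample in proportion to
    population_share * response_rate, so hard-to-reach groups shrink.
    """
    respondents = sum(g.population_share * g.response_rate for g in groups)
    supporters = sum(
        g.population_share * g.response_rate * g.support_a for g in groups
    )
    return supporters / respondents


if __name__ == "__main__":
    electorate = [
        Group("hard to reach", population_share=0.30, response_rate=0.05, support_a=0.60),
        Group("easy to reach", population_share=0.70, response_rate=0.10, support_a=0.45),
    ]
    print(f"true share for A: {true_share(electorate):.1%}")
    print(f"poll share for A: {unweighted_poll_share(electorate):.1%}")
```

With these made-up inputs, candidate A truly holds about 49.5% of the electorate, but the raw poll shows roughly 47.6%, because the group that favors A answers the phone half as often. If A stands in for the candidate favored by hard-to-reach voters, the poll understates that candidate's support rather than overstating it.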
This explanation, however, cannot account for Trump’s win: because these groups were more frequently Clinton voters, under-sampling them should have made the polls understate her support, not overstate it. The second problem could be the larger reason for the outcome of the election. Response bias is the term used when people tell a pollster or the purveyor of a survey what they want to hear instead of what they actually believe or will do. The best practice in data collection is to aggregate and analyze data on real actions and behaviors, because relying on what people say, rather than what they do, creates a higher likelihood of measurement error.
In politics, there is almost no way to definitively measure actual behavior before an election. People do not actually vote for anyone until the election. Even primary voting is not always a good predictor of what will happen in a general election, because voters choose only among candidates from one political party. Even if there were some form of actual vote before an election, voting is done by secret ballot. While all votes should be counted, no one should be able to tie an individual voter to an individual vote. This enables people to vote for whomever they would like without any real or perceived threat of retribution for their choice.
That is not necessarily the case with polling before the election. In particular, a pollster will know whom he or she is contacting and will often speak directly to voters. Coming into this election, voter response bias would likely have favored Clinton. Trump’s positions on race, gender, and religion were often considered offensive, even by people who may have ended up voting for him. Therefore, telling a pollster that one would vote for Trump would be more difficult than actually voting for him on Election Day. This would mean that what people told pollsters was not an accurate reflection of what they actually did on Election Day, and there is no way to know for sure what percentage of voters exhibited some form of response bias.
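A short simulation illustrates why response bias is harder to fix than ordinary sampling error. This is a minimal sketch under stated assumptions: the function names, the two-candidate setup, and the inputs (48% for candidate A, 47% for candidate B, and 5% of B's supporters telling the pollster they are undecided) are all hypothetical and chosen only for illustration.

```python
import random


def actual_margin(true_a: float, true_b: float) -> float:
    """A's margin over B in percentage points if everyone votes as they intend."""
    return 100 * (true_a - true_b)


def expected_poll_margin(true_a: float, true_b: float, shy_rate: float) -> float:
    """Margin a poll converges to when a fraction shy_rate of B supporters
    tell the pollster they are undecided."""
    return 100 * (true_a - true_b * (1 - shy_rate))


def simulate_poll(true_a: float, true_b: float, shy_rate: float,
                  n: int, seed: int = 0) -> float:
    """Simulate one poll of n respondents; return the reported margin in points."""
    rng = random.Random(seed)
    said_a = said_b = 0
    for _ in range(n):
        r = rng.random()
        if r < true_a:
            said_a += 1                      # A supporters answer honestly
        elif r < true_a + true_b:
            if rng.random() >= shy_rate:     # most B supporters answer honestly...
                said_b += 1
            # ...but a "shy" B supporter claims to be undecided
        # everyone else is genuinely undecided
    return 100 * (said_a - said_b) / n


if __name__ == "__main__":
    true_a, true_b, shy_rate = 0.48, 0.47, 0.05  # hypothetical values
    print(f"actual margin:        {actual_margin(true_a, true_b):.2f} points")
    print(f"expected poll margin: {expected_poll_margin(true_a, true_b, shy_rate):.2f} points")
    for n in (1_000, 10_000, 100_000):
        print(f"simulated poll, n={n:>7,}: {simulate_poll(true_a, true_b, shy_rate, n):.2f} points")
```

With these made-up inputs, the true margin is 1.00 point, but the polls cluster around 3.35 points. Surveying more people only shrinks the random scatter around 3.35; it never closes the gap. That is the difference between sampling error, which falls as the sample grows, and response bias, which does not.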
One of the many lessons from this campaign season is that survey data can often be unreliable. Yet senior leaders at many different companies make decisions and predictions based on survey or focus group data rather than on actual behavior. This introduces response bias into strategic decision-making processes in ways that can create unexpected outcomes, like Trump becoming the President-Elect. In that sense, this election is a feature rather than a bug when it comes to big data. New technologies enable us to capture actual behavior more frequently and more accurately than ever before. In fact, the growth of the “Internet of Things” (IoT) is based on the ability of appliances, wearables, and machines to collect data on actual behavior. The election should not necessarily sway your point of view on analytics or big data. Make sure you have all of the information before making that decision.