Authors: Danny Ebanks, Bojan Tonguz, R. Michael Alvarez
Medium: at Trustworthy Social Media
The American public increasingly finds itself bitterly divided along political lines. Survey indicators, partisan media, and the public’s voting patterns all inform this sense of division in our politics. We apply machine learning and natural language processing (NLP) methods in a novel way to paint a more nuanced picture of the divisions in American political opinion.
It turns out that even very simple NLP methods that rely only on word frequencies in politicians’ tweets can be highly predictive of party affiliation, reaching over 80% accuracy without any special tuning. These simple models are also robust: a model trained on tweets from members of the House of Representatives can be equally predictive when tested on tweets from US Senators. Furthermore, even though politicians of both major parties have become increasingly partisan and ideologically homogeneous, these simple models can produce very credible rankings of politicians along the left-right spectrum. These rankings closely replicate similar rankings based on politicians’ voting records. If these results withstand more rigorous scrutiny, NLP methods applied to politicians’ tweets could become a very simple tool for measuring the degree of partisanship of any politician, even those with meager or nonexistent voting records. In these settings, even simple assumptions about the structure of the topics yield large gains in understanding polarization among political elites.
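To make the idea of a word-frequency model concrete, here is a minimal sketch in plain Python. It is not the authors' actual pipeline: the choice of a multinomial Naive Bayes classifier with add-one smoothing is our assumption, the function names (`tokenize`, `train`, `log_odds`, `predict`) are hypothetical, and the tweets and senator names are invented toy data. The sketch trains on "House" tweets, predicts party for "Senate" tweets, and uses the log-odds score as a rough left-right ranking.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep simple word tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z']+", text.lower())


def train(tweets, labels):
    # Count word frequencies per party ("D" or "R") and tweets per party.
    counts = {"D": Counter(), "R": Counter()}
    docs = Counter(labels)
    for text, label in zip(tweets, labels):
        counts[label].update(tokenize(text))
    vocab = set(counts["D"]) | set(counts["R"])
    return counts, docs, vocab


def log_odds(model, text):
    # Log-odds that the text is "R" rather than "D" under Naive Bayes
    # with add-one smoothing. Negative leans D, positive leans R.
    counts, docs, vocab = model
    totals = {party: sum(counts[party].values()) for party in counts}
    score = math.log(docs["R"] / docs["D"])
    for word in tokenize(text):
        if word not in vocab:
            continue  # ignore words never seen in training
        p_r = (counts["R"][word] + 1) / (totals["R"] + len(vocab))
        p_d = (counts["D"][word] + 1) / (totals["D"] + len(vocab))
        score += math.log(p_r / p_d)
    return score


def predict(model, text):
    return "R" if log_odds(model, text) > 0 else "D"


# Invented toy data standing in for House tweets (assumption, not real tweets).
house_tweets = [
    "we must expand health care and protect voting rights",
    "climate action and health care for working families",
    "cut taxes and secure the border",
    "protect the second amendment and cut taxes for small business",
]
house_labels = ["D", "D", "R", "R"]

# Invented toy data standing in for Senate tweets (hypothetical senators).
senate_tweets = {
    "Senator A (hypothetical)": "health care and climate action for families",
    "Senator B (hypothetical)": "secure the border and cut taxes",
}

model = train(house_tweets, house_labels)
predictions = {name: predict(model, text) for name, text in senate_tweets.items()}

# Rank from most D-leaning (lowest log-odds) to most R-leaning (highest).
ranking = sorted(senate_tweets, key=lambda name: log_odds(model, senate_tweets[name]))
```

The same score does double duty here: its sign gives the predicted party, and its magnitude gives a crude position on a left-right scale. With real data, one would pool many tweets per politician before scoring and evaluate accuracy on held-out accounts.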