Reza Averly, Adnan Mahmood, Nikhil Ajgaonkar, Aniket Joshi, Lisa Berger


We analyze written transcripts of U.S. presidential and vice presidential debates from 1992 to 2020. Our aim is to identify parlance specific to the Democratic and Republican parties. Using a machine learning classification model, we classify words and phrases by party affinity according to their predicted probability. Given an out-of-sample set of words and phrases, our algorithm classifies whether each is favored by the Democratic or the Republican Party. The algorithm achieves an accuracy of 70% or higher for every election cycle since 1992. To account for class imbalance in the data set, we derive custom classification thresholds by maximizing the F1 score. The algorithm can be used by politicians to construct and promote campaign platforms, as well as by independent lobbyists targeting proposals to either party.
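
As a rough illustration of the threshold step, the sketch below sweeps candidate cutoffs over predicted probabilities and keeps the one with the highest F1 score. It is a minimal sketch, not the project's implementation: the function names `f1_score` and `best_threshold`, the toy labels and probabilities, and the convention that label 1 marks the minority class are all assumptions made for illustration.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1), computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, probs):
    """Try each distinct predicted probability as a cutoff; keep the best F1."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(probs)):
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best_f1:  # strict '>' keeps the lowest cutoff on ties
            best_t, best_f1 = t, score
    return best_t, best_f1


# Hypothetical toy data: label 1 is the minority class (3 of 10 samples),
# and its predicted probabilities mostly fall below the default 0.5 cutoff.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45, 0.50, 0.60]

default_preds = [1 if p >= 0.5 else 0 for p in probs]
default_f1 = f1_score(y_true, default_preds)
threshold, score = best_threshold(y_true, probs)
print(f"default F1={default_f1:.3f}, tuned threshold={threshold}, tuned F1={score:.3f}")
```

On imbalanced data like this toy set, the default cutoff of 0.5 misses most minority-class examples, so a lower, data-derived cutoff can raise F1 considerably. In practice the cutoff would be chosen on held-out validation data rather than on the data used to report accuracy.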

github URL