Aniket Shah, Robert Baker, Khalida Hendricks, Jessica Valenti


Stack Overflow is a website where users ask and answer questions on a wide range of coding topics. Knowing what makes a question likely to be answered would be valuable to its users. Using Stack Overflow question-and-answer data from Kaggle and natural language processing, we predicted whether a question would be open or closed based on the words it contains. We cleaned the text of each post, extracted the important words, represented them with a bag-of-words model, and trained a logistic regression classifier to predict whether a previously unseen question would be open or closed. Regardless of the characteristics of the training data, prediction accuracy exceeded 50%, suggesting that even this simple model can help identify which words lead to a question receiving satisfactory answers. This information can help Stack Overflow users reduce the time they spend waiting for a response when troubleshooting code.
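The pipeline described above (clean the text, build bag-of-words counts, fit a logistic regression) can be illustrated with a minimal sketch in plain Python. This is not the project's actual code: the four example questions, their labels (1 = closed, 0 = open), the stopword list, and every function name are invented for illustration, and a real implementation would more likely use a library and the full Kaggle dataset.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the project's real cleaning rules are not shown.
STOPWORDS = {"a", "an", "the", "is", "to", "in", "my", "i", "how", "do",
             "of", "for", "me", "it", "what", "this"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def build_vocab(docs):
    """Map each distinct token to a column index (sorted for determinism)."""
    words = sorted({w for d in docs for w in tokenize(d)})
    return {w: i for i, w in enumerate(words)}

def vectorize(text, vocab):
    """Bag-of-words count vector; words outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for word, count in Counter(tokenize(text)).items():
        if word in vocab:
            vec[vocab[word]] = float(count)
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(text, vocab, w, b):
    """Estimated probability that a question is closed (label 1)."""
    x = vectorize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Invented toy data: 1 = closed, 0 = open.
questions = [
    "How do I reverse a list in Python with slicing",
    "Why does this pointer dereference cause a segfault in C",
    "Please do my homework urgent send code",
    "Urgent need full code for my project please",
]
labels = [0, 0, 1, 1]

vocab = build_vocab(questions)
X = [vectorize(q, vocab) for q in questions]
weights, bias = train_logreg(X, labels)

# Score a previously unseen question.
p_closed = predict_proba("urgent please send code", vocab, weights, bias)
print(f"P(closed) = {p_closed:.2f}")
```

In this sketch the learned weight for each vocabulary word indicates whether it pushes a question toward the closed or the open class, which is the kind of word-level signal the abstract refers to. With only four invented examples the numbers carry no real meaning; they only show the mechanics.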

github URL