Certificate of Completion
THIS ACKNOWLEDGES THAT
Khalida Hendricks
HAS COMPLETED THE SPRING 2022 DATA SCIENCE BOOT CAMP
Roman Holowinsky, PhD
DIRECTOR
JUNE 08, 2022
DATE
TEAM
Discover
Aniket Shah, Robert Baker, Khalida Hendricks, Jessica Valenti
Stack Overflow is a website where users ask and answer questions on a wide range of coding topics. Knowing what makes a question likely to be answered would be valuable to Stack Overflow users. Using Stack Overflow question-and-answer data (from Kaggle) and natural language processing, we predicted whether a question would be open or closed based on the words it contains. We cleaned the text of each post and extracted the important words, represented them with a bag-of-words model, and trained a logistic regression classifier to predict whether a previously unseen question would be open or closed. Regardless of the characteristics of the training data, prediction accuracy exceeded 50%, suggesting that even this simple model can help identify which words lead to a question receiving satisfactory answers. This information can help Stack Overflow users reduce the time they spend waiting for a response when troubleshooting code.
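The pipeline described above (clean the text, build a bag-of-words representation, fit a logistic regression) can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the team's actual code: the stopword list, the four toy questions, their open/closed labels, and the learning-rate and epoch settings are all hypothetical, and the real project worked from the Kaggle dataset.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the project's actual cleaning rules are not stated.
STOPWORDS = {"a", "an", "the", "is", "to", "in", "my", "how", "do", "i",
             "it", "of", "for", "and", "why", "what"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def build_vocab(docs):
    """Assign each distinct token a column index."""
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab

def vectorize(text, vocab):
    """Bag-of-words count vector; words outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict_proba(text, vocab, w, b):
    """Estimated probability that a question is 'open' (label 1)."""
    x = vectorize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Made-up toy data: 1 = open, 0 = closed.
questions = [
    ("How to merge two pandas dataframes on a column", 1),
    ("Python list index out of range error in loop", 1),
    ("What is the best programming language", 0),
    ("Please write my homework code for me", 0),
]
texts = [q for q, _ in questions]
labels = [label for _, label in questions]

vocab = build_vocab(texts)
X = [vectorize(t, vocab) for t in texts]
w, b = train_logreg(X, labels)

for t in texts:
    print(f"{predict_proba(t, vocab, w, b):.2f}  {t}")
```

In a sketch like this, the learned weights in `w` give a rough indication of which words push a question toward the open or closed class, which is the kind of word-level signal the abstract refers to. With real data, a held-out test set would be needed to estimate accuracy.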