
Certificate of Completion


THIS ACKNOWLEDGES THAT

Jessica Valenti

HAS COMPLETED THE SPRING 2022 DATA SCIENCE BOOT CAMP


DIRECTOR: Roman Holowinsky, PhD

DATE: JUNE 08, 2022

TEAM

Discover

Aniket Shah, Robert Baker, Khalida Hendricks, Jessica Valenti


Stack Overflow is a website where users ask and answer questions on a wide range of programming topics. Knowing what makes a question likely to be answered would be valuable to Stack Overflow users. Using Stack Overflow question and answer data from Kaggle and natural language processing, we predicted whether a question would be open or closed based on the words it contains. We cleaned the text of each post and extracted the most informative words, which we fed into a bag-of-words representation; we then trained a logistic regression model to predict whether a previously unseen question would be open or closed. Across the training data configurations we tried, the prediction accuracy was above 50%, suggesting that even this simple model can help identify which words are associated with a question staying open and receiving answers. This information could help Stack Overflow users reduce the time they spend waiting for a response when troubleshooting code.
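The pipeline described above (text cleaning, bag-of-words features, logistic regression) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the team's code: the stop-word list, the regular expressions, the learning rate and epoch count, the helper names (`clean`, `build_vocab`, `vectorize`, `train_logreg`, `predict`), and the toy questions and labels are all hypothetical. Logistic regression is fit here with plain batch gradient descent on the log-loss, and the label convention (1 = open, 0 = closed) is assumed.

```python
import math
import re

# Hypothetical stop-word list; the project's actual cleaning rules are not given.
STOP_WORDS = {"a", "an", "and", "do", "for", "how", "i", "in", "is",
              "it", "my", "of", "the", "this", "to", "with"}


def clean(text):
    """Lowercase, strip HTML tags, keep alphabetic tokens, drop stop words."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    return [tok for tok in re.findall(r"[a-z]+", text) if tok not in STOP_WORDS]


def build_vocab(docs):
    """Map every word seen in the training posts to a column index."""
    words = sorted({tok for doc in docs for tok in clean(doc)})
    return {word: i for i, word in enumerate(words)}


def vectorize(doc, vocab):
    """Bag-of-words vector: how often each vocabulary word appears in doc."""
    vec = [0.0] * len(vocab)
    for tok in clean(doc):
        if tok in vocab:  # words unseen during training are ignored
            vec[vocab[tok]] += 1.0
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(doc, vocab, w, b):
    """Return 1 (predicted open) if P(open) >= 0.5, else 0 (predicted closed)."""
    x = vectorize(doc, vocab)
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Toy, made-up training posts; label 1 = open, 0 = closed (assumed convention).
train_docs = [
    "How do I reverse a list in Python? Here is my code and the error traceback",
    "Python list index error traceback with example code",
    "Best programming language opinion?",
    "What is your favorite editor opinion",
]
train_labels = [1, 1, 0, 0]

vocab = build_vocab(train_docs)
X = [vectorize(doc, vocab) for doc in train_docs]
w, b = train_logreg(X, train_labels)

print(predict("Python traceback error in my code", vocab, w, b))       # → 1
print(predict("Which language is best? Opinion please", vocab, w, b))  # → 0
```

On real data, library components such as scikit-learn's `CountVectorizer` and `LogisticRegression` would typically replace the hand-written pieces; they are spelled out here only to show what each step computes. Accuracy should also be measured on held-out questions rather than the training posts.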

GitHub URL