Certificate of Completion
THIS ACKNOWLEDGES THAT
Craig Franze
HAS COMPLETED THE MAY-SUMMER 2024 DATA SCIENCE BOOT CAMP
Roman Holowinsky, PhD
DIRECTOR
JUNE 10, 2024
DATE
TEAM
Topic recognition on NYT articles
Ravi Tripathi, Touseef Haider, Ping Wan, Schinella D'Souza, Alessandro Malusà, Craig Franze
The project proposes to study the metadata of New York Times articles to detect the most relevant topics and to build a recommendation system based on topic similarity.
We plan to do the following:
1) Apply methods like Latent Dirichlet Allocation (LDA) and Bidirectional Encoder Representations from Transformers (BERT) to identify the most relevant topics from a corpus of about 42,000 articles published over the last year
2) Draw insightful visuals to highlight topic and word distributions as well as popular trends
3) Use neural networks to assign meaningful labels to topics
4) Create a recommender system based on topic similarity
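
As an illustration of step 4, the following is a minimal sketch of a topic-similarity recommender, not the team's actual implementation. It assumes each article has already been reduced to a topic distribution (for example by LDA) and ranks the other articles by cosine similarity to a query article. The function names, article ids, and topic weights are hypothetical and chosen only for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_u * norm_v)


def recommend(article_id, topic_vectors, top_n=2):
    """Return the ids of the top_n articles closest to article_id by topic mix."""
    query = topic_vectors[article_id]
    scored = [
        (cosine_similarity(query, vec), other_id)
        for other_id, vec in topic_vectors.items()
        if other_id != article_id  # never recommend the query article itself
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other_id for _, other_id in scored[:top_n]]


# Hypothetical topic distributions over three topics.
# These are made-up values for illustration, not results from the project.
topic_vectors = {
    "politics_a": [0.80, 0.15, 0.05],
    "politics_b": [0.70, 0.20, 0.10],
    "sports_a": [0.05, 0.90, 0.05],
    "science_a": [0.10, 0.05, 0.85],
}

recommendations = recommend("politics_a", topic_vectors, top_n=2)
print(recommendations)
```

Cosine similarity is used here because it compares the shape of two topic mixes rather than their magnitude. Other measures suited to probability distributions, such as Jensen-Shannon divergence, could be substituted without changing the structure of the sketch.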