Certificate of Completion
THIS ACKNOWLEDGES THAT
Sayantan Roy
HAS COMPLETED THE SPRING 2024 DEEP LEARNING BOOT CAMP
Roman Holowinsky, PhD
DIRECTOR
MAY 03, 2024
DATE
TEAM
Pawsitive Retrieval 1
Marcos Ortiz, Kristina Knowles, Diptanil Roy, Karthik Prabhu Palimar, Sayantan Roy
This project aims to build a model that efficiently identifies and ranks relevant content from a large dataset of human-generated Reddit posts (5.5 million posts from 34 subreddits), given an arbitrary user query. The key objectives are to retrieve highly relevant results for each query while keeping retrieval times under 1 second. The long-term application is to use this capability as part of a Retrieval-Augmented Generation (RAG) pipeline for Aware clients.
We focus on systematically varying parameters of our embedding model, as well as applying different filters (before retrieval) and rerankings (after retrieval) that leverage the relationships inherent in the structure of the data.
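As a rough illustration of the filter, retrieve, rerank pattern described above, here is a minimal Python sketch. It is not the team's implementation: the toy 2-dimensional vectors stand in for real embeddings, and the `subreddit` and `upvotes` fields, the `retrieve` function, and the `boost` weight are all hypothetical names chosen for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query_vec, posts, allowed_subreddits=None, k=3, boost=0.1):
    """Filter, retrieve top-k by similarity, then rerank (illustrative only)."""
    # 1. Filter before retrieval: keep only posts from the allowed subreddits.
    candidates = [
        p for p in posts
        if allowed_subreddits is None or p["subreddit"] in allowed_subreddits
    ]
    # 2. Retrieval: top-k candidates by cosine similarity to the query.
    top_k = sorted(
        candidates, key=lambda p: cosine(query_vec, p["vec"]), reverse=True
    )[:k]
    # 3. Rerank after retrieval: similarity plus a small metadata-based boost.
    #    The log-upvote boost is an assumption made for this sketch.
    return sorted(
        top_k,
        key=lambda p: cosine(query_vec, p["vec"]) + boost * math.log1p(p["upvotes"]),
        reverse=True,
    )

# Toy data: hypothetical posts with made-up embeddings and metadata.
posts = [
    {"id": "a", "subreddit": "dogs", "upvotes": 2, "vec": [1.0, 0.0]},
    {"id": "b", "subreddit": "dogs", "upvotes": 500, "vec": [0.9, 0.1]},
    {"id": "c", "subreddit": "cats", "upvotes": 50, "vec": [1.0, 0.0]},
    {"id": "d", "subreddit": "dogs", "upvotes": 0, "vec": [0.0, 1.0]},
]

result_ids = [
    p["id"] for p in retrieve([1.0, 0.0], posts, allowed_subreddits={"dogs"}, k=2)
]
print(result_ids)
```

In this toy run the filter drops post `c`, retrieval keeps `a` and `b`, and the reranking step moves `b` ahead of `a` because of its much larger hypothetical upvote count. At the scale of millions of posts, the brute-force sort would normally be replaced by an approximate nearest-neighbor index.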
Using these strategies, we successfully improve the placement of relevant results, as measured by several modified recommender system metrics. These metrics were implemented using a set of over 1000 human-labeled query-result pairs, which establish a set of known relevant results for 25 queries.
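To make the evaluation idea concrete, the sketch below scores ranked results against human-labeled relevant sets using two standard ranking metrics, precision at k and mean reciprocal rank. The text does not specify which metrics were used or how they were modified, so these unmodified versions are an assumption, and the query and post IDs are invented for the example.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are labeled relevant."""
    top = ranked_ids[:k]
    return sum(1 for r in top if r in relevant_ids) / k

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, r in enumerate(ranked_ids, start=1):
        if r in relevant_ids:
            return 1.0 / rank
    return 0.0

# Hypothetical labels: query -> set of post IDs judged relevant by humans.
labels = {"q1": {"p2", "p5"}, "q2": {"p9"}}
# Hypothetical system output: query -> ranked list of retrieved post IDs.
runs = {
    "q1": ["p1", "p2", "p5", "p7"],
    "q2": ["p9", "p3", "p4", "p8"],
}

# Average each metric over all labeled queries.
mean_p_at_2 = sum(precision_at_k(runs[q], labels[q], 2) for q in labels) / len(labels)
mrr = sum(reciprocal_rank(runs[q], labels[q]) for q in labels) / len(labels)

print(f"mean P@2 = {mean_p_at_2:.2f}, MRR = {mrr:.2f}")
```

Averaging over queries in this way lets two retrieval configurations be compared on the same labeled set; a higher reciprocal rank means the first relevant post appears closer to the top.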