Certificate of Completion
THIS ACKNOWLEDGES THAT
Xiaoyu Wang
HAS COMPLETED THE MAY-SUMMER 2024 DATA SCIENCE BOOT CAMP
DIRECTOR: Roman Holowinsky, PhD
DATE: JUNE 10, 2024
TEAM: arXiv Chatbot
Xiaoyu Wang, Ketan Sand, Guoqing Zhang, Tajudeen Mamadou Yacoubou, Tantrik Mukerji
arXiv is the largest open database available, containing nearly 2.4 million research papers across 8 major domains that cover everything from the tiniest atoms to the entire cosmos. A large language model (LLM) with access to such a dataset can generate up-to-date, relevant, and, more importantly, precise information backed by citable sources.
This is exactly what we have done in this project. We extended the capabilities of Google’s Gemini 1.5 Pro LLM by building a customized Retrieval-Augmented Generation (RAG) pipeline that has access to the entire arXiv database. We then deployed the whole pipeline as a chatbot-style app to make it user-friendly.
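The retrieve-then-generate idea behind a RAG pipeline can be illustrated with a minimal sketch. This is not the team's implementation: the three sample records, the bag-of-words cosine similarity, and the helper names (`retrieve`, `build_prompt`) are all assumptions made for illustration. A real pipeline would more likely use dense embeddings and a vector index, and the final call to the LLM is deliberately left out.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for arXiv records (ids and abstracts are invented).
PAPERS = [
    {"id": "paper-A", "abstract": "We study dark matter halos in galaxy clusters using weak lensing."},
    {"id": "paper-B", "abstract": "A transformer language model is fine-tuned for protein structure prediction."},
    {"id": "paper-C", "abstract": "Fast radio bursts are localized to host galaxies with radio interferometry."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text):
    """Represent text as a bag-of-words term-count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, papers, k=2):
    """Return the k papers whose abstracts are most similar to the query."""
    q = vectorize(query)
    ranked = sorted(papers, key=lambda p: cosine(q, vectorize(p["abstract"])), reverse=True)
    return ranked[:k]


def build_prompt(query, retrieved):
    """Assemble an augmented prompt that asks the model to cite its sources."""
    context = "\n".join(f"[{p['id']}] {p['abstract']}" for p in retrieved)
    return (
        "Answer the question using only the sources below and cite them by id.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


query = "Where do fast radio bursts come from?"
top = retrieve(query, PAPERS, k=1)
prompt = build_prompt(query, top)
print(prompt)  # this augmented prompt is what would be sent to the LLM
```

Placing the retrieved abstracts in the prompt, tagged with their ids, is what lets the model ground its answer in specific papers and cite them.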