TEAM

Natural Language Query Analysis for Genomic Data

Kashvi Srivastava

Objective: Build a pipeline that answers natural language queries over genomic data. For example, "Find genes associated with Alzheimer’s disease" is one such query.
Methodology:
• Data Processing: Clean and process genomic datasets, such as those available from the UCSC Genome Browser
• Feature Engineering: Generate feature embeddings for natural language queries
• Neural Network Modeling: Fine-tune a pre-trained LLM using the genomic dataset
• Additional Step: Create an interface for the queries
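
To make the query-embedding step concrete, here is a minimal sketch in Python. It is not the project's implementation: it swaps in a plain bag-of-words count vector for the learned embeddings and fine-tuned LLM named above, and it ranks a tiny hand-written annotation table by cosine similarity. The names `GENE_NOTES`, `embed`, `cosine`, and `rank_genes` are hypothetical, and the table is an illustrative stand-in rather than data taken from the UCSC Genome Browser.

```python
import math
import re
from collections import Counter

# Hand-written toy annotations (illustrative only, not taken from UCSC data).
GENE_NOTES = {
    "APOE": "apolipoprotein E associated with Alzheimer's disease",
    "BRCA1": "DNA repair gene associated with hereditary breast cancer",
    "CFTR": "chloride channel gene associated with cystic fibrosis",
    "HTT": "huntingtin associated with Huntington's disease",
}


def embed(text):
    """Map text to a sparse bag-of-words count vector (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_genes(query, notes=GENE_NOTES, top_k=3):
    """Return the top_k (gene, score) pairs whose annotation best matches the query."""
    query_vec = embed(query)
    scored = [(gene, cosine(query_vec, embed(text))) for gene, text in notes.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for gene, score in rank_genes("Find genes associated with Alzheimer's disease"):
        print(f"{gene}\t{score:.3f}")
```

Word counts only match exact tokens, so this sketch cannot relate "genes" to "gene" or handle paraphrases. That gap is the motivation for using learned embeddings and a fine-tuned model in the actual pipeline.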

github URL

©2017-2025 by The Erdős Institute.
