PROJECT CHALLENGES

Sponsor a 4-6 week, real-world project challenge tackled by top-tier PhD talent in Data Science, Deep Learning, UX Research, Quantum Computing, and Quant Research/Finance.

Let our teams work on your problem while you meet future hires along the way.

HOW IT WORKS

Submit a Project

Tell us about a project or challenge you’d like our boot camp teams to work on. Our technical team will work with you to scope the problem and align it with an upcoming cohort.  

PhDs Work in Teams

Over the course of 4-6 weeks, teams of 3-5 participants dive into the project. Each team is matched with a mentor, either from your company or from our internal Erdős PhD alumni network.

Attend Demo Day

You’ll receive access to final project videos, executive summaries, and annotated GitHub repos. Projects culminate in a demo day featuring all of the teams that worked on your challenge.

WHY SPONSOR A CHALLENGE?
  • Fresh, Insightful Work
    Gain new perspectives on business-critical challenges.

  • Work with Top Talent
    Engage with skilled PhDs transitioning to careers in Data Science, ML, AI, Deep Learning, UX Research, and more. Each cohort, the Erdős Institute attracts over 500 PhDs from some of the world's top universities.

  • Flexible Mentorship Options
    You can provide a mentor from your team or we’ll assign one from our side.

  • Multiple Teams for Maximum Impact
    Each $5,000 sponsorship covers up to 4 teams working on your project.

  • End-to-End Deliverables
    Get access to final project videos, GitHub repos, executive summaries, and more.

COHORT SCHEDULE

📅   2025 Boot Camps & Deadlines

Cohort         Cohort Start Date      Submit Project By *
Spring 2025    January 22, 2025       January 15, 2025
Summer 2025    May 7, 2025            April 30, 2025
Fall 2025      September 10, 2025     September 3, 2025

* Projects should be scoped and submitted before the cohort start date. We recommend submitting at least 2 weeks in advance to ensure alignment and onboarding.

PAST PROJECT SPOTLIGHTS

Examples of projects from prior cohorts

SPRING 2026

Quantum Computing Boot Camp

Ege Aktener - Quantum Computing - Final Project

Ege Aktener

Project 1: An Oracle for Shor's Algorithm (https://github.com/egeaktener/Oracle-for-Shor-s-Algorithm)
Takes two co-prime integers a and N, and outputs a quantum gate for modular multiplication.

Project 2: Quantum PageRank (https://github.com/egeaktener/Quantum-PageRank)
Given a directed graph, computes classical and quantum PageRank by simulating classical and quantum random walks.
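
As a rough illustration of what the oracle in Project 1 has to encode: when a and N are co-prime, the map x -> a·x mod N is a permutation of the basis states, so it can be written as a permutation matrix. The sketch below is not taken from the linked repository. The function names are hypothetical, and it builds only the classical permutation and its 0/1 matrix, not a gate-level circuit.

```python
from math import gcd


def modular_multiplication_permutation(a, N):
    """Return perm where perm[x] is the basis state that |x> is mapped to."""
    if N < 2 or gcd(a, N) != 1:
        raise ValueError("a and N must be co-prime, with N >= 2")
    n_qubits = max(1, (N - 1).bit_length())  # qubits needed to hold 0..N-1
    dim = 2 ** n_qubits
    # States x >= N are left untouched so the whole map stays a bijection.
    return [(a * x) % N if x < N else x for x in range(dim)]


def permutation_matrix(perm):
    """Build the 0/1 matrix U with U[perm[x]][x] = 1 (a permutation unitary)."""
    dim = len(perm)
    matrix = [[0] * dim for _ in range(dim)]
    for x, y in enumerate(perm):
        matrix[y][x] = 1
    return matrix


# Illustrative values only: a = 7, N = 15 on a 4-qubit register.
perm = modular_multiplication_permutation(7, 15)
U = permutation_matrix(perm)
```

Because every row and column of `U` contains exactly one 1, the matrix is unitary, which is the property a quantum gate for this map needs.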

SPRING 2026

Quantum Computing Boot Camp

Gautham (Sid) Meka

Gautham Meka

SPRING 2026

UX Research Boot Camp

2026 Cohort Team

Tamera Jones, David Stifler, Ibrahim Odugbemi, Dejan Duric

Project 1: Competitive Analysis
We conducted market research on behalf of Kairos, a remote patient monitoring company focused on postpartum health. Through background research and data synthesis, we strategically redesigned the company's marketing position.

Project 2: Finance App Usability
The goal-tracking app, Stack Save, was tested for feature usability with 4 users, focusing on areas for improvement in ease of use and effectiveness. We conducted a usability testing session via a survey and redesigned the interface based on the suggestions.

Project 3: Customer Loyalty
TableGram, a popular food service app, has seen changes in customer loyalty. To understand the key drivers and barriers in customer loyalty, we designed and administered a survey and conducted A/B testing to propose an intervention to keep customers.

SPRING 2026

Quant Finance Boot Camp

Gautam Hegde: Kou Jump-Diffusion Model

Gautam Hegde

Kou Jump-Diffusion Model
An extension of the Black-Scholes framework incorporating sudden, large discontinuities in asset prices via a double-exponential jump process. The project consists of two modules.
The first is a pricing and calibration engine for European options, combining the Carr-Madan FFT method with 2D interpolation for efficient price computation, and L-BFGS-B optimization to fit model parameters to market data.
The second is a Monte Carlo simulation analyzing delta-hedging performance under jump risk. By evolving a hedged European call option portfolio under the Kou model, the simulation demonstrates that jumps induce fat tails in the P&L distribution that delta-hedging alone cannot eliminate.
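
To make the double-exponential jump process concrete, here is a minimal sketch that draws risk-neutral terminal prices under a Kou-type model. It is not the project's pricing or hedging code. The function names and all parameter values are illustrative assumptions, and it uses only the Python standard library. Log-jump sizes are exponential with rate `eta_up` (upward, with probability `p_up`) or `eta_down` (downward), and the drift is compensated so that the discounted price is a martingale.

```python
import math
import random


def kou_jump_compensator(p_up, eta_up, eta_down):
    """E[exp(Y)] - 1 for a double-exponential log-jump Y (requires eta_up > 1)."""
    if eta_up <= 1:
        raise ValueError("eta_up must exceed 1 so that E[exp(Y)] is finite")
    return p_up * eta_up / (eta_up - 1) + (1 - p_up) * eta_down / (eta_down + 1) - 1


def sample_poisson(rng, mean):
    """Count arrivals of a unit-rate Poisson process on [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_kou_terminal(s0, r, sigma, lam, p_up, eta_up, eta_down, T, n_paths, seed=0):
    """Draw risk-neutral terminal prices S_T under the Kou model."""
    rng = random.Random(seed)
    zeta = kou_jump_compensator(p_up, eta_up, eta_down)
    drift = (r - 0.5 * sigma ** 2 - lam * zeta) * T
    prices = []
    for _ in range(n_paths):
        log_jump_sum = 0.0
        for _ in range(sample_poisson(rng, lam * T)):
            if rng.random() < p_up:
                log_jump_sum += rng.expovariate(eta_up)    # upward jump
            else:
                log_jump_sum -= rng.expovariate(eta_down)  # downward jump
        diffusion = sigma * math.sqrt(T) * rng.gauss(0.0, 1.0)
        prices.append(s0 * math.exp(drift + diffusion + log_jump_sum))
    return prices


# Hypothetical parameters chosen only for illustration.
prices = simulate_kou_terminal(100.0, 0.05, 0.2, 1.0, 0.4, 10.0, 5.0, 1.0, n_paths=20000)
discounted_mean = math.exp(-0.05 * 1.0) * sum(prices) / len(prices)
```

A quick sanity check on a simulation like this is that `discounted_mean` stays close to the initial price, up to Monte Carlo error.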

SPRING 2026

Quant Finance Boot Camp

The Volatility Complex: Topological Analysis of Sector Stock Volatility via Vietoris-Rips Complexes

Andrew Tawfeek

We study how the volatility of ~50 S&P 500 tech stocks co-moves over time (2007–2024) using tools from topological data analysis. From rolling volatility correlations, we build a distance metric between stocks and construct Vietoris-Rips simplicial complexes — geometric objects that capture not just pairwise relationships but higher-order group behavior. Tracking the complex over time reveals sharp structural shifts during crises (2008, COVID, 2022) when all stocks move as one, and fragmentation into sub-sector clusters during calm periods. Persistent homology provides a multi-scale view of this structure, identifying robust features across different correlation thresholds.
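
The construction described above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: the abstract does not say which distance is used, so the sketch assumes the common correlation distance d = sqrt(2(1 - rho)), and the function names and the toy correlation matrix are hypothetical.

```python
import math
from itertools import combinations


def correlation_distance(rho):
    """Map a correlation in [-1, 1] to a distance in [0, 2] (assumed metric)."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))


def vietoris_rips(corr, epsilon, max_dim=2):
    """Return {dimension: list of simplices} for the Rips complex at scale epsilon."""
    n = len(corr)
    close = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if correlation_distance(corr[i][j]) <= epsilon
    }
    complex_ = {0: [(i,) for i in range(n)]}
    for dim in range(1, max_dim + 1):
        # A simplex is included exactly when all of its vertex pairs are close.
        complex_[dim] = [
            simplex
            for simplex in combinations(range(n), dim + 1)
            if all(pair in close for pair in combinations(simplex, 2))
        ]
    return complex_


# Toy 4-stock correlation matrix: stocks 0-2 co-move, stock 3 does not.
corr = [
    [1.00, 0.90, 0.80, 0.10],
    [0.90, 1.00, 0.85, 0.00],
    [0.80, 0.85, 1.00, 0.20],
    [0.10, 0.00, 0.20, 1.00],
]
tight = vietoris_rips(corr, 0.5)
loose = vietoris_rips(corr, 0.7)
```

Raising `epsilon` from 0.5 to 0.7 turns a single edge into a filled triangle on stocks 0, 1, and 2. Tracking how such features appear and persist across thresholds is the idea behind the persistent homology view mentioned above.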

SPRING 2026

Deep Learning Boot Camp

Cross-Dataset Generalization of Underwater Instance Segmentation Models

Carsten Sprunger

Underwater instance segmentation models are typically trained and evaluated on a single dataset, leaving cross-dataset generalization unstudied. This project measures the cross-dataset domain gap between TrashCan (7,212 deep-sea ROV images, 22 classes) and SeaClear (8,610 shallow-water images, 40 classes) using Mask R-CNN with a COCO-pretrained ResNet-50 FPN backbone. Models trained on one dataset fail catastrophically on the other despite overlapping object categories. We show this gap is visual, not semantic: silhouette analysis of backbone features reveals strong dataset clustering even at the per-class level. To mitigate the gap, we pool both datasets into common category spaces via a generic coarsening hierarchy and show that a single pooled model recovers or exceeds in-domain performance on both test sets. We also re-split TrashCan using frame-chunking to fix data leakage in the original split. All results are explorable via an interactive Streamlit dashboard.
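
One way to read the frame-chunking idea: consecutive video frames are near-duplicates, so assigning individual frames to splits at random can leak test content into training, while assigning whole runs of consecutive frames to a single split avoids that. The sketch below is an assumption about the general technique, not the project's actual re-split. The function name, chunk size, and split ratio are hypothetical.

```python
def chunked_split(frames_by_video, chunk_size=50, val_every=5):
    """Assign whole chunks of consecutive frames to 'train' or 'val'.

    frames_by_video maps a video id to a list of frame indices.
    Every val_every-th chunk goes to validation; the rest go to training.
    """
    if chunk_size < 1 or val_every < 1:
        raise ValueError("chunk_size and val_every must be positive")
    splits = {"train": [], "val": []}
    for video_id in sorted(frames_by_video):
        frames = sorted(frames_by_video[video_id])
        for start in range(0, len(frames), chunk_size):
            chunk_index = start // chunk_size
            split = "val" if chunk_index % val_every == val_every - 1 else "train"
            splits[split].extend(
                (video_id, frame) for frame in frames[start:start + chunk_size]
            )
    return splits


# Hypothetical example: one 10-frame video, chunks of 2 frames.
demo = chunked_split({"dive_a": list(range(10))}, chunk_size=2, val_every=5)
```

Here frames 8 and 9 form the only validation chunk, so the two splits share at most one chunk boundary between adjacent frames.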

SPRING 2026

UX Research Boot Camp

Team JXY!

Yuxian Lin, Xinyue Wu, Jessie Cordwell

Project 1 – Market Research & Strategy
Conducted market research and competitive analysis to understand the industry landscape, followed by a SWOT analysis to identify strategic strengths and weaknesses, and designed a case study to apply these insights in a real-world context.

Project 2 – User Research & Design
Developed user personas to define target users, conducted usability testing based on personas to evaluate design effectiveness, and created wireframes to translate research findings into early-stage interface concepts.

Project 3 – Quantitative Research
Designed a survey to collect user data, ran an A/B test to evaluate a new product feature, and performed quantitative analysis to draw data-driven conclusions from the results.

SPRING 2026

Deep Learning Boot Camp

Spatiotemporal Modeling of Pose Estimation in Wearables

Sero Parel, Kristin Dona, Dayoung Lee, Brian Mullen

This project aims to build a deep learning pipeline for hand pose estimation from surface electromyography (sEMG) signals recorded by a smart wristband equipped with muscle activity sensors. We used the emg2pose dataset, which contains 370 hours of 16-channel sEMG signals sampled at 2 kHz from 193 users (Salter, Warren, Schlager, et al. 2024). The dataset is publicly available in the GitHub repository: https://github.com/facebookresearch/emg2pose.

We focused on the core deployment plan, generalization to new users and poses, sensor placements, and trajectory quality. We established a baseline LSTM model and added small, well-ablated improvements through a spatiotemporal learning approach. This project is packaged as a reproducible PyTorch pipeline that can be run in Google Colab. Additionally, we included deployment by publishing our trained model checkpoints and inference code to Hugging Face.

SPRING 2026

Deep Learning Boot Camp

Deep Learning Song Recommender

Nick Geis, Mitch Hamidi-Ismert, Juan Salinas

This project develops a content-based music recommender that predicts song relationships from audio, using listener-generated tags as supervision during training. From 10-second clips, stem separation and mel spectrograms are used to represent each track, and a late-fusion ResNet18 learns embeddings that capture genre, mood, and musical structure. At inference time, the system recommends songs from audio alone through an interactive web app, showing how deep learning can support music discovery without relying solely on user behavior.

SPRING 2026

Data Science Boot Camp

A Solar-to-Ground Proxy Model for Ground-Level Electromagnetic (EM) Risk Prediction

George Seelinger

Geomagnetic storms driven by solar activity pose risks to power grids, satellites, and communication systems. In this project, we developed a data-driven solar-to-ground proxy model that predicts near-term geomagnetic activity using solar wind data. The final model is an XGBoost classifier tuned to maximize the share of storms it correctly predicts, subject to keeping the false positive rate at an acceptable level.
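
The tuning objective, catching as many storms as possible under a cap on false alarms, can be illustrated with a simple decision-threshold search over predicted probabilities. This is a sketch of one possible approach, not the team's tuning procedure. The function name, the scores, and the 25% cap are hypothetical.

```python
def pick_threshold(scores, labels, max_fpr):
    """Return (threshold, tpr, fpr) maximizing TPR subject to FPR <= max_fpr.

    A sample is predicted positive (storm) when its score >= threshold.
    Ties in TPR are broken in favor of the higher threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    best = (float("inf"), 0.0, 0.0)  # fallback: never predict a storm
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        tpr, fpr = tp / n_pos, fp / n_neg
        if fpr <= max_fpr and tpr > best[1]:
            best = (threshold, tpr, fpr)
    return best


# Hypothetical classifier scores and storm labels (1 = storm occurred).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
result = pick_threshold(scores, labels, max_fpr=0.25)
```

With these toy inputs the search settles on a threshold of 0.6, which catches three of the four storms while raising one false alarm among the four quiet periods.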

SPRING 2026

Deep Learning Boot Camp

EllipticGuard: Graph Deep Learning for Bitcoin Illicit Activity Detection

Ran Li, Shaoyang Zhou, Rafael Miksian Magaldi, Prakash Singh, Tinghao Huang

This project studies illicit Bitcoin transaction detection on the Elliptic dataset under a stable pre-shutdown split (train 1–32, val 33–37, test 38–42). We compared strong tabular baselines, GNNs, graph-aware non-neural models, compressed graph–tree hybrids, directed residual GNNs, and combination models. Generic GNNs improved over weaker graph baselines but remained below the best tabular model. A graph-aware ET stack using directed neighbor-risk aggregates reached 0.905 test PR-AUC, while compressed hybrid models showed that GNN embeddings help more when constrained through low-dimensional bottlenecks, including Matryoshka-style designs, before integration into trees. The best standalone graph models were directed residual GNNs (up to 0.916), and the top result, 0.9187, came from a preserved-head combination model integrating GraphAgg ET with SIGN/stack components. Overall, graph information helps most when integrated with tabular models rather than used through a standalone GNN.
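
The temporal split stated above (train 1–32, val 33–37, test 38–42) can be expressed directly. The sketch below assumes each transaction carries an integer time step; the function names and the sample transaction ids are hypothetical. Time steps outside 1–42 are simply left out, which is an assumption based on the "pre-shutdown" wording.

```python
TRAIN_STEPS = range(1, 33)  # time steps 1..32
VAL_STEPS = range(33, 38)   # time steps 33..37
TEST_STEPS = range(38, 43)  # time steps 38..42


def assign_split(time_step):
    """Return 'train', 'val', 'test', or None for steps outside 1..42."""
    if time_step in TRAIN_STEPS:
        return "train"
    if time_step in VAL_STEPS:
        return "val"
    if time_step in TEST_STEPS:
        return "test"
    return None


def split_transactions(transactions):
    """Group (tx_id, time_step) pairs by split, dropping out-of-range steps."""
    splits = {"train": [], "val": [], "test": []}
    for tx_id, time_step in transactions:
        split = assign_split(time_step)
        if split is not None:
            splits[split].append(tx_id)
    return splits


# Hypothetical transaction ids, for illustration only.
demo = split_transactions([("tx_a", 1), ("tx_b", 35), ("tx_c", 42), ("tx_d", 45)])
```

Splitting by time step rather than at random keeps every training transaction earlier than every validation and test transaction.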

SPRING 2026

Deep Learning Boot Camp

Fragmented ID Resolution

Noimot Bakare Ayoub, Dharineesh Somisetty, Arpith Shanbhag, Pedro Fontanarrosa

Scope: Detect duplicate identities across noisy, fragmented datasets (fraud, patient mismatch, citizen records)
Architecture: CNN Embeddings + Siamese Network

Problem: Real-world identity data is messy: small inconsistencies cause one person to appear as multiple records, creating operational risk and inefficiency.

Approach: We learn record similarity using CNN embeddings and a Siamese network. LinkID detects, ranks, and resolves duplicate identities, auto-linking high-confidence matches and routing borderline cases for review.

Data: HPI snapshot of North Carolina voter records with labeled duplicate and non-duplicate pairs.

Results: Strong performance overall, with ~25-point improvement on hard cases where traditional models struggle.

Conclusion: Learned similarity models significantly outperform traditional approaches in complex identity resolution tasks.
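
The "auto-link or route for review" step in the approach above amounts to ranking candidate pairs by similarity and applying two cutoffs. The sketch below shows that logic only. It assumes similarity scores in [0, 1] have already been produced by some model, and the function name, cutoff values, and record ids are hypothetical rather than taken from LinkID.

```python
def triage_pairs(scored_pairs, link_at=0.90, review_at=0.60):
    """Rank (record_a, record_b, similarity) triples and attach an action.

    similarity >= link_at    -> 'auto_link'
    similarity >= review_at  -> 'review' (borderline, sent to a human)
    otherwise                -> 'keep_separate'
    """
    if not 0.0 <= review_at <= link_at <= 1.0:
        raise ValueError("expected 0 <= review_at <= link_at <= 1")
    ranked = sorted(scored_pairs, key=lambda pair: pair[2], reverse=True)
    decisions = []
    for record_a, record_b, similarity in ranked:
        if similarity >= link_at:
            action = "auto_link"
        elif similarity >= review_at:
            action = "review"
        else:
            action = "keep_separate"
        decisions.append((record_a, record_b, similarity, action))
    return decisions


# Hypothetical record ids and similarity scores.
decisions = triage_pairs([
    ("rec_17", "rec_42", 0.72),
    ("rec_03", "rec_88", 0.97),
    ("rec_10", "rec_11", 0.15),
])
```

In practice the two cutoffs would be chosen on labeled pairs, trading reviewer workload against the risk of linking two different people.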

SPRING 2026

Data Science Boot Camp

LLM Hallucinations Detector

Helmut Wahanik, Guoqin Liu, Santanil Jana, AJ Vargas, Debanjan Sarkar

In this project, we develop methods for detecting hallucinations in Large Language Models (LLMs) to flag risky outputs prior to expensive downstream validation. We propose two complementary detection strategies evaluated on 2,500 questions across five benchmark datasets using Llama-3.2-3B. The first approach is a white-box method that extracts spectral features from attention-head Laplacians. This method demonstrates that the hallucination signal is low-dimensional and largely linearly separable. The second approach is a black-box method that computes semantic and geometric statistics from a cloud of sampled responses. We find that an ElasticNet logistic model trained on six baseline features achieves an AUROC of approximately 0.91.

Ultimately, we demonstrate that hallucinations leave measurable signatures in both internal transformer activations and the geometry of sampled outputs. Our approach serves as a cost-effective filter for organizations deploying LLMs at scale.
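
As a hedged illustration of the black-box idea, one simple geometric statistic of a "cloud" of sampled responses is the mean pairwise cosine distance between their embeddings: answers that agree cluster tightly, while inconsistent answers spread out. The abstract does not list the actual features, so this particular statistic, the function names, and the toy vectors are assumptions.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm


def response_dispersion(embeddings):
    """Mean pairwise cosine distance across sampled-response embeddings."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two sampled responses")
    return sum(1.0 - cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Toy embeddings: three near-identical answers vs. three scattered ones.
consistent = response_dispersion([[1.0, 0.0], [0.9, 0.1], [1.0, 0.05]])
scattered = response_dispersion([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.2]])
```

A statistic like this could serve as one input feature to a downstream classifier; higher dispersion would suggest the model is less certain of its answer.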

SPRING 2026

Data Science Boot Camp

Hitmakers vs. One-Hit Wonders: Predicting Sustained Success in the Music Industry

James McNally, Yundi Kong, Guillermo Sanmarco, Vishal Gupta

Question:
What early signals predict sustained success in the music industry?

Objective:
Many musicians produce hit songs, but not all are able to do so more than once. This project builds a machine learning classifier to distinguish hitmakers (artists with multiple top 20 Billboard Hot 100 hits) from one-hit wonders, using only information available at the moment of a musician’s first top 20 hit song.

Conclusions:
Our model reveals that prior charting experience, collaboration network position, chart longevity, genre breadth, and dominant genre affiliations are the strongest predictors of sustained success.

Data sources:
- MusicBrainz (artist metadata, genre tags, collaboration graph)
- Billboard Hot 100 & 200 chart data
- Spotify (artist and song metadata)
- Google Trends (relative search volume at time of first hit song)

SPRING 2026

Data Science Boot Camp

Mapping Radon Risk in Canada

Huiyao Kuang, Manimugdha Saikia, Emmanuel Asante, John Berezney

Radon is a naturally occurring radioactive gas, and long-term exposure to elevated indoor radon is a major public health concern in Canada. Because radon is colorless and odorless, exposure often goes unnoticed, while risk varies substantially across regions due to differences in geology, climate, housing, and socioeconomic context. This project developed an FSA-level radon risk screening framework for Canada by integrating household radon survey data with public geological, climatic, housing, socioeconomic, and uranium-related datasets, and identified the regional factors most strongly associated with elevated radon risk.

SPRING 2026

Data Science Boot Camp

Towards Automated Sleep Analysis: Stage Classification and Apnea Prediction

Ye Hong, Vaibhav Thakur, Aiqi Cheng

Accurate sleep monitoring remains a challenge for consumer wearables, and conditions like Obstructive Sleep Apnea are widely underdiagnosed due to the lack of accessible, reliable automated tools. Using the DREAMT dataset — combining overnight PSG signals, Empatica E4 smartwatch data, and subject metadata from 100 participants — we built two models: one to classify sleep stages in real time, and one to predict apnea events 10 seconds in advance.
For sleep stage classification, we benchmarked Logistic Regression, XGBoost, LightGBM, and LSTM; gradient boosting methods performed best, with an XGBoost and LightGBM ensemble further improved by majority-vote smoothing. For apnea prediction, LightGBM with a longer lag window of features yielded the strongest results, highlighting the importance of temporal context.
These models enabled real-time stage tracking and proactive apnea alerts with potential for earlier clinical interventions.
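
Majority-vote smoothing replaces each predicted stage with the most common label in a short window, which suppresses isolated one-epoch flips. The sketch below uses a trailing window so that it relies only on past predictions, in keeping with real-time use. The window length, the tie-breaking rule, and the function name are assumptions, not details from the project.

```python
from collections import Counter


def smooth_stages(predictions, window=3):
    """Replace each label with the majority label of the last `window` predictions.

    If the current label is tied for most frequent, it is kept; otherwise the
    most recent label among those tied for most frequent is used.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    predictions = list(predictions)
    smoothed = []
    for i, current in enumerate(predictions):
        recent = predictions[max(0, i - window + 1): i + 1]
        counts = Counter(recent)
        top = max(counts.values())
        if counts[current] == top:
            smoothed.append(current)
        else:
            smoothed.append(
                next(label for label in reversed(recent) if counts[label] == top)
            )
    return smoothed


# Hypothetical per-epoch predictions: "W" = wake, "N2" = stage N2 sleep.
raw = ["W", "W", "N2", "W", "W", "N2", "N2", "N2"]
smoothed = smooth_stages(raw, window=3)
```

The isolated "N2" at position 2 is voted away, while the sustained run of "N2" at the end is kept after a short lag, which is the usual trade-off of a trailing window.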

FALL 2025

Quant Finance Boot Camp

Implied Volatility vs. Realized Volatility for an Africa-Exposure ETF

Chidubem Umeh

As a Nigerian-American, I have a personal interest in understanding African financial markets—particularly those in West Africa, where local economic factors often differ significantly from continental trends.

This project will focus on volatility modeling and forecasting using real market data. The “West Africa Regime” component refers specifically to periods of heightened Nigerian Naira (NGN) volatility, which can be used as a proxy for broader West African macroeconomic uncertainty.

FALL 2025

Deep Learning Boot Camp

Deep Learning Models for Colorectal Polyp Detection

Ruibo Zhang, Rebekah Eichberg, Betul Senay Aras, Kevin Specht, Arthur Diep-Nguyen

A polyp is an abnormal tissue growth in the large intestine that is typically benign but can develop into malignant colorectal cancer. Colonoscopy enables endoscopists to identify and assess these polyps for potential removal. However, the accuracy of this procedure depends heavily on the clinician’s expertise, making it prone to human error and variability. Our goal is to build a deep-learning model that detects colorectal polyps in images from colonoscopies to minimize missed lesions and improve patient outcomes.

FALL 2025

Data Science Boot Camp

Identifying Early Risk Factors for Students in Online Courses

James McNally, James Caramanico, Arina Favilla, Feng Zhu

Research Question: What early engagement patterns in virtual learning environments predict negative course outcomes?

Context: It is well known that performance on assessments and in-class attendance are predictive of final course results. Yet grades often come too late in a class term for early interventions and attendance is difficult to measure in online learning environments. To address this gap, we developed a model for identifying early risk factors in online courses based on student interaction patterns in a virtual learning environment (VLE).

Data source: Open University Learning Analytics Dataset (OULAD), which includes daily logs of UK student VLE interactions and grades in 7 science and social science online courses occurring in 2013-14.

Goal: Develop a model for identifying early risk factors based on student interaction patterns that predict negative course outcomes (i.e., failure or withdrawal) in a VLE.

FALL 2025

Data Science Boot Camp

Personalized Gesture Recognition

Sero Parel, Carrie Clark, Brian Mullen, Philip Nelson, Revati Jadhav

Smart wristbands enable users to control technology through subtle hand gestures by decoding muscle signals. However, each individual's muscle signals are unique, making personalization a critical challenge. Leveraging a publicly available dataset (Kaifosh et al. 2025), our team developed a personalized gesture recognition model using surface electromyography (sEMG) data from 100 participants performing 9 gestures. Addressing inter-user variability through within-user training, we engineered 160 features and selected 37 via random forest ranking and correlation pruning. Logistic regression with L2 regularization achieved strong cross-validation performance (F1 Macro = 0.7164), but holdout testing revealed a generalization gap (F1 Macro = 0.3977). Performance varied widely across users, confirming substantial inter-user heterogeneity. Future work could explore adaptive time windows and fine-tuning pre-trained models to enable more robust commercial neuromotor interfaces.
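
Correlation pruning after an importance ranking is often done greedily: walk the features from most to least important and keep one only if it is not too correlated with anything already kept. The sketch below illustrates that pattern under assumptions. The 0.9 cutoff, the function names, and the toy columns are hypothetical, and the ranking is taken as given rather than computed by a random forest.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def prune_correlated(columns, ranked_names, max_abs_corr=0.9):
    """Keep features in rank order, skipping any that correlate with a kept one."""
    kept = []
    for name in ranked_names:
        if all(
            abs(pearson(columns[name], columns[other])) <= max_abs_corr
            for other in kept
        ):
            kept.append(name)
    return kept


# Toy feature columns; "b" is nearly a rescaled copy of "a".
columns = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.1],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = prune_correlated(columns, ranked_names=["a", "b", "c"])
```

Because the walk follows the importance ranking, the more important member of each correlated group is the one that survives.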

SUBMIT A CHALLENGE

Provide us with some initial information about your project idea. It doesn't need to be complete. Our technical team will contact you after you submit your project challenge to scope out the project details and align it with an upcoming cohort.

BOOK A CALL

If you would like to discuss project sponsorship options before submitting a challenge, please fill out the form below to schedule a Zoom call with our technical team.

©2017-2026 by The Erdős Institute.
