
HOW IT WORKS
Submit a Project
Tell us about a project or challenge you’d like our boot camp teams to work on. Our technical team will work with you to scope the problem and align it with an upcoming cohort.
PhDs Work in Teams
Over the course of 4-6 weeks, teams of 3-5 participants dive into the project. Each team is matched with a mentor, either from your company or from our internal Erdős PhD alumni network.
Attend Demo Day
You’ll receive access to final project videos, executive summaries, and annotated GitHub repos. Projects culminate in a demo day featuring all of the teams that worked on your challenge.
WHY SPONSOR A CHALLENGE?

- Fresh, Insightful Work
  Gain new perspectives on business-critical challenges.
- Work with Top Talent
  Engage with skilled PhDs transitioning to careers in Data Science, ML, AI, Deep Learning, UX Research, and more. Each cohort, the Erdős Institute attracts over 500 PhDs from some of the world's top universities.
- Flexible Mentorship Options
  You can provide a mentor from your team, or we’ll assign one from our side.
- Multiple Teams for Maximum Impact
  Each $5,000 sponsorship covers up to 4 teams working on your project.
- End-to-End Deliverables
  Get access to final project videos, GitHub repos, executive summaries, and more.
COHORT SCHEDULE
📅 2025 Boot Camps & Deadlines
Cohort        Cohort Start Date     Submit Project By *
Spring 2025   January 22, 2025      January 15, 2025
Summer 2025   May 7, 2025           April 30, 2025
Fall 2025     September 10, 2025    September 3, 2025
* Projects should be scoped and submitted before the cohort start date. We recommend submitting at least 2 weeks in advance to ensure alignment and onboarding.
PAST PROJECT SPOTLIGHTS
Examples of projects from prior cohorts
FALL 2025
Quant Finance Boot Camp
Implied Volatility vs. Realized Volatility for an Africa-Exposure ETF
Chidubem Umeh
As a Nigerian-American, I have a personal interest in understanding African financial markets—particularly those in West Africa, where local economic factors often differ significantly from continental trends.
This project will focus on volatility modeling and forecasting using real market data. The “West Africa Regime” component refers specifically to periods of heightened Nigerian Naira (NGN) volatility, which can be used as a proxy for broader West African macroeconomic uncertainty.
FALL 2025
Deep Learning Boot Camp
Deep Learning Models for Colorectal Polyp Detection
Ruibo Zhang, Rebekah Eichberg, Betul Senay Aras, Kevin Specht, Arthur Diep-Nguyen
A polyp is an abnormal tissue growth in the large intestine that is typically benign but can develop into malignant colorectal cancer. Colonoscopy enables endoscopists to identify and assess these polyps for potential removal. However, the accuracy of this procedure depends heavily on the clinician’s expertise, making it prone to human error and variability. Our goal is to build a deep-learning model that detects colorectal polyps in images from colonoscopies to minimize missed lesions and improve patient outcomes.
FALL 2025
Data Science Boot Camp
Identifying Early Risk Factors for Students in Online Courses
James McNally, James Caramanico, Arina Favilla, Feng Zhu
Research Question: What early engagement patterns in virtual learning environments predict negative course outcomes?
Context: It is well known that performance on assessments and in-class attendance are predictive of final course results. Yet grades often come too late in a class term for early interventions, and attendance is difficult to measure in online learning environments. To address this gap, we developed a model for identifying early risk factors in online courses based on student interaction patterns in a virtual learning environment (VLE).
Data source: Open University Learning Analytics Dataset (OULAD), which includes daily logs of UK student VLE interactions and grades in 7 science and social science online courses occurring in 2013-14.
Goal: Develop a model for identifying early risk factors based on student interaction patterns that predict negative course outcomes (i.e., failure or withdrawal) in a VLE.
FALL 2025
Data Science Boot Camp
Personalized Gesture Recognition
Sero Parel, Carrie Clark, Brian Mullen, Philip Nelson, Revati Jadhav
Smart wristbands enable users to control technology through subtle hand gestures by decoding muscle signals. However, each individual's muscle signals are unique, making personalization a critical challenge. Leveraging a publicly available dataset (Kaifosh et al. 2025), our team developed a personalized gesture recognition model using surface electromyography (sEMG) data from 100 participants performing 9 gestures. Addressing inter-user variability through within-user training, we engineered 160 features and selected 37 via random forest ranking and correlation pruning. Logistic regression with L2 regularization achieved strong cross-validation performance (F1 Macro = 0.7164), but holdout testing revealed a generalization gap (F1 Macro = 0.3977). Performance varied widely across users, confirming substantial inter-user heterogeneity. Future work could explore adaptive time windows and fine-tuning pre-trained models to enable more robust commercial neuromotor interfaces.
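For readers unfamiliar with the "F1 Macro" figures quoted above, the metric is the unweighted mean of the per-class F1 scores, so every gesture counts equally regardless of how often it occurs. The sketch below is a minimal pure-Python illustration; the gesture names and labels are made up for the example and are not taken from the project or its dataset.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never appears
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical gesture names, for illustration only)
y_true = ["pinch", "pinch", "tap", "tap", "swipe", "swipe"]
y_pred = ["pinch", "tap", "tap", "tap", "swipe", "pinch"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```

Because each class contributes equally, a model that does well on a few gestures but poorly on others receives a low macro score, which is one reason the metric suits a multi-gesture task.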
FALL 2025
Data Science Boot Camp
Predicting Lead Contamination in NY School Drinking Water
Ranadeep Roy, Cami Goray, Hana Lang
Lead is a toxic metal, and in children especially, lead exposure can have severe health consequences: even small amounts of lead have the potential to affect memory, behavior, and learning ability. Despite this, numerous schools across New York State have at least one drinking water outlet with lead levels testing above 5 ppb. In this project, we aim to predict the presence of lead contamination in school drinking water and to better understand the role of demographic, socioeconomic, infrastructural, and geographic features in elevated lead levels.
SUMMER 2025
Deep Learning Boot Camp
Going Off-Grid: A Computer Vision Approach for Grid Integration and Reconstruction in Post-War Syria
Al Baraa Abd Aldaim, Suman Bhandari, Nicholas Geiser
Decentralized solar electricity production has become common in Syria due to unreliability and inconsistent delivery from the national electric grid. Estimating output from decentralized solar is vital for grid integration and reconstruction efforts currently underway in Syria. Our goal in this project was to develop a deep learning model capable of detecting panels (bounding box), estimating their area (segmentation), and predicting the bottom corners of the panel assembly (corner prediction).
SUMMER 2025
Deep Learning Boot Camp
Fraud Detection with Deep Learning
Jude Pereira, Yang Yang, Adrian Wong, Sara Edelman-Munoz, Mary Reith
Fraud detection is a critical area where deep learning has been effectively applied to identify and prevent unauthorized transactions, money laundering, and other financial crimes. Traditional rule-based systems and statistical models often struggle to detect sophisticated fraud patterns, particularly when dealing with large volumes of data and rapidly evolving fraud techniques. In contrast, deep learning models, such as CNNs, RNNs, and autoencoders, have proven highly effective in analyzing complex, high-dimensional transaction data and detecting subtle, non-linear patterns indicative of fraudulent activity.
In this project, we build a User ID-based fraud detection model using autoencoders, trained on unlabelled real-world credit card transaction data, capable of detecting fraud with a precision of up to 35% and a recall of up to 72%, performing significantly better than traditional ML/statistical baseline models.
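To make the precision and recall figures concrete, here is a minimal sketch of one common way an autoencoder is used for anomaly detection: a transaction is flagged when its reconstruction error exceeds a threshold, and the flags are then scored against known labels. The thresholding rule, the function names, and all numbers below are assumptions for illustration; the source does not state how the team set its decision rule.

```python
def flag_by_threshold(errors, threshold):
    """Flag a transaction as suspected fraud when its reconstruction error exceeds the threshold."""
    return [error > threshold for error in errors]


def precision_recall(is_fraud, flagged):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for y, f in zip(is_fraud, flagged) if y and f)
    fp = sum(1 for y, f in zip(is_fraud, flagged) if not y and f)
    fn = sum(1 for y, f in zip(is_fraud, flagged) if y and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up reconstruction errors and ground-truth labels (not project data)
errors = [0.02, 0.90, 0.05, 0.40, 0.75, 0.03, 0.60, 0.01]
is_fraud = [False, True, False, False, True, False, False, True]

flagged = flag_by_threshold(errors, threshold=0.5)
precision, recall = precision_recall(is_fraud, flagged)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Lowering the threshold typically raises recall at the cost of precision, which is why results of this kind are often reported as "up to" values at different operating points.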
SUMMER 2025
Deep Learning Boot Camp
FSP Finder
Duncan Clark, Elzbieta Polak, Jared Able, Shuo Yan
NOTE: Available at www.fspfinder.com (HF link is deactivated)
FSP (Foul Speech Pattern) Finder is a tool for preparing music files for radio airplay by detecting and automatically censoring explicit content. We use a custom version of OpenAI's automatic speech recognition model Whisper, which we fine-tuned on over 100 hours of music vocals, to transcribe uploaded music files (with timestamps for each word). We then search for common explicit words (e.g., curse words and racial slurs) in the transcript. The vocals stem of the track is separated using demucs, then muted at the identified times, to produce a high-quality radio-friendly version of the uploaded track(s).
Our tool comes with an easy-to-use web interface built in Gradio. The tool can process files one at a time or in batches, and the web interface allows the user to view the full transcript of each track, along with the words that will be censored, before downloading the edited files.
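The "search the transcript, then mute at the identified times" step can be sketched as follows. This is a minimal illustration under stated assumptions, not the team's implementation: the input is assumed to be a list of (word, start, end) tuples in seconds, the blocklist holds mild placeholder words, and the small padding around each hit is a hypothetical choice.

```python
import string

# Hypothetical placeholder blocklist; a real list would be far longer
BLOCKLIST = {"darn", "heck"}


def find_mute_intervals(words, blocklist=BLOCKLIST, pad=0.05):
    """Return merged (start, end) intervals, in seconds, covering blocklisted words.

    words: iterable of (text, start_sec, end_sec) tuples from a timestamped transcript.
    pad:   seconds of padding added on each side of a hit (an assumed value).
    """
    hits = []
    for text, start, end in words:
        token = text.lower().strip(string.punctuation)
        if token in blocklist:
            hits.append((max(0.0, start - pad), end + pad))
    hits.sort()
    merged = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of adding a new one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Made-up transcript fragment
transcript = [
    ("Well", 0.00, 0.30),
    ("darn,", 0.35, 0.60),
    ("heck", 0.62, 0.90),
    ("yeah", 1.50, 1.80),
]
intervals = find_mute_intervals(transcript)
print(intervals)
```

Merging overlapping intervals means back-to-back hits are muted as one continuous span on the vocals stem rather than as several tiny cuts.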
SUMMER 2025
Data Science Boot Camp
Machine Learning Magnetism
Ahmed Abdelazim, Murod Mirzhalilov, Brandon Abrego, Sayok Chakravarty
Strong electron correlations often lead to emergent magnetic behavior in materials. Predicting such magnetic properties is essential for advancing technologies in spintronics, data storage, and quantum computing. However, traditional methods, whether experimental techniques or density functional theory (DFT) calculations, are often complex, time-consuming, or unreliable in strongly correlated systems. This project aims to build machine learning models that predict the magnetic ordering of inorganic compounds using chemical, structural, electronic, and thermodynamic descriptors. By leveraging existing materials databases (The Materials Project + Bilbao Crystallographic Server MAGNDATA), our goal is to build an ML model that offers a faster, data-driven alternative for accelerating the discovery and design of novel magnetic materials. Our results represent a step forward in tackling the grand challenge of magnetism.
SUMMER 2025
Data Science Boot Camp
WikiShield: Guarding against vandalism on Wikipedia
Samarth Chawla, Daniel Milanes Perez, Paul Spears, Zijian Rong, Zihao Fang
Despite being open for editing by anyone, Wikipedia tends to be fairly reliable for a first pass on many topics. Its openness to editors is its greatest strength, but also its greatest vulnerability. Intentionally disruptive and malicious edits can be a nuisance for unsuspecting readers who may come across nonsensical sentences inserted by bad actors. These edits can also pollute downstream platforms (such as search engines) that may rely on Wikipedia to generate short summaries of relevant information.
Reverting these edits often falls to volunteer (human) editors. The aim of our project, "WikiShield," was to produce a machine learning model designed to quickly detect vandalism edits on Wikipedia so that the effort of human volunteer editors is not wasted reverting low-effort attempts at vandalism.
SUMMER 2025
Data Science Boot Camp
Predicting Yearly Science Fiction/Fantasy Awards
Zach Raines, Rohan Nair
A number of major awards are given to the ‘best’ new science fiction and fantasy novels each year, such as the Hugo, Nebula, and World Fantasy awards. Predicting which books might win is made especially difficult by the paucity of publicly available sales and review data as a function of time.
In this project, we constructed a predictive model to select award winners from a pool of nominees based on publicly available information about the books and a proxy topicality score describing world state and zeitgeist for the year of publication.
SUMMER 2025
Data Science Boot Camp
Safeify: A Quality and Safety Metric
Emelie Arvidsson, Alex Margolis, Rebekah Eichberg, Betul Senay Aras
The Safeify project was motivated by the need for a better quality and safety metric for online consumers, since product ratings are often unreliable due to bots and paid reviewers. In addition, there is no known metric or model that flags safety concerns such as product recalls and multiple incident reports. The goal of the Safeify model is to help consumers by predicting which products are unsafe or of poor quality and could lead to dissatisfaction.
SUMMER 2025
Data Science Boot Camp
Tuning Up Music Highway
James O'Quinn, Yang Mo, John Hurtado Cadavid, Ruixuan Ding, Chilambwe Natasha Wapamenshi
Known as the most dangerous highway in Tennessee, Music Highway, the stretch of Interstate 40 between Memphis and Nashville, could use a serious tuning up. This project investigates the effectiveness and cost-efficiency of potential physical safety interventions along its Madison and Henderson County segments, with the goal of reducing crash severity. We used a data-driven geospatial modeling approach to assess whether adding specific safety features to targeted segments predicts statistically significant changes in crash injury outcomes.
SPRING 2025
Data Science Boot Camp
Today's Texas Might be Tomorrow's Ohio: Building a Geographic Climate Change Predictor
David Pochik, Alison Duck, Tawny Sit, Jack Neustadt
From the dawn of industrialization to today, the average global temperature has shifted upward by ~2.7 degrees Fahrenheit (~1.5 degrees Celsius) due to increased greenhouse gas emissions. If emissions are left unchecked and temperatures continue to rise at their current (or projected) rate, then this will lead to drastic shifts in regional climate. For example, today's annual average temperature in Ohio will increase to that of today's annual temperature in Texas in Y years.
This project explores and analyzes geographical climate change data in the contiguous United States from 1950 to the current year. The objective is to predict regional features (e.g., temperature, precipitation, and snowfall) for a given year based on historical data, answering questions such as: if I want to live in an area Y years from now that has roughly the same temperature or climate as region X today, where should I go?
SPRING 2025
Data Science Boot Camp
Who Regulates the Regulators?
Jared Able, Joshua Jackson, Zachary Brennan, Alexandria Wheeler, Nicholas Geiser
With recent major cuts to governmental regulatory agencies in the US, we investigate whether those cuts are justified. In particular, we analyze the efficacy of RGGI, a state-level cap-and-trade program designed to regulate CO2 emissions from power plants. By using synthetic controls, we answer the counterfactual question: "How would CO2 emissions look in a world where RGGI was never enacted?"
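For readers new to the method, the standard synthetic control estimator can be written as below. The notation is an assumption introduced here for illustration: unit 1 is the treated region (the RGGI states), units 2 through J+1 are untreated "donor" states, Y_{jt} is emissions for unit j in period t, and T_0 is the last pre-treatment period. The abstract does not state the team's donor pool or the exact predictors matched in the pre-treatment fit.

```latex
% Counterfactual (no-RGGI) emissions as a weighted average of donor units
\hat{Y}_{1t}(0) = \sum_{j=2}^{J+1} w_j \, Y_{jt},
\qquad w_j \ge 0, \qquad \sum_{j=2}^{J+1} w_j = 1

% Weights chosen to match the treated unit before treatment (t \le T_0)
w^{*} = \arg\min_{w} \sum_{t \le T_0} \Bigl( Y_{1t} - \sum_{j=2}^{J+1} w_j \, Y_{jt} \Bigr)^{2}

% Estimated effect in each post-treatment period (t > T_0)
\hat{\tau}_t = Y_{1t} - \hat{Y}_{1t}(0)
```

A negative \hat{\tau}_t after enactment would indicate lower observed emissions than the synthetic no-RGGI counterfactual.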
SPRING 2025
Data Science Boot Camp
Discovering Next-Gen Battery Materials
Dorisa Tabaku, Avinash Karamchandani, Qinying Chen, Sadisha Nanayakkara, FNU Simran
Building the next generation of batteries—efficient, compact, and sustainable—relies on discovering new materials with the right set of properties. Metal-organic frameworks (MOFs), a class of crystalline and porous materials, have emerged as promising candidates for battery electrodes due to their potential for electrical conductivity. One key property that influences a MOF’s conductivity is its band gap. However, state-of-the-art density functional theory (DFT) calculations used to compute band gaps are computationally expensive. In this project, our goal was to develop a machine learning model to predict the band gaps of MOFs, helping to rapidly identify promising candidates for future energy storage technologies such as next-generation batteries.
SPRING 2025
Data Science Boot Camp
Predicting Power Outages
Aaron Weinberg, Evan Morris, Anna Zuckerman, Julio Caceres Gonzales
From ThinkOnward (https://thinkonward.com/app/c/challenges/dynamic-rhythms):
"In this challenge, you'll be tasked with developing a model to predict power outages and how they correlate with extreme, rare weather events (e.g. storms). Your goal is to create a reliable system that can accurately predict these outages. You'll have access to a dataset containing historical weather data and relevant power outages. Your task is to design a model that can effectively forecast future weather impacts on power outages. You're free to explore and experiment with various algorithms, techniques, and models to achieve accurate results. To make things more interesting, we've identified two primary datasets: a storm event dataset and a power outage dataset. These dual datasets will require you to develop a robust model that can adapt to different scenarios and provide accurate forecasts."
SPRING 2025
Data Science Boot Camp
Predicting Survival Time After Bone Marrow Transplant
Ruibo Zhang, Chi-Hao Wu, Yang Li, Ray Karpman, Elzbieta Polak
A blood and marrow transplant is a procedure that replaces unhealthy blood-forming cells with healthy ones. It typically involves using blood-forming cells donated by someone else instead of one's own blood-forming cells. The goal of this project is to predict survival for patients after a bone marrow transplant.
We implemented and fine-tuned four models: a Cox Proportional Hazards model, XGBoost AFT, Survival Random Forest, and CatBoost AFT. To improve model performance, we hybridized each of the four models with an extra logistic or random forest stratification.
Our dataset comes from a Kaggle competition: https://www.kaggle.com/competitions/equity-post-HCT-survival-predictions/.
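As background on the model families named above, the Cox model and accelerated failure time (AFT) models relate patient covariates to survival in different ways. The textbook forms are sketched below; the symbols (covariate vector x, coefficient vector β, baseline hazard h_0, predictor f, scale σ, noise ε) are standard notation introduced here, not taken from the project.

```latex
% Cox proportional hazards: covariates scale a shared baseline hazard
h(t \mid x) = h_0(t) \, \exp\!\bigl( \beta^{\top} x \bigr)

% Accelerated failure time (AFT): covariates shift the log of the survival time T
\log T = f(x) + \sigma \, \varepsilon
```

In gradient-boosted AFT variants, f(x) is the output of the tree ensemble rather than a linear function of x.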
FALL 2024
Data Science Boot Camp
Predicting Problematic Internet Use
Daniel Visscher, Emilie Wiesner, Aaron Weinberg
Internet use has been identified by researchers as having the potential to rise to the level of addiction, with associated increased rates of anxiety and depression. Identifying cases of problematic internet usage currently requires evaluation by an expert, however, which is a significant impediment to screening children and adolescents across society. One potential solution is to rely on data that is more easily and uniformly collected: the kind collected by a family physician, a simple survey, or by a smartwatch. The research question this project sets out to answer is: “Can we predict the level of problematic internet usage exhibited by children and adolescents, based on their physical activity and survey responses?”
SUBMIT A CHALLENGE
Provide us with some initial information about your project idea. It doesn't need to be complete. Our technical team will contact you after you submit your project challenge to scope out the project details and align it with an upcoming cohort.
BOOK A CALL
If you would like to discuss project sponsorship options before submitting a challenge, please fill out the form below to schedule a Zoom call with our technical team.
