PROJECT CHALLENGES

Sponsor a 4-6 week, real-world project challenge tackled by top-tier PhD talent in Data Science, Deep Learning, UX Research, Quantum Computing, & Quant Research/Finance.

Let our teams work on your problem while you meet future hires along the way.

HOW IT WORKS

Submit a Project

Tell us about a project or challenge you’d like our boot camp teams to work on. Our technical team will work with you to scope the problem and align it with an upcoming cohort.  

PhDs Work in Teams

Over the course of 4-6 weeks, teams of 3-5 participants dive into the project. Each team is matched with a mentor, either from your company or from our internal Erdős PhD alumni network.

Attend Demo Day

You’ll receive access to final project videos, executive summaries, and annotated GitHub repos. Projects culminate in a demo day featuring all of the teams that worked on your challenge.

WHY SPONSOR A CHALLENGE?
  • Fresh, Insightful Work
    Gain new perspectives on business-critical challenges.

  • Work with Top Talent
    Engage with skilled PhDs transitioning to careers in Data Science, ML, AI, Deep Learning, UX Research, and more. Each cohort, the Erdős Institute attracts over 500 PhDs from some of the world's top universities.

  • Flexible Mentorship Options
    You can provide a mentor from your team or we’ll assign one from our side.

  • Multiple Teams for Maximum Impact
    Each $5,000 sponsorship covers up to 4 teams working on your project.

  • End-to-End Deliverables
    Get access to final project videos, GitHub repos, executive summaries, and more.

COHORT SCHEDULE

📅   2025 Boot Camps & Deadlines

Cohort         Cohort Start Date     Submit Project By *
Spring 2025    January 22, 2025      January 15, 2025
Summer 2025    May 7, 2025           April 30, 2025
Fall 2025      September 10, 2025    September 3, 2025

* Projects should be scoped and submitted before the cohort start date. We recommend submitting at least 2 weeks in advance to ensure alignment and onboarding.

PAST PROJECT SPOTLIGHTS

Examples of projects from prior cohorts

FALL 2025

Quant Finance Boot Camp

Implied Volatility vs. Realized Volatility for an Africa-Exposure ETF

Chidubem Umeh

As a Nigerian-American, I have a personal interest in understanding African financial markets—particularly those in West Africa, where local economic factors often differ significantly from continental trends.

This project will focus on volatility modeling and forecasting using real market data. The “West Africa Regime” component refers specifically to periods of heightened Nigerian Naira (NGN) volatility, which can be used as a proxy for broader West African macroeconomic uncertainty.
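For readers unfamiliar with the comparison in the project title, the sketch below shows one common way to compute realized volatility from closing prices and compare it with a quoted implied volatility. It is an illustration only, not the team's method: the 252-trading-day annualization convention, the function names, and the sample inputs are all assumptions.

```python
import math
from statistics import stdev

# Assumption: 252 trading days per year, a common annualization convention.
TRADING_DAYS_PER_YEAR = 252


def log_returns(prices):
    """Log returns between consecutive closing prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def realized_volatility(prices, periods_per_year=TRADING_DAYS_PER_YEAR):
    """Annualized sample standard deviation of log returns."""
    returns = log_returns(prices)
    if len(returns) < 2:
        raise ValueError("need at least three prices to estimate volatility")
    return stdev(returns) * math.sqrt(periods_per_year)


def volatility_spread(implied_vol, prices):
    """Implied minus realized volatility, both as annualized decimals."""
    return implied_vol - realized_volatility(prices)
```

A positive spread means the quoted implied volatility exceeded what the price history realized over the sample window.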

FALL 2025

Deep Learning Boot Camp

Deep Learning Models for Colorectal Polyp Detection

Ruibo Zhang, Rebekah Eichberg, Betul Senay Aras, Kevin Specht, Arthur Diep-Nguyen

A polyp is an abnormal tissue growth in the large intestine that is typically benign but can develop into malignant colorectal cancer. Colonoscopy enables endoscopists to identify and assess these polyps for potential removal. However, the accuracy of this procedure depends heavily on the clinician’s expertise, making it prone to human error and variability. Our goal is to build a deep-learning model that detects colorectal polyps in images from colonoscopies to minimize missed lesions and improve patient outcomes.

FALL 2025

Data Science Boot Camp

Identifying Early Risk Factors for Students in Online Courses

James McNally, James Caramanico, Arina Favilla, Feng Zhu

Research Question: What early engagement patterns in virtual learning environments predict negative course outcomes?

Context: It is well known that performance on assessments and in-class attendance are predictive of final course results. Yet grades often come too late in a class term for early interventions and attendance is difficult to measure in online learning environments. To address this gap, we developed a model for identifying early risk factors in online courses based on student interaction patterns in a virtual learning environment (VLE).

Data source: Open University Learning Analytics Dataset (OULAD), which includes daily logs of UK student VLE interactions and grades in 7 science and social science online courses occurring in 2013-14.

Goal: Develop a model for identifying early risk factors based on student interaction patterns that predict negative course outcomes (i.e., failure or withdrawal) in a VLE.

FALL 2025

Data Science Boot Camp

Personalized Gesture Recognition

Sero Parel, Carrie Clark, Brian Mullen, Philip Nelson, Revati Jadhav

Smart wristbands enable users to control technology through subtle hand gestures by decoding muscle signals. However, each individual's muscle signals are unique, making personalization a critical challenge. Leveraging a publicly available dataset (Kaifosh et al. 2025), our team developed a personalized gesture recognition model using surface electromyography (sEMG) data from 100 participants performing 9 gestures. Addressing inter-user variability through within-user training, we engineered 160 features and selected 37 via random forest ranking and correlation pruning. Logistic regression with L2 regularization achieved strong cross-validation performance (F1 Macro = 0.7164), but holdout testing revealed a generalization gap (F1 Macro = 0.3977). Performance varied widely from user to user, confirming substantial inter-user heterogeneity. Future work could explore adaptive time windows and fine-tuning pre-trained models to enable more robust commercial neuromotor interfaces.
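The F1 Macro scores quoted above average the per-class F1 score with equal weight for every gesture, so rarely performed gestures count as much as common ones. The sketch below shows that calculation in plain Python. The function name and the gesture labels in the usage note are hypothetical, and scoring absent classes as 0 is an assumption rather than something the project states.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted samples scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, `macro_f1(["fist", "fist", "pinch", "tap"], ["fist", "pinch", "pinch", "tap"])` averages per-class scores of 2/3, 2/3, and 1.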

FALL 2025

Data Science Boot Camp

Predicting Lead Contamination in NY School Drinking Water

Ranadeep Roy, Cami Goray, Hana Lang

Lead is a toxic metal, and in children especially, lead exposure can have severe health consequences: even small amounts of lead have the potential to affect memory, behavior, and learning ability. Despite this, numerous schools across New York State have at least one drinking water outlet with lead levels testing above 5 ppb. In this project, we aim to predict the presence of lead contamination in school drinking water and to better understand the role of demographic, socioeconomic, infrastructural, and geographic features in elevated lead levels.

SUMMER 2025

Deep Learning Boot Camp

Going Off-Grid: A Computer Vision Approach for Grid Integration and Reconstruction in Post-War Syria

Al Baraa Abd Aldaim, Suman Bhandari, Nicholas Geiser

Decentralized solar electricity production has become common in Syria due to unreliability and inconsistent delivery from the national electric grid. Estimating output from decentralized solar is vital for grid integration and reconstruction efforts currently underway in Syria. Our goal in this project was to develop a deep learning model capable of detecting panels (bounding box), estimating their area (segmentation), and predicting the bottom corners of the panel assembly (corner prediction).

SUMMER 2025

Deep Learning Boot Camp

Fraud Detection with Deep Learning

Jude Pereira, Yang Yang, Adrian Wong, Sara Edelman-Munoz, Mary Reith

Fraud detection is a critical area where deep learning has been effectively applied to identify and prevent unauthorized transactions, money laundering, and other financial crimes. Traditional rule-based systems and statistical models often struggle to detect sophisticated fraud patterns, particularly when dealing with large volumes of data and rapidly evolving fraud techniques. In contrast, deep learning models, such as CNNs, RNNs, and autoencoders, have proven highly effective in analyzing complex, high-dimensional transaction data and detecting subtle, non-linear patterns indicative of fraudulent activity.

In this project, we build a User ID-based fraud detection model using autoencoders, trained on unlabelled real-world credit card transaction data, capable of detecting fraud with a precision of up to 35% and a recall of up to 72%, performing significantly better than traditional ML/statistical baseline models.
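To make the precision and recall figures concrete, here is a minimal sketch of how an autoencoder's output is often turned into fraud flags and scored: transactions whose reconstruction error exceeds a threshold are flagged, and the flags are compared with known labels held out for evaluation. The thresholding rule, the function names, and the sample numbers in the usage note are assumptions for illustration, not the team's pipeline.

```python
def flag_anomalies(reconstruction_errors, threshold):
    """Flag a transaction when its reconstruction error exceeds the threshold."""
    return [error > threshold for error in reconstruction_errors]


def precision_recall(flags, is_fraud):
    """Precision and recall of boolean flags against boolean ground truth."""
    if len(flags) != len(is_fraud):
        raise ValueError("flags and is_fraud must have the same length")
    pairs = list(zip(flags, is_fraud))
    tp = sum(1 for flagged, fraud in pairs if flagged and fraud)
    fp = sum(1 for flagged, fraud in pairs if flagged and not fraud)
    fn = sum(1 for flagged, fraud in pairs if not flagged and fraud)
    # Assumption: report 0.0 when a ratio is undefined (nothing flagged / no fraud).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Calling `precision_recall(flag_anomalies(errors, 0.5), labels)` at several thresholds traces out the usual trade-off: a lower threshold catches more fraud (higher recall) at the cost of more false alarms (lower precision).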

SUMMER 2025

Deep Learning Boot Camp

FSP Finder

Duncan Clark, Elzbieta Polak, Jared Able, Shuo Yan

NOTE: Available at www.fspfinder.com (HF link is deactivated)

FSP (Foul Speech Pattern) Finder is a tool for preparing music files for radio airplay by detecting and automatically censoring explicit content. We use a custom version of OpenAI's automatic speech recognition model Whisper, which we fine-tuned on over 100 hours of music vocals, to transcribe uploaded music files (with timestamps for each word). We then search the transcript for common explicit words (e.g., curse words and racial slurs). The vocal stem of the track is separated using demucs, then muted at the identified times, to produce a high-quality, radio-friendly version of the uploaded track(s).

Our tool comes with an easy-to-use web interface built in Gradio. The tool can process files one at a time or in batches, and the web interface allows the user to view the full transcript of each track, along with the words that will be censored, before downloading the edited files.
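The step between transcription and muting can be sketched as follows: given a word-level transcript with timestamps, collect the time intervals to silence. This is an illustrative sketch only. The `Word` type, the `mute_intervals` function, the padding parameter, and the placeholder blocklist in the usage note are hypothetical, and the Whisper transcription and demucs stem separation are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcribed word with start and end times in seconds."""
    text: str
    start: float
    end: float


def mute_intervals(words, blocklist, padding=0.05):
    """Return merged (start, end) intervals covering every blocked word.

    Assumes `words` is sorted by start time. `padding` widens each
    interval slightly on both sides (an assumption, to avoid clipped edges).
    """
    blocked = {entry.lower() for entry in blocklist}
    intervals = []
    for word in words:
        token = word.text.strip(".,!?;:\"'").lower()
        if token not in blocked:
            continue
        start = max(0.0, word.start - padding)
        end = word.end + padding
        if intervals and start <= intervals[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            previous_start, previous_end = intervals[-1]
            intervals[-1] = (previous_start, max(previous_end, end))
        else:
            intervals.append((start, end))
    return intervals
```

With a placeholder blocklist such as `{"darn", "heck"}`, the returned intervals are the spans at which the separated vocal stem would be silenced before remixing.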

SUMMER 2025

Data Science Boot Camp

Machine Learning Magnetism

Ahmed Abdelazim, Murod Mirzhalilov, Brandon Abrego, Sayok Chakravarty

Strong electron correlations often lead to emergent magnetic behavior in materials. Predicting such magnetic properties is essential for advancing technologies in spintronics, data storage, and quantum computing. However, traditional methods, whether experimental techniques or density functional theory (DFT) calculations, are often complex, time-consuming, or unreliable in strongly correlated systems. This project aims to build machine learning models that predict the magnetic ordering of inorganic compounds using chemical, structural, electronic, and thermodynamic descriptors. By leveraging existing materials databases (The Materials Project + Bilbao Crystallographic Server MAGNDATA), our goal is to build an ML model that offers a faster, data-driven alternative for accelerating the discovery and design of novel magnetic materials. Our results represent a step forward in tackling the grand challenge of magnetism.

SUMMER 2025

Data Science Boot Camp

WikiShield: Guarding against vandalism on Wikipedia

Samarth Chawla, Daniel Milanes Perez, Paul Spears, Zijian Rong, Zihao Fang

Despite being open for editing by anyone, Wikipedia tends to be fairly reliable for a first pass on many topics. Its openness to editors is its greatest strength, but also its greatest vulnerability. Intentionally disruptive and malicious edits can be a nuisance for unsuspecting readers who may come across nonsensical sentences inserted by bad actors. These edits can also pollute downstream platforms (such as search engines) that may rely on Wikipedia to generate short summaries of relevant information.

Reverting these edits often falls to volunteer (human) editors. The aim of our project, "WikiShield," was to produce a machine learning model designed to quickly detect vandalism edits on Wikipedia so that the effort of human volunteer editors is not wasted reverting low-effort attempts at vandalism.

SUMMER 2025

Data Science Boot Camp

Predicting Yearly Science Fiction/Fantasy Awards

Zach Raines, Rohan Nair

A number of major awards, such as the Hugo, Nebula, and World Fantasy awards, are given to the ‘best’ new science fiction and fantasy novels each year. Predicting which books might win is made especially difficult by the paucity of publicly available sales and review data as a function of time.

In this project we constructed a predictive model to select award winners from a pool of nominees based on publicly available information about the books and a proxy topicality score describing world state and zeitgeist for the year of publication.

SUMMER 2025

Data Science Boot Camp

Safeify: A Quality and Safety Metric

Emelie Arvidsson, Alex Margolis, Rebekah Eichberg, Betul Senay Aras

The Safeify project was motivated by the need for a better quality and safety metric for online consumers, since product ratings are often unreliable due to bots and paid reviewers. In addition, there is no known metric or model that flags safety concerns such as recalls and multiple incident reports for a product. The goal of the Safeify model is to help consumers by predicting unsafe and poor-quality products that could lead to dissatisfaction.

SUMMER 2025

Data Science Boot Camp

Tuning Up Music Highway

James O'Quinn, Yang Mo, John Hurtado Cadavid, Ruixuan Ding, Chilambwe Natasha Wapamenshi

Known as the most dangerous highway in Tennessee, Music Highway, the stretch of Interstate 40 between Memphis and Nashville, could use a serious tuning up. This project investigates the effectiveness and cost-efficiency of potential physical safety interventions along its Madison and Henderson County segments, with the goal of reducing crash severity. We used a data-driven geospatial modeling approach to assess whether adding specific safety features to targeted segments predicts statistically significant changes in crash injury outcomes.

SPRING 2025

Deep Learning Boot Camp

Deep Learning - Audio Project (VocalCycleGAN)

Gregory Taylor, Jaspar Wiart, Chutian Ma

In this project, we trained a CycleGAN on speech and singing data to create a voice synthesizer that takes speech as input and outputs a synthesized voice to play over a given song.

SPRING 2025

Data Science Boot Camp

Today's Texas Might be Tomorrow's Ohio: Building a Geographic Climate Change Predictor

David Pochik, Alison Duck, Tawny Sit, Jack Neustadt

From the dawn of industrialization to today, the average global temperature has shifted upward by ~2.7 degrees Fahrenheit (~1.5 degrees Celsius) due to increased greenhouse gas emissions. If emissions are left unchecked and temperatures continue to rise at their current (or projected) rate, then this will lead to drastic shifts in regional climate. For example, in Y years, the annual average temperature in Ohio could rise to match today's annual average temperature in Texas.

This project explores and analyzes geographical climate change data in the contiguous United States from 1950 to the current year. The objective is to predict regional features (e.g., temperature, precipitation, and snowfall) for a given year based on historical data, in order to answer questions such as: if I want to live in an area Y years from now that has roughly the same temperature or climate as region X today, where would I go?

SPRING 2025

Data Science Boot Camp

Who Regulates the Regulators?

Jared Able, Joshua Jackson, Zachary Brennan, Alexandria Wheeler, Nicholas Geiser

With recent major cuts to government regulatory agencies in the US, we investigate whether those cuts are justified. In particular, we analyze the efficacy of the Regional Greenhouse Gas Initiative (RGGI), a state-level cap-and-trade program designed to regulate CO2 emissions from power plants. Using synthetic controls, we answer the counterfactual question: "How would CO2 emissions look in a world where RGGI was never enacted?"

SPRING 2025

Data Science Boot Camp

Discovering Next-Gen Battery Materials

Dorisa Tabaku, Avinash Karamchandani, Qinying Chen, Sadisha Nanayakkara, FNU Simran

Building the next generation of batteries—efficient, compact, and sustainable—relies on discovering new materials with the right set of properties. Metal-organic frameworks (MOFs), a class of crystalline and porous materials, have emerged as promising candidates for battery electrodes due to their potential for electrical conductivity. One key property that influences a MOF’s conductivity is its band gap. However, state-of-the-art density functional theory (DFT) calculations used to compute band gaps are computationally expensive. In this project, our goal was to develop a machine learning model to predict the band gaps of MOFs, helping to rapidly identify promising candidates for future energy storage technologies such as next-generation batteries.

SPRING 2025

Data Science Boot Camp

Predicting Power Outages

Aaron Weinberg, Evan Morris, Anna Zuckerman, Julio Caceres Gonzales

From ThinkOnward (https://thinkonward.com/app/c/challenges/dynamic-rhythms):

"In this challenge, you'll be tasked with developing a model to predict power outages and how they correlate with extreme, rare weather events (e.g. storms). Your goal is to create a reliable system that can accurately predict these outages. You'll have access to a dataset containing historical weather data and relevant power outages. Your task is to design a model that can effectively forecast future weather impacts on power outages. You're free to explore and experiment with various algorithms, techniques, and models to achieve accurate results. To make things more interesting, we've identified two primary datasets: a storm event dataset and a power outage dataset. These dual datasets will require you to develop a robust model that can adapt to different scenarios and provide accurate forecasts."

SPRING 2025

Data Science Boot Camp

Predicting Survival Time After Bone Marrow Transplant

Ruibo Zhang, Chi-Hao Wu, Yang Li, Ray Karpman, Elzbieta Polak

A blood and marrow transplant is a procedure that replaces unhealthy blood-forming cells with healthy ones. It typically involves using blood-forming cells donated by someone else instead of one's own blood-forming cells. The goal of this project is to predict post-transplant survival rates for bone marrow transplant patients.

We implemented and fine-tuned four models: a Cox Proportional Hazards model, XGBoost AFT, a Survival Random Forest, and CatBoost AFT. To improve model performance, we hybridized each of the four models with an extra logistic or random forest stratification step.

Our dataset comes from a Kaggle competition: https://www.kaggle.com/competitions/equity-post-HCT-survival-predictions/.

FALL 2024

Data Science Boot Camp

Predicting Problematic Internet Use

Daniel Visscher, Emilie Wiesner, Aaron Weinberg

Internet use has been identified by researchers as having the potential to rise to the level of addiction, with associated increased rates of anxiety and depression. Identifying cases of problematic internet usage currently requires evaluation by an expert, however, which is a significant impediment to screening children and adolescents across society. One potential solution is to rely on data that is more easily and uniformly collected: the kind collected by a family physician, a simple survey, or by a smartwatch. The research question this project sets out to answer is: “Can we predict the level of problematic internet usage exhibited by children and adolescents, based on their physical activity and survey responses?”

SUBMIT A CHALLENGE

Provide us with some initial information about your project idea. It doesn't need to be complete. Our technical team will contact you after you submit your project challenge to scope out the project details and align it with an upcoming cohort.

BOOK A CALL

If you would like to discuss project sponsorship options before submitting a challenge, please fill out the form below to schedule a Zoom call with our technical team.

©2017-2025 by The Erdős Institute.
