PARTICIPANT DEMOGRAPHICS

4000+ participants with PhDs from 300+ universities

Candidate Profiles

5047

Seeking Internships

1218

Seeking Part-Time

739

Seeking Full-Time

1938

Seeking Senior/Managerial

396

Seeking DS, ML, AI

2282

Seeking Quant Research/Finance

1493

Seeking Software Engineering

820

Seeking Quantum Computing

594

Seeking UX Research

437

Seeking Prof/Sci Writing

612

EDUCATION & REPORTED DEMOGRAPHIC DATA

University

Area of Study

Year of PhD

Reported Ethnicity

Reported Gender

Reported First Gen

PARTICIPANT PROJECTS

Examples of projects from prior cohorts

SUMMER 2025

TEAM

Data Science Boot Camp

Machine Learning Magnetism

Ahmed Abdelazim, Murod Mirzhalilov, Brandon Abrego, Sayok Chakravarty

Strong electron correlations often lead to emergent magnetic behavior in materials. Predicting such magnetic properties is essential for advancing technologies in spintronics, data storage, and quantum computing. However, traditional methods, whether experimental techniques or density functional theory (DFT) calculations, are often complex, time-consuming, or unreliable in strongly correlated systems. This project aims to build machine learning models that predict the magnetic ordering of inorganic compounds using chemical, structural, electronic, and thermodynamic descriptors. By leveraging existing materials databases (The Materials Project and the Bilbao Crystallographic Server's MAGNDATA), our goal is to build an ML model that offers a faster, data-driven alternative for accelerating the discovery and design of novel magnetic materials. Our results represent a step forward in tackling the grand challenge of magnetism.

SUMMER 2025

TEAM

Data Science Boot Camp

WikiShield: Guarding against vandalism on Wikipedia

Samarth Chawla, Daniel Milanes Perez, Paul Spears, Zijian Rong, Zihao Fang

Despite being open for editing by anyone, Wikipedia tends to be fairly reliable for a first pass on many topics. Its openness to editors is its greatest strength, but also its greatest vulnerability. Intentionally disruptive and malicious edits can be a nuisance for unsuspecting readers who may come across nonsensical sentences inserted by bad actors. These edits can also pollute downstream platforms (such as search engines) that may rely on Wikipedia to generate short summaries of relevant information.

Reverting these edits often falls to volunteer (human) editors. The aim of our project, "WikiShield," was to produce a machine learning model designed to quickly detect vandalism edits on Wikipedia so that the effort of human volunteer editors is not wasted reverting low-effort attempts at vandalism.

SUMMER 2025

TEAM

Data Science Boot Camp

Predicting Yearly Science Fiction/Fantasy Awards

Zach Raines, Rohan Nair

A number of major awards are given to the ‘best’ new science fiction and fantasy novels each year, such as the Hugo, Nebula, and World Fantasy awards. Predicting which books might win is made especially difficult by the paucity of publicly available sales and review data as a function of time.

In this project we constructed a predictive model to select award winners from a pool of nominees based on publicly available information about the books and a proxy topicality score describing world state and zeitgeist for the year of publication.

SUMMER 2025

TEAM

Data Science Boot Camp

Safeify: A Quality and Safety Metric

Emelie Arvidsson, Alex Margolis, Rebekah Eichberg, Betul Senay Aras

The Safeify project was motivated by the need for a better quality and safety metric for online consumers, since product ratings are often unreliable due to bots and paid reviewers. In addition, there is no known metric or model that flags safety concerns such as recalls or multiple incident reports for a product. The goal of the Safeify model is to help consumers by predicting which products are unsafe or of poor quality and could lead to dissatisfaction.

SUMMER 2025

TEAM

Data Science Boot Camp

Tuning Up Music Highway

James O'Quinn, Yang Mo, John Hurtado Cadavid, Ruixuan Ding, Chilambwe Natasha Wapamenshi

Known as the most dangerous highway in Tennessee, Music Highway, the stretch of Interstate 40 between Memphis and Nashville, could use a serious tuning up. This project investigates the effectiveness and cost-efficiency of potential physical safety interventions along its Madison and Henderson County segments, with the goal of reducing crash severity. We used a data-driven geospatial modeling approach to assess whether adding specific safety features to targeted segments predicts statistically significant changes in crash injury outcomes.

SPRING 2025

TEAM

Deep Learning Boot Camp

Deep Learning - Audio Project (VocalCycleGAN)

Gregory Taylor, Jaspar Wiart, Chutian Ma

In this project, we trained a CycleGAN on speech data and singing data to create a voice synthesizer that takes speech and outputs a synthesized voice to play over a given song.

SPRING 2025

TEAM

Data Science Boot Camp

Today's Texas Might be Tomorrow's Ohio: Building a Geographic Climate Change Predictor

David Pochik, Alison Duck, Tawny Sit, Jack Neustadt

From the dawn of industrialization to today, the average global temperature has shifted upward by ~2.7 degrees Fahrenheit (~1.5 degrees Celsius) due to increased greenhouse gas emissions. If emissions are left unchecked and temperatures continue to rise at their current (or projected) rate, then this will lead to drastic shifts in regional climate. For example, today's annual average temperature in Ohio will increase to that of today's annual temperature in Texas in Y years.

This project explores and analyzes geographical climate change data in the contiguous United States from 1950 to the current year. The objective is to predict regional features such as temperature, precipitation, and snowfall for a given year based on historical data, and so to answer questions like: if I want to live in an area Y years from now that has roughly the same temperature or climate as region X today, where should I go?
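
The question posed above can be sketched as a nearest-match search over projected values. The snippet below is only an illustration under stated assumptions, not the team's model: it assumes a simple linear trend per region, uses annual mean temperature as the only feature, and the region names and temperatures are made-up placeholders.

```python
def linear_trend(years, values):
    """Ordinary least-squares slope and intercept for values ~ years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def find_future_analog(history, target_region, base_year, years_ahead):
    """Return the region whose projected temperature in base_year + years_ahead
    is closest to target_region's fitted temperature in base_year."""
    def predict(region, year):
        slope, intercept = linear_trend(*history[region])
        return slope * year + intercept

    target_temp = predict(target_region, base_year)
    future_year = base_year + years_ahead
    return min(history, key=lambda region: abs(predict(region, future_year) - target_temp))


# Hypothetical annual mean temperatures (degrees F); not real observations.
YEARS = [2000, 2010, 2020]
HISTORY = {
    "Region A": (YEARS, [64.0, 65.0, 66.0]),  # warm today
    "Region B": (YEARS, [50.0, 51.0, 52.0]),  # cool, warming slowly
    "Region C": (YEARS, [58.0, 60.0, 62.0]),  # cooler today, warming faster
}

analog = find_future_analog(HISTORY, "Region A", base_year=2020, years_ahead=20)
print(analog)
```

With these placeholder numbers, the region that is projected to feel like today's Region A in 20 years is the faster-warming Region C. A fuller version would compare several features at once (precipitation, snowfall) and use a better forecasting model than a straight line.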

SPRING 2025

TEAM

Data Science Boot Camp

Who Regulates the Regulators?

Jared Able, Joshua Jackson, Zachary Brennan, Alexandria Wheeler, Nicholas Geiser

With recent major cuts to government regulatory agencies in the US, we investigate whether those cuts are justified. In particular, we analyze the efficacy of the Regional Greenhouse Gas Initiative (RGGI), a state-level cap-and-trade program designed to regulate CO2 emissions from power plants. Using synthetic controls, we answer the counterfactual question: "How would CO2 emissions look in a world where RGGI was never enacted?"
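
To make the synthetic control idea concrete, here is a deliberately tiny sketch. It assumes only two untreated "donor" units, for which the best convex weight has a closed form; a real analysis would use many donors, covariates, and a constrained optimizer. All numbers are invented placeholders, not RGGI data.

```python
def fit_two_donor_weight(treated_pre, donor_a_pre, donor_b_pre):
    """Weight w in [0, 1] minimizing the squared pre-period gap between the
    treated unit and w * donor_a + (1 - w) * donor_b."""
    diff = [a - b for a, b in zip(donor_a_pre, donor_b_pre)]
    resid = [y - b for y, b in zip(treated_pre, donor_b_pre)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        return 0.5  # donors are identical pre-treatment; any weight fits equally
    w = sum(r * d for r, d in zip(resid, diff)) / denom
    return min(1.0, max(0.0, w))  # clip so the weights stay a convex combination


def synthetic_series(w, donor_a, donor_b):
    """Weighted combination of the two donors: the synthetic control."""
    return [w * a + (1 - w) * b for a, b in zip(donor_a, donor_b)]


# Hypothetical emissions (arbitrary units) before and after the policy starts.
treated_pre, treated_post = [100.0, 98.0, 96.0], [88.0, 85.0]
donor_a_pre, donor_a_post = [110.0, 108.0, 104.0], [102.0, 100.0]
donor_b_pre, donor_b_post = [90.0, 88.0, 88.0], [86.0, 84.0]

w = fit_two_donor_weight(treated_pre, donor_a_pre, donor_b_pre)
counterfactual = synthetic_series(w, donor_a_post, donor_b_post)
effect = [obs - cf for obs, cf in zip(treated_post, counterfactual)]
print(w, counterfactual, effect)
```

The weight is fitted on pre-policy years only; the weighted donors then stand in for the treated unit in the post-policy years, and the gap between observed and synthetic emissions is the estimated effect.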

SPRING 2025

TEAM

Data Science Boot Camp

Discovering Next-Gen Battery Materials

Dorisa Tabaku, Avinash Karamchandani, Qinying Chen, Sadisha Nanayakkara, FNU Simran

Building the next generation of batteries—efficient, compact, and sustainable—relies on discovering new materials with the right set of properties. Metal-organic frameworks (MOFs), a class of crystalline and porous materials, have emerged as promising candidates for battery electrodes due to their potential for electrical conductivity. One key property that influences a MOF’s conductivity is its band gap. However, state-of-the-art density functional theory (DFT) calculations used to compute band gaps are computationally expensive. In this project, our goal was to develop a machine learning model to predict the band gaps of MOFs, helping to rapidly identify promising candidates for future energy storage technologies such as next-generation batteries.

SPRING 2025

TEAM

Data Science Boot Camp

Predicting Power Outages

Aaron Weinberg, Evan Morris, Anna Zuckerman, Julio Caceres Gonzales

From ThinkOnward (https://thinkonward.com/app/c/challenges/dynamic-rhythms):

"In this challenge, you'll be tasked with developing a model to predict power outages and how they correlate with extreme, rare weather events (e.g. storms). Your goal is to create a reliable system that can accurately predict these outages. You'll have access to a dataset containing historical weather data and relevant power outages. Your task is to design a model that can effectively forecast future weather impacts on power outages. You're free to explore and experiment with various algorithms, techniques, and models to achieve accurate results. To make things more interesting, we've identified two primary datasets: a storm event dataset and a power outage dataset. These dual datasets will require you to develop a robust model that can adapt to different scenarios and provide accurate forecasts."

SPRING 2025

TEAM

Data Science Boot Camp

Predicting Survival Time After Bone Marrow Transplant

Ruibo Zhang, Chi-Hao Wu, Yang Li, Ray Karpman, Elzbieta Polak

A blood and marrow transplant is a procedure that replaces unhealthy blood-forming cells with healthy ones. It typically involves using blood-forming cells donated by someone else instead of one's own. The goal of this project is to predict survival outcomes for patients after a bone marrow transplant.

We implemented and fine-tuned four models: a Cox proportional hazards model, XGBoost AFT (accelerated failure time), a survival random forest, and CatBoost AFT. To improve model performance, we hybridized each of the four models with an additional logistic or random forest stratification step.

Our dataset comes from a Kaggle competition: https://www.kaggle.com/competitions/equity-post-HCT-survival-predictions/.

FALL 2024

TEAM

Data Science Boot Camp

Predicting Problematic Internet Use

Daniel Visscher, Emilie Wiesner, Aaron Weinberg

Internet use has been identified by researchers as having the potential to rise to the level of addiction, with associated increased rates of anxiety and depression. Identifying cases of problematic internet usage currently requires evaluation by an expert, however, which is a significant impediment to screening children and adolescents across society. One potential solution is to rely on data that is more easily and uniformly collected: the kind collected by a family physician, a simple survey, or by a smartwatch. The research question this project sets out to answer is: “Can we predict the level of problematic internet usage exhibited by children and adolescents, based on their physical activity and survey responses?”

FALL 2024

TEAM

Data Science Boot Camp

AP Outcomes to university metrics

Shannon McElhenney, Raymond Tana, Shrabana Hazra, Prabhat Devkota, Jung-Tsung Li

This project was designed to investigate the potential relationship between AP exam performance and the presence of nearby universities. It was initially hypothesized that local (especially R1/R2 or public) universities would contribute to better pass rates for AP exams in their vicinities as a result of their various outreach, dual-enrollment, tutoring, and similar programs for high schoolers. We produce a predictive model that uses a few features related to university presence, personal income, and population to predict AP exam performance.

MAY-SUMMER 2024

TEAM

Deep Learning Boot Camp

Wunderpus Octopus (New Atlantis)

Ingrida Semenec, Kshitiz Parihar, Nadir Hajouji, Saswat Mishra, Deniz Olgu Devecioglu

Modeling the relationship between biogeochemical layers and chlorophyll density

The distribution and density of chlorophyll in the ocean are critical indicators of marine primary productivity, which influences the global carbon cycle, marine food webs, and climate regulation. Biogeochemical and physical ocean properties, including nutrient availability, light penetration, water temperature, salinity, and ocean currents, influence chlorophyll density. Understanding and accurately modeling these relationships is essential for predicting the impacts of environmental changes on marine ecosystems and for managing oceanic resources effectively. We plan to combine multiple Copernicus Marine datasets to model chlorophyll density based on the biogeochemical and physical properties of the ocean.

MAY-SUMMER 2024

TEAM

Deep Learning Boot Camp

RivusVox Editor

Zachary Bezemek, Francesca Balestrieri

RivusVox Editor: the world's first near-live zero-shot adaptive speech editing system

MAY-SUMMER 2024

TEAM

Deep Learning Boot Camp

A Vocal-Cue Interpreter for Minimally-Verbal Individuals

Julian Rosen, Alessandro Malusà, Rahul Krishna, Atharva Patil, Monalisa Dutta, Sarasi Jayasekara

The ReCANVo dataset consists of ~7k audio recordings of vocalizations from 8 minimally-verbal individuals (mostly people with developmental disabilities). The recordings were made in a real-world setting, and were categorized on the spot by the speaker's caregiver based on context, non-verbal cues, and familiarity with the speaker. There are several pre-defined categories such as selftalk, frustrated, delighted, request, etc., and caregivers could also specify custom categories. Our goal was to train a model, per individual, that accurately predicts labels and improves upon previous work.

We train several different combinations of models of the form “Feature Extractor + Classifier”. For extracting features from audio data, we use two deep models (HuBERT and AST), each with pre-trained weights, as well as mel spectrograms. As classifiers, we use a 4-layer CNN-based neural network (for mel spectrograms), neural networks with fully connected layers (for features coming from the deep models), and more.

MAY-SUMMER 2024

TEAM

Deep Learning Boot Camp

“Good composers borrow, Great ones steal!”

Emelie Curl, Tong Shan, Glenn Young, Larsen Linov, Reginald Bain

Throughout history, composers and musicians have borrowed musical elements like chord progressions, rhythms, lyrics, and melodies from each other. Our motivation for this project is born of a fascination with this phenomenon, which extends to legally fraught cases such as unconsciously or intentionally copying the work of another. Even famed and highly regarded composers like Bach, Vivaldi, Mozart, and Haydn are not innocent of borrowing from their contemporaries or even recycling their own works. More recently, in a high-profile 2015 court case, artists Robin Thicke and Pharrell Williams were ordered to pay millions of dollars in damages to Marvin Gaye's estate for copyright infringement, on the grounds that they borrowed from Gaye’s "Got to Give It Up" when writing their hit "Blurred Lines." Our project aimed to use deep learning to assess the similarity between musical clips, with a view to establishing a more robust and empirical way to detect music plagiarism.

MAY-SUMMER 2024

TEAM

Deep Learning Boot Camp

Taxi Demand Forecasting

Ngoc Nguyen, Li Meng, Sriram Raghunath, Nazanin Komeilizadeh, Noah Gillespie, Edward Ramirez

Knowing where to find customers is the most important question for taxi drivers and ride-hailing networks. If demand for taxis can be reliably predicted in real time, taxi companies can dispatch drivers in a timely manner and drivers can optimize their route decisions to maximize their earnings in a given day. Consequently, customers will likely receive more reliable service with shorter wait times. This project aims to use rich trip-level data from the NYC Taxi and Limousine Commission to construct time series of taxi rides for 63 taxi zones in Manhattan and forecast demand for rides. We will explore deep learning models for time series, including Multilayer Perceptrons, LSTMs, and Temporal Graph-based Neural Networks, and compare them with a baseline statistical model, ARIMAX.
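
The first step described above, turning trip-level records into per-zone time series, might look roughly like the sketch below. It assumes each trip is reduced to a pickup timestamp and a zone ID and that demand is counted per hour; the records and zone IDs shown are placeholders, not actual Taxi and Limousine Commission data.

```python
from collections import Counter
from datetime import datetime


def hourly_demand(trips):
    """Count pickups per (zone, hour) from (pickup_time_iso, zone_id) records."""
    counts = Counter()
    for pickup_time, zone_id in trips:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(pickup_time).replace(minute=0, second=0, microsecond=0)
        counts[(zone_id, hour)] += 1
    return counts


def zone_series(counts, zone_id, hours):
    """Demand series for one zone over the given hours, with zeros for empty hours."""
    return [counts.get((zone_id, hour), 0) for hour in hours]


# Hypothetical trip records; the zone IDs and timestamps are placeholders.
TRIPS = [
    ("2024-01-01T08:05:00", 161),
    ("2024-01-01T08:40:00", 161),
    ("2024-01-01T09:10:00", 161),
    ("2024-01-01T08:15:00", 237),
    ("2024-01-01T10:30:00", 161),
]
HOURS = [datetime(2024, 1, 1, h) for h in (8, 9, 10)]

counts = hourly_demand(TRIPS)
print(zone_series(counts, 161, HOURS))
print(zone_series(counts, 237, HOURS))
```

Filling empty hours with explicit zeros matters here: a forecasting model needs a regularly spaced series, and hours with no pickups are real observations of low demand rather than missing data.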

MAY-SUMMER 2024

TEAM

Deep Learning Boot Camp

arXiv Chatbot

Xiaoyu Wang, Ketan Sand, Guoqing Zhang, Tajudeen Mamadou Yacoubou, Tantrik Mukerji

arXiv is the largest open database available, containing nearly 2.4 million research papers that span 8 major domains and cover everything from the tiniest of atoms to the entire cosmos. A large language model (LLM) with access to such a dataset could generate up-to-date, relevant, and, more importantly, precise information with citable sources.

This is exactly what we have done in this project. We extended the capabilities of Google’s Gemini 1.5 Pro LLM by building a customized Retrieval-Augmented Generation (RAG) pipeline that has access to the entire arXiv database. We then deployed the entire package as a chatbot-style app to make the experience user-friendly.
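
For readers unfamiliar with RAG, the core loop is: retrieve the passages most relevant to a question, then hand them to the LLM as context. The sketch below illustrates only that idea and is not the team's pipeline. It assumes a toy bag-of-words similarity in place of a learned embedding model, stops at building the prompt (no LLM call is made), and uses invented abstracts with placeholder IDs.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Return the k document IDs most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id) for doc_id, text in documents.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True)[:k]]


def build_prompt(query, documents, k=2):
    """Assemble a prompt asking the model to answer from retrieved, citable context."""
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in retrieve(query, documents, k))
    return f"Answer using only the sources below and cite their IDs.\n{context}\nQuestion: {query}"


# Hypothetical abstracts with placeholder IDs; not real arXiv entries.
DOCS = {
    "paper-1": "We study magnetic ordering in strongly correlated electron systems.",
    "paper-2": "A survey of retrieval augmented generation for large language models.",
    "paper-3": "Fast radio bursts detected with a wide field radio telescope.",
}

QUESTION = "How does retrieval augmented generation help language models?"
print(build_prompt(QUESTION, DOCS, k=1))
```

Because each retrieved passage carries its ID into the prompt, the model's answer can point back to specific sources, which is what makes the output citable.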

MAY-SUMMER 2024

TEAM

Data Science Boot Camp

Continuous Glucose Monitoring

Daniel Visscher, Margaret Swerdloff, Noah Gillespie, S. C. Park, Oladimeji Olaluwoye

The idea of the project is to predict high glucose spikes from continuous glucose data, smartwatch data, food logs, and glycemic index. The dataset consists of the following:
1) Tri-axial accelerometer data (movement of the subject)
2) Blood volume pulse
3) Interstitial glucose concentration
4) Electrodermal activity
5) Heart rate
6) IBI (interbeat interval)
7) Skin temperature
8) Food log
The data are publicly available at: https://physionet.org/content/big-ideas-glycemic-wearable/1.1.2/#files-panel

MAY-SUMMER 2024

TEAM

Data Science Boot Camp

Chirp Checker

Andrew Merwin, Caleb Fong, B Mede, Yang Yang, Robert Cass, Calvin Yost-Wolff

The nocturnal soundscapes of late summer and autumn are replete with the familiar chirps, trills, and buzzes of singing insects. But these cryptic performers often remain anonymous and underappreciated.

The goal of this project was to build machine learning models to identify the presence of insects in sound files and to coarsely categorize the sounds as crickets, katydids, or cicadas.

Both Support Vector Classifiers and Convolutional Neural Networks were able to classify insect songs into the broad categories of cricket, katydid, and cicada with 90% accuracy or higher.

In the future, similar, more sophisticated models could be applied to filtering large volumes of passively recorded audio from ecological studies of insects and could power apps that identify insect songs to the species level.

SPRING 2024

TEAM

Data Science Boot Camp

Aware NLP Project III

Mohammad Nooranidoost, Baian Liu, Craig Franze, Mustafa Anıl Tokmak, Himanshu Raj, Peter Williams

This project involves the investigation and evaluation of different methodologies for retrieval for use in RAG (Retrieval-Augmented Generation) systems. In particular, this project investigates retrieval quality for information downloaded from employee subreddits. We investigated the impacts of using clustering, multi-vector indexing, and multi-querying in advanced retrieval methodologies against baseline naive retrieval.

FALL 2023

TEAM

Data Science Boot Camp

Groundwater Forecasting

Riti Bahl, Meredith Sargent, Marcos Ortiz, Chelsea Gary, Anireju Dudun

Groundwater is a critical source of water for human survival. A significant percentage of both drinking and crop irrigation water is drawn from groundwater sources through wells. In the US, overuse of groundwater could have major implications for the future, and forecasting groundwater levels can be useful in understanding its impact. Building on historical data for four wells in Spokane, WA, together with surface water and weather data, we construct and evaluate machine learning models that forecast groundwater levels in the area.

FALL 2023

TEAM

Data Science Boot Camp

Funk

Aydin Ozbek, Dane Miyata, Kristina Knowles, Mario Gomez, Kashish Mehta

Most existing music recommendation systems rely on listeners to provide seed tracks, and then utilize a variety of different approaches to recommend additional tracks in either a playlist-like listening session or as sequential track recommendations based on user feedback.

We built a playlist recommendation engine that takes a different approach, allowing listeners to generate a novel playlist based on a semantic string, such as the title of the desired playlist, a specific mood (happy, relaxed), an atmosphere (tropical vibe), or a function (party music, focus). Using a publicly available dataset of existing playlists (https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge), we combine a semantic similarity vector model with a matrix factorization model to allow users to quickly and easily generate playlists to fit any occasion.

FALL 2023

TEAM

Data Science Boot Camp

The Silent Emergency - Predicting Preterm Birth

Katherine Grillaert, Divya Joshi, Alexander Sutherland, Kristina Zvolanek, Noah Rahman

Preterm birth is a primary cause of infant mortality and morbidity in the United States, affecting approximately 1 in 10 births. The rates are notably higher among Black women (14.6%), compared to White (9.4%) and Hispanic women (10.1%). Despite its prevalence, predicting preterm birth remains challenging due to its multifaceted etiology rooted in environmental, biological, genetic, and behavioral interactions. Our project harnesses machine learning techniques to predict preterm birth using electronic health records. This data intersects with social determinants of health, reflecting some of the interactions contributing to preterm birth. Recognizing that under-representation in healthcare research perpetuates racial and ethnic health disparities, we take care to use diverse data to ensure equitable model performance across underrepresented populations.

FALL 2023

TEAM

Data Science Boot Camp

DDTs: Dementia Detection Tool

Himanshu Khanchandani, Clark Butler, Cisil Karaguzel, Selman Ipek, Shreya Shukla

Alzheimer’s disease (AD) is one of the most common types of dementia and frequently affects the elderly. Electroencephalography (EEG) is a non-invasive technique that measures brain activity using external electrodes and may help improve the diagnosis of AD. In this project we use the power spectrum of EEG signals to build a robust machine learning classifier that predicts whether a patient has Alzheimer's or is healthy. We vastly improve upon existing models in the literature by using modified versions of the features used there.

SPRING 2023

TEAM

Data Science Boot Camp

Correcting Racial Bias in Measurement of Blood Oxygen Saturation

Rohan Myers, Saad Khalid, Woojeong Kim, Brooks Miner, Jaychandran Padayasi

Fingertip pulse oximeters are the current standard for estimating blood oxygen saturation without a blood draw, both at home and in healthcare settings. However, pulse oximeters overestimate oxygen saturation, often resulting in ‘hidden hypoxemia’: a patient has hypoxemia (dangerously low oxygen saturation), but the oximeter returns a healthy oxygen value. Unfortunately, oximeter overestimation of oxygen saturation is exacerbated for patients with darker skin tones due to light-based oximeter technology. This results in Black patients experiencing hidden hypoxemia at twice the rate of white patients. By combining pulse oximeter readings (SpO2) with additional patient data, we develop improved methods for estimating arterial blood oxygen saturation (SaO2) and identifying hidden hypoxemia. The predictions of our models are more accurate than pulse oximeter readings alone, and remove the systematic racial inequity inherent in the current medical practice of relying on oximeter readings by themselves.

SPRING 2021

TEAM

Data Science Boot Camp

MaizeFinder

Tuguldur Sukhbold, Michael Darcy, Pol Arranz-Gibert, AJ Adejare

Our problem is to accurately predict maize field centers in Africa using very low resolution satellite images. The dataset contains many disparate entries, including two sets of satellite imagery, one with higher spatial resolution (Planet) and one with higher temporal resolution and wavelength coverage (Sentinel-2), as well as partial metadata about the crop fields, including estimated yield, size, and subjective quality of the measurement. We employ CNN-based image segmentation models to compute the displacement vector of the field center from the image center.

SPRING 2021

TEAM

Data Science Boot Camp

NLPs

Frank Hidalgo, Joseph Szabo, Christopher Zhang, Sean Perez, Kun Jin

Acronym/abbreviation (short form) disambiguation is one of the main challenges when using NLP methods to understand medical records. While this topic has long been studied, it is still a work in progress. Current strategies often involve manually curating datasets of abbreviations and training classifiers on them. The main problem with that approach is that curated datasets are sparse and don't include all the short forms. In December 2020, a paper was published whose authors created a large dataset of short forms as one step in their pipeline for pre-training models. The goal of our project is to build upon their short form disambiguation piece and create a tool that disambiguates a medical short form using its context.

Example of the usage of our tool:

original_sentence = "The patient states that she has had dizziness, nausea, some heartburn, and some change in her vision. She is gravida 6, para 4, AB 2. She has no history of adverse reaction to anesthesia."

AB could stand for "abortion", "ankle-brachial", "blood group in ABO system", or "A, B lines in Kerley lines".

disambiguated_sentence = "The patient states that she has had dizziness, nausea, some heartburn, and some change in her vision. She is gravida 6, para 4, abortion 2. She has no history of adverse reaction to anesthesia."
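
To show the shape of such a tool, here is a toy stand-in that picks an expansion by counting context cue words. It is not the team's method, which builds on pre-trained models; the `SENSES` inventory, its cue words, and the `disambiguate` function are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical sense inventory: expansions of "AB", each with a few context cue words.
SENSES = {
    "AB": {
        "abortion": {"gravida", "para", "pregnancy", "gestation"},
        "ankle-brachial": {"index", "artery", "arterial", "leg"},
        "blood group in ABO system": {"blood", "type", "transfusion", "donor"},
    }
}


def disambiguate(sentence, short_form, senses=SENSES):
    """Replace short_form with the expansion whose cue words overlap the sentence most."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    candidates = senses[short_form]
    best = max(candidates, key=lambda expansion: len(candidates[expansion] & context))
    # Whole-word, case-sensitive replacement of the short form.
    return re.sub(rf"\b{re.escape(short_form)}\b", best, sentence)


original_sentence = (
    "She is gravida 6, para 4, AB 2. "
    "She has no history of adverse reaction to anesthesia."
)
disambiguated_sentence = disambiguate(original_sentence, "AB")
print(disambiguated_sentence)
```

A hand-written cue list like this is exactly the kind of sparse, manually curated resource the project set out to move beyond; a learned model scores each expansion against the full context instead.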

FALL 2022

TEAM

Data Science Boot Camp

Lime

Yuchen Luo, Ritika Khurana, Aditya Chander, Taylor Mahler

We built a podcast recommendation engine that suggests episodes to a listener based on either a previous episode that they've heard or an episode description that they can input with freeform text entry.

SPRING 2022

TEAM

Data Science Boot Camp

Supermassive Black Hole

Anna Brosowsky, Sayantan Khan, Nancy Wang, Ethan Zell, Yili Zhang

We built a movie finder app that allows a user to enter some details they remember about a movie (along with some optional filter info on the genre and release year) and then predicts what movie the user is thinking of. To solve this NLP problem, our tool uses an embed-and-rerank model. We have precomputed vectorizations of movie plot information for the approximately 34,000 movies in our dataset.

Our model’s first step is to vectorize the user’s query and do a fast comparison to find the 100 closest plot vectors. Then it reranks these top 100 closest plots, performing a more thorough comparison using a neural network that semantically compares the plot fragments with the original query. Finally, we output the 10 movies which show up at the top of this new ranking.
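
The two-stage flow described above can be sketched as follows. This is an illustration under stated assumptions rather than the app's code: a bag-of-words vector stands in for the learned embedding, a bigram-overlap score stands in for the neural reranker, and the shortlist holds 2 plots instead of 100. Titles and plots are placeholders.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def embed(text):
    """Stand-in 'embedding': a bag-of-words count vector."""
    return Counter(tokens(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank_score(query, plot):
    """Stand-in for the slower pairwise model: fraction of query bigrams found in the plot."""
    q, p = tokens(query), tokens(plot)
    q_bigrams = set(zip(q, q[1:]))
    p_bigrams = set(zip(p, p[1:]))
    return len(q_bigrams & p_bigrams) / len(q_bigrams) if q_bigrams else 0.0


def find_movie(query, plots, plot_vectors, shortlist_size=2, top_n=1):
    """Stage 1: shortlist by fast vector similarity. Stage 2: rerank the shortlist."""
    q_vec = embed(query)
    shortlist = sorted(plots, key=lambda t: cosine(q_vec, plot_vectors[t]), reverse=True)[:shortlist_size]
    return sorted(shortlist, key=lambda t: rerank_score(query, plots[t]), reverse=True)[:top_n]


# Hypothetical plot summaries with placeholder titles.
PLOTS = {
    "Movie A": "A robot builds a boy.",
    "Movie B": "A lonely boy builds a robot that learns to fly.",
    "Movie C": "Two detectives chase a thief across a frozen city.",
}
PLOT_VECTORS = {title: embed(plot) for title, plot in PLOTS.items()}  # precomputed once

result = find_movie("boy builds a robot", PLOTS, PLOT_VECTORS)
print(result)
```

In this toy data the fast first stage actually prefers "Movie A", because it ignores word order, and the slower second stage corrects the ranking. That division of labor is the point of the design: the cheap comparison narrows tens of thousands of plots to a shortlist, so the expensive comparison only has to run on a handful.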

SPRING 2022

TEAM

Data Science Boot Camp

Erdio

Matthew Frick, Paul Jreidini, Matthew Heffernan

Timely identification of safety-critical events, such as gunshots, is of great importance to public safety stakeholders. However, existing systems only deliver limited value by not classifying additional urban sounds. We perform classification of environmental sounds to detect safety-critical events, in particular gunshots, and provide information on first-response via siren detection. We also engineer general features for off-line classification tasks and demonstrate how this system can provide value to additional stakeholders in the film and television industry.

SPRING 2022

TEAM

Data Science Boot Camp

SKYLAB

Chenyi Gu, Briana Stanfield, Dylan Bates, Kanishk Jain

The NHL Stanley Cup is the oldest existing trophy to be awarded to a professional sports franchise in North America, and is often considered “the hardest trophy to win in professional sport.” Using just regular season data, we ask: can we predict who is going to win the Stanley Cup?

We collected data from each team, as well as data from every player, in over 20,000 games going back to 2005. Using this data, we built an ensemble model from logistic regression, AdaBoost, random forests, and a neural network, which was able to predict playoff data with up to 70% accuracy, above the theoretical threshold of 62% reported in the literature.
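
One common way to combine several classifiers into an ensemble is soft voting, sketched below. The write-up does not say how the four models were combined, so equal-weight averaging is an assumption here, and the team names and probabilities are invented placeholders.

```python
def soft_vote(model_probabilities):
    """Average per-team win probabilities across models (equal weights)."""
    teams = model_probabilities[0].keys()
    n_models = len(model_probabilities)
    return {team: sum(p[team] for p in model_probabilities) / n_models for team in teams}


def predict_winner(model_probabilities):
    """Pick the team with the highest averaged probability."""
    averaged = soft_vote(model_probabilities)
    return max(averaged, key=averaged.get)


# Hypothetical outputs from four already-trained models; team names are placeholders.
PREDICTIONS = [
    {"Team A": 0.50, "Team B": 0.30, "Team C": 0.20},  # e.g. logistic regression
    {"Team A": 0.25, "Team B": 0.50, "Team C": 0.25},  # e.g. AdaBoost
    {"Team A": 0.50, "Team B": 0.25, "Team C": 0.25},  # e.g. random forest
    {"Team A": 0.25, "Team B": 0.50, "Team C": 0.25},  # e.g. neural network
]

averaged = soft_vote(PREDICTIONS)
winner = predict_winner(PREDICTIONS)
print(averaged)
print(winner)
```

Averaging probabilities rather than counting votes lets a confident model outweigh a lukewarm one; here the models split two against two on their top pick, and the averaged probabilities break the tie.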

Join Us

THE ERDŐS INSTITUTE

Helping PhDs get and create jobs they love at every stage of their career.
