Data Science Boot Camp
Fall 2025
Sep 10, 2025 - Dec 19, 2025
Registration Deadlines
Sep 3, 2025: All Erdős Summer 2025 Career Launch Cohort or Alumni Club members who are not participating in another Launch bootcamp
Category
Launch, Core Program, Boot Camp, Projects, Certificates
Overview
In this bootcamp, we will develop the skills needed to complete a data science project from start to finish. This includes defining a problem in quantitative terms, identifying key performance indicators (KPIs), acquiring and cleaning data, exploring patterns and trends, and transforming raw data into meaningful variables. We will then build models for prediction and inference, focusing primarily on supervised learning methods for regression and classification.

Click here to be invited to the Slack organization: The Erdős Institute
Click here to access the Slack cohort channel: #slack-cohort-channel
Click here to access the Slack program channel: #slack-program-channel
Click here to download the Events & Deadlines .ics calendar file
Organizers, Instructors, and Advisors
Steven Gubkin, PhD
Lead Instructor
Office Hours: By appt. only
Email:
Preferred Contact: Slack
Please feel free to message me on Slack with any questions!
Alec Clott, PhD
Head of Data Science Projects
Office Hours: By appt. only
Email:
Preferred Contact: Slack
Participants are welcome to reach out to me via Slack or email. I normally work standard EST hours (9am-5pm), but I can always find time to meet via Zoom after work too. Let me know how I can help!
Objectives
The goal of our Data Science Boot Camp is to provide you with the skills and mentorship necessary to produce a portfolio-worthy data science/machine learning project, while also providing valuable career development support and connecting you with potential employers.
Project Examples
TEAM 16
Predicting Lead Contamination in NY School Drinking Water
Ranadeep Roy, Cami Goray, Hana Lang

Lead is a toxic metal, and in children especially, lead exposure can have severe health consequences: even small amounts of lead have the potential to affect memory, behavior, and learning ability. Despite this, numerous schools across New York State have at least one drinking water outlet with lead levels testing above 5 ppb. In this project, we aim to predict the presence of lead contamination in school drinking water and to better understand the role of demographic, socioeconomic, infrastructural, and geographic features in elevated lead levels.
TEAM 33
Tuning Up Music Highway
James O'Quinn, Yang Mo, John Hurtado Cadavid, Ruixuan Ding, Chilambwe Natasha Wapamenshi

Known as the most dangerous highway in Tennessee, Music Highway, the stretch of Interstate 40 between Memphis and Nashville, could use a serious tuning up. This project investigates the effectiveness and cost-efficiency of potential physical safety interventions along its Madison and Henderson County segments, with the goal of reducing crash severity. We used a data-driven geospatial modeling approach to assess whether adding specific safety features to targeted segments predicts statistically significant changes in crash injury outcomes.
First Steps/Prerequisites
- Base-level familiarity with Python
- Differential calculus. Ideally you also know some multivariate differential calculus and linear algebra.
- Basic statistics and probability
Program Content
Course materials are available on GitHub.
Textbook/Notes
Note: our video player does not support playback speed options. A third-party browser extension can add this; for example, video-speed-controller works for Chrome. If you would prefer to avoid a browser extension, you can also change the playback speed manually in the JavaScript console: Speed up any HTML5 video player!
Live Lectures
Live Lecture 00: Orientation
Course orientation.
Live Lecture 01: Supervised Learning
Data Science workflow overview, data collection, cleaning, EDA, basics of supervised learning, sklearn estimator and transformer APIs.
Live Lecture 02: Model Evaluation
Data Splits for predictive modeling, loss functions and metrics for both regression and classification problems, diagnostic plots for both regression and classification problems.
Live Lecture 03: Complexity Control
Bias/Variance Decomposition of MSE, Ridge and Lasso regularization, PCA, Nested Cross-validation for hyperparameter tuning.
Live Lecture 04: Linear Regression
Simple and multiple linear regression, categorical features and interaction terms, non-linear transformations of features.
Live Lecture 05: GLMs and GAMs
Logistic Regression, Poisson Regression, Generalized Linear Models, Generalized Additive Models.
Live Lecture 06: Inference I
Hypothesis testing (null and alternative hypothesis, size, power, p-value), confidence intervals, F-test for nested linear models, confidence intervals for model parameters, prediction intervals.
Live Lecture 07: Inference II
Bootstrapping and simulation.
Live Lecture 08: Time Series I
Time series data splits, baseline models, rolling averages, exponential smoothing models.
Live Lecture 09: Time Series II
Stationarity and Autocorrelation, Autoregressive models, Moving Average Models, ARIMA.
Live Lecture 10: Ensembles I
Decision Trees, Random Forests, Bagging/Pasting, Voter Models.
Live Lecture 11: Ensembles II
AdaBoost, Gradient Boosting, XGBoost.
Project/Homework Instructions
Schedule
Phase 1 - Instruction and Project Completion: Sep 16 - Nov 5, 2025
Project Review & Judging: Nov 6 - Nov 12, 2025
Phase 2 - Intense Interview Prep & Career Connections: Nov 13 - Dec 19, 2025
Lecture 00: Orientation / Computer Setup Day
Sep 11, 2025 at 05:30 PM UTC
Math Hour 00
Sep 15, 2025 at 02:00 PM UTC
Problem Session 00
Sep 15, 2025 at 05:30 PM UTC
Lecture 01: Supervised Learning
Sep 16, 2025 at 05:30 PM UTC
Math Hour 01
Sep 17, 2025 at 02:00 PM UTC
Problem Session 01
Sep 17, 2025 at 05:30 PM UTC
Lecture 02: Model Evaluation
Sep 18, 2025 at 05:30 PM UTC
Project Pitch Hour
Sep 19, 2025 at 08:00 PM UTC
Math Hour 02
Sep 22, 2025 at 02:00 PM UTC
Problem Session 02
Sep 22, 2025 at 05:30 PM UTC
Lecture 03: Complexity Control
Sep 23, 2025 at 08:00 PM UTC
Math Hour 03
Sep 24, 2025 at 02:00 PM UTC
Problem Session 03
Sep 24, 2025 at 05:30 PM UTC
Lecture 04: Linear Regression
Sep 25, 2025 at 05:30 PM UTC
Math Hour 04
Sep 29, 2025 at 02:00 PM UTC
Problem Session 04
Sep 29, 2025 at 05:30 PM UTC
Lecture 05: Generalized Linear Models and Generalized Additive Models
Sep 30, 2025 at 05:30 PM UTC
Math Hour 05
Oct 1, 2025 at 02:00 PM UTC
Problem Session 05
Oct 1, 2025 at 05:30 PM UTC
Lecture 06: Inference I
Oct 2, 2025 at 05:30 PM UTC
Math Hour 06
Oct 6, 2025 at 02:00 PM UTC
Problem Session 06
Oct 6, 2025 at 05:30 PM UTC
Lecture 07: Inference II
Oct 7, 2025 at 05:30 PM UTC
Math Hour 07
Oct 8, 2025 at 02:00 PM UTC
Problem Session 07
Oct 8, 2025 at 05:30 PM UTC
Lecture 08: Time Series I
Oct 9, 2025 at 05:30 PM UTC
Math Hour 08
Oct 13, 2025 at 02:00 PM UTC
Problem Session 08
Oct 13, 2025 at 05:30 PM UTC
Lecture 09: Time Series II
Oct 14, 2025 at 05:30 PM UTC
Math Hour 09
Oct 15, 2025 at 02:00 PM UTC
Problem Session 09
Oct 15, 2025 at 05:30 PM UTC
Lecture 10: Ensemble Learning I
Oct 16, 2025 at 05:30 PM UTC
Math Hour 10
Oct 20, 2025 at 02:00 PM UTC
Problem Session 10
Oct 20, 2025 at 05:30 PM UTC
Lecture 11: Ensemble Learning II
Oct 21, 2025 at 05:30 PM UTC
Math Hour 11
Oct 22, 2025 at 02:00 PM UTC
Problem Session 11
Oct 22, 2025 at 05:30 PM UTC
Lecture 12: Introduction to Neural Networks
Oct 23, 2025 at 05:30 PM UTC
Phase II Orientation
Nov 17, 2025 at 07:00 PM UTC
Data Science Project Showcase
Dec 10, 2025 at 05:00 PM UTC
Project/Homework Deadlines
Sep 11, 2025
03:59 AM UTC
Last chance to switch bootcamps
Email Amalya Lehmann at amalya@erdosinstitute.org if you would like to switch to a different bootcamp.
Sep 17, 2025
03:59 AM UTC
Watch video about Project Formation
This should help answer any questions you may have going into project formation.
Sep 17, 2025
03:59 AM UTC
Watch 3 Previous Top Projects
Consult the project database and watch at least 3 previous top projects from Erdős alumni.
Sep 19, 2025
08:00 PM UTC
Project Pitch Hour
An opportunity to meet other Erdős Fellows, form teams, and propose topics.
Sep 22, 2025
03:59 AM UTC
Finalized Teams with Preliminary Project Ideas
Teams need to be finalized by this point. If you proposed or created a project, you must have others in your group. If you did not propose or create a project, you must join an open group.
Sep 22, 2025
03:59 AM UTC
Last day to defer enrollment to a future cohort
Contact Amalya Lehmann (amalya@erdosinstitute.org) if you would like to unenroll from this cohort and defer to a future cohort.
Sep 26, 2025
03:59 AM UTC
Data gathering and defining stakeholders + KPIs
Find the dataset you will be working with. Describe the dataset and the problem you are looking to solve (1 page max). List the project's stakeholders and the company key performance indicators (KPIs) as bullet points.
Oct 3, 2025
03:59 AM UTC
Data cleaning + preprocessing + EDA
Look for missing values and duplicates. Perform basic data manipulation and preliminary feature engineering, then carry out exploratory data analysis.
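As a rough illustration of the first check in this step, the sketch below counts missing values per column and finds repeated rows using only the Python standard library. The records, column names, and the helper names missing_counts and duplicate_rows are hypothetical and chosen only for illustration; on a real project dataset you would more likely use pandas, where DataFrame.isna() and DataFrame.duplicated() serve the same purpose.

```python
from collections import Counter

# Hypothetical records standing in for a project dataset.
rows = [
    {"school": "A", "lead_ppb": 3.2},
    {"school": "B", "lead_ppb": None},
    {"school": "A", "lead_ppb": 3.2},
    {"school": "C", "lead_ppb": 7.5},
]


def missing_counts(records):
    """Count missing entries (None or empty string) per column."""
    counts = Counter()
    for record in records:
        for column, value in record.items():
            if value is None or value == "":
                counts[column] += 1
    return dict(counts)


def duplicate_rows(records):
    """Return each distinct row that appears more than once."""
    # Sort items by column name so key order does not affect the comparison.
    seen = Counter(tuple(sorted(record.items())) for record in records)
    return [dict(key) for key, n in seen.items() if n > 1]


print(missing_counts(rows))
print(duplicate_rows(rows))
```

Here only None and the empty string are treated as missing; real data often uses other sentinels (for example NaN or "N/A"), so the check should be adapted to the dataset at hand.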
Oct 10, 2025
03:59 AM UTC
Written proposal of modeling approach [Checkpoint]
Describe your planned modeling approach, based on the exploratory data analysis from the last two weeks (< 1 page, bullet points).
Oct 17, 2025
03:59 AM UTC
Modeling and Preliminary Results
Present results with visualizations and/or metrics, along with a list of successes and pitfalls.
Oct 24, 2025
03:59 AM UTC
Clean your repository and start working on final presentation
Clean up your repository so that an outsider can easily follow your work. Convert notebooks into scripts where possible. Confirm that the whole pipeline from data ingestion all the way to prediction or inference works without fuss.
Nov 6, 2025
04:59 AM UTC
Final Projects Due
Final Projects must be submitted by this deadline in order to receive a certificate of completion.



