Data Science Boot Camp
Summer 2025
May 8, 2025 - Aug 15, 2025
Registration Deadlines
May 8, 2025 - Summer 2025 Cohort participants
Category
Launch, Core Program, Boot Camp, Projects, Certificates
Overview
In this bootcamp, we will develop the skills needed to complete a data science project from start to finish. This includes defining a problem in quantitative terms, identifying key performance indicators (KPIs), acquiring and cleaning data, exploring patterns and trends, and transforming raw data into meaningful variables. We will then build models for prediction and inference, focusing primarily on supervised learning methods for regression and classification.

Click here to be invited to the Slack organization: The Erdős Institute
Click here to access the Slack cohort channel: #slack-cohort-channel
Click here to access the Slack program channel: #slack-program-channel
Click here to download the Events & Deadlines .ics calendar file
Organizers, Instructors, and Advisors
Steven Gubkin, PhD
Lead Instructor
Office Hours: By appt. only
Email:
Preferred Contact: Slack
Please feel free to message me on Slack with any questions!
Alec Clott, PhD
Head of Data Science Projects
Office Hours: By appt. only
Email:
Preferred Contact: Slack
Participants are welcome to reach out to me via Slack or email. I normally work standard EST hours (9am-5pm), but I can always find time to meet via Zoom after work too. Let me know how I can help!
Objectives
The goal of our Data Science Boot Camp is to provide you with the skills and mentorship necessary to produce a portfolio-worthy data science/machine learning project, while also providing valuable career development support and connecting you with potential employers.
Project Examples
TEAM 16
Predicting Lead Contamination in NY School Drinking Water
Ranadeep Roy, Cami Goray, Hana Lang

Lead is a toxic metal, and in children especially, lead exposure can have severe health consequences; even small amounts of lead have the potential to affect memory, behavior, and learning ability. Despite this, numerous schools across New York State have at least one drinking water outlet with lead levels testing above 5 ppb. In this project, we aim to predict the presence of lead contamination in school drinking water, and to better understand the role of demographic, socioeconomic, infrastructural, and geographic features in elevated lead levels.
TEAM 33
Tuning Up Music Highway
James O'Quinn, Yang Mo, John Hurtado Cadavid, Ruixuan Ding, Chilambwe Natasha Wapamenshi

Known as the most dangerous highway in Tennessee, Music Highway, the stretch of Interstate 40 between Memphis and Nashville, could use a serious tuning up. This project investigates the effectiveness and cost-efficiency of potential physical safety interventions along its Madison and Henderson County segments, with the goal of reducing crash severity. We used a data-driven geospatial modeling approach to assess whether adding specific safety features to targeted segments predicts statistically significant changes in crash injury outcomes.
First Steps/Prerequisites
- Basic familiarity with Python
- Differential calculus. Ideally you also know some multivariate differential calculus and linear algebra.
- Basic statistics and probability
Program Content
Course materials are available on GitHub.
Textbook/Notes
Note: our video player does not support playback speed options. A third-party browser extension can modify video playback speed; for example, video-speed-controller works for Chrome. If you would prefer to avoid a browser extension, you can also set the playback speed manually in the JavaScript console: Speed up any HTML5 video player!
Live Lecture 07: Time Series I
Live Lectures
Baseline models for time series, rolling averages, exponential smoothing models.
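As a rough illustration of the topics in this lecture, the sketch below implements a naive last-value baseline, a trailing rolling average, and simple exponential smoothing in plain Python. The function names and the toy `demand` series are hypothetical and are not taken from the course materials.

```python
def naive_forecast(series):
    """Baseline: predict the next value as the last observed value."""
    return series[-1]


def rolling_average(series, window):
    """Trailing mean over each full window of length `window`."""
    return [sum(series[i - window:i]) / window
            for i in range(window, len(series) + 1)]


def simple_exponential_smoothing(series, alpha):
    """Level update s_t = alpha * y_t + (1 - alpha) * s_{t-1}, with s_0 = y_0."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical toy series, for illustration only.
demand = [10.0, 12.0, 11.0, 13.0, 15.0]

print(naive_forecast(demand))                          # 15.0
print(rolling_average(demand, window=3))               # [11.0, 12.0, 13.0]
print(simple_exponential_smoothing(demand, alpha=0.5)) # [10.0, 11.0, 11.0, 12.0, 13.5]
```

A larger `alpha` makes the smoothed level track recent observations more closely, while a smaller `alpha` averages over a longer history.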
Live Lecture 09: Classification I
Live Lectures
Stratified splits, kNN classification, logistic regression
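To make the idea of a stratified split concrete, here is a minimal sketch in plain Python that splits indices class by class so the test set keeps roughly the same class proportions as the full data. The helper `stratified_split` and the toy labels are hypothetical, written only for this illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved per label."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # shuffle within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical imbalanced labels: 80 negatives, 20 positives.
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)

test_positive_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), test_positive_rate)  # 75 25 0.2
```

In practice, scikit-learn's `train_test_split` accepts a `stratify` argument that serves the same purpose.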
Math Hour 10
Math Hour
Definition of multivariate normal distributions, showing that the LDA classifier assigns a point to the class whose mean vector is nearest in the Mahalanobis metric, and MLE estimates of the mean and covariance.
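The nearest-mean view of LDA can be sketched in a few lines. The example below assumes two classes with equal priors and a known shared covariance matrix in 2D; with unequal priors an additional log-prior term would enter the comparison. All names and numbers are hypothetical and chosen so that the Mahalanobis and Euclidean rules disagree.

```python
def invert_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance (x - mean)^T cov_inv (x - mean) in 2D."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    (a, b), (c, d) = cov_inv
    return dx * (a * dx + b * dy) + dy * (c * dx + d * dy)


def euclidean_sq(x, mean):
    """Squared Euclidean distance, for comparison."""
    return (x[0] - mean[0]) ** 2 + (x[1] - mean[1]) ** 2


def lda_predict(x, class_means, shared_cov):
    """Assign x to the class with the nearest mean (equal priors assumed)."""
    cov_inv = invert_2x2(shared_cov)
    return min(class_means,
               key=lambda k: mahalanobis_sq(x, class_means[k], cov_inv))


# Hypothetical class means and shared covariance (wide in x, narrow in y).
class_means = {"a": (0.0, 0.0), "b": (2.0, 2.0)}
shared_cov = [[4.0, 0.0], [0.0, 0.25]]
point = (2.0, 0.5)

lda_label = lda_predict(point, class_means, shared_cov)
euclid_label = min(class_means, key=lambda k: euclidean_sq(point, class_means[k]))
print(lda_label, euclid_label)  # a b
```

The point is closer to class "b" in Euclidean distance, but a vertical offset is expensive under this covariance, so the Mahalanobis rule picks class "a".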
Live Lecture 12: Ensemble Learning II
Live Lectures
AdaBoost, Gradient Boosting, XGBoost. We ensemble estimators which are each individually prone to underfitting by adding new estimators trained on the (pseudo)residuals of the old ensemble.
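The residual-fitting idea can be sketched with one-split regression stumps on a 1D toy problem. This is a minimal illustration under squared-error loss, where the pseudo-residuals coincide with the ordinary residuals; it is not how XGBoost or any particular library implements boosting, and every name and number below is hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump (least squares) on 1D inputs."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds, learning_rate):
    """Each new stump is fit to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)          # start from the constant mean prediction
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


def mse(model, xs, ys):
    """Mean squared error of `model` on the given points."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: a quadratic that a single stump underfits.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [x * x for x in xs]

one_round = gradient_boost(xs, ys, n_rounds=1, learning_rate=0.5)
fifty_rounds = gradient_boost(xs, ys, n_rounds=50, learning_rate=0.5)
print(mse(one_round, xs, ys), mse(fifty_rounds, xs, ys))
```

Each stump alone is a weak, underfitting model, but the training error shrinks as more stumps are added to correct what the ensemble still gets wrong.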
Math Hour 07
Math Hour
We derive confidence intervals for linear regression parameters and conditional means, as well as prediction intervals.
Math Hour 09
Math Hour
This math hour focused on generalized linear models, with three primary examples: linear regression, logistic regression, and Poisson regression.
Live Lecture 11: Ensemble Learning I
Live Lectures
Decision trees, random forests, bagging/pasting, voter models.
Live Lecture 08: Time Series II
Live Lectures
Stationarity and autocorrelation, autoregressive models, moving average models, SARIMA.
Live Lecture 10: Classification II
Live Lectures
Classification metrics, cross-entropy loss, Bayes-based classifiers (LDA/QDA/NB), support vector machines.
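As a small illustration of two of these topics, the sketch below computes binary cross-entropy loss from predicted probabilities, and precision and recall from hard predictions. The function names and toy inputs are hypothetical; labels are assumed to be 0 or 1.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean of -[y log p + (1 - y) log(1 - p)], with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1).

    Assumes at least one predicted positive and one actual positive.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fp), tp / (tp + fn)


# An uninformative classifier that always predicts 0.5 has loss log(2).
uninformative_loss = binary_cross_entropy([1, 0], [0.5, 0.5])

# Hypothetical labels and hard predictions.
precision, recall = precision_recall([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
print(uninformative_loss, precision, recall)
```

Cross-entropy penalizes confident wrong probabilities heavily, while precision and recall only look at the thresholded predictions.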
Live Lecture 13: Neural Networks
Live Lectures
Feedforward neural networks, convolutional layers with application to image classification, recurrent neural networks with application to text classification.
Project/Homework Instructions
Schedule
Orientation & Setup Week: May 8 - May 12, 2025
Phase 1 - Instruction and Project Completion: May 13 - Jul 2, 2025
Project Review & Judging: Jul 3 - Jul 9, 2025
Phase 2 - Intense Interview Prep & Career Connections for Certificate Holders: Jul 10 - Aug 15, 2025
Lecture 01: Orientation / Computer Setup Day
May 8, 2025 at 05:30 PM UTC
Lecture 02: Regression I
May 13, 2025 at 05:30 PM UTC
Math Hour 02
May 14, 2025 at 02:00 PM UTC
Problem Session 02
May 14, 2025 at 05:00 PM UTC
Lecture 03: Regression II
May 15, 2025 at 05:30 PM UTC
Math Hour 03
May 16, 2025 at 02:00 PM UTC
Problem Session 03
May 16, 2025 at 05:00 PM UTC
Project Pitch Hour
May 16, 2025 at 08:00 PM UTC
Lecture 04: Regression III
May 20, 2025 at 05:30 PM UTC
Math Hour 04
May 21, 2025 at 02:00 PM UTC
Problem Session 04
May 21, 2025 at 05:00 PM UTC
Lecture 05: Inference I
May 22, 2025 at 05:30 PM UTC
Math Hour 05
May 23, 2025 at 02:00 PM UTC
Problem Session 05
May 23, 2025 at 05:00 PM UTC
Lecture 06: Inference II
May 27, 2025 at 05:30 PM UTC
Math Hour 06
May 28, 2025 at 02:00 PM UTC
Problem Session 06
May 28, 2025 at 05:00 PM UTC
Lecture 07: Time Series I
May 29, 2025 at 05:30 PM UTC
Math Hour 07
May 30, 2025 at 02:00 PM UTC
Problem Session 07
May 30, 2025 at 05:00 PM UTC
Lecture 08: Time Series II
Jun 3, 2025 at 05:30 PM UTC
Cancelled: Math Hour 08
Jun 4, 2025 at 02:00 PM UTC
Problem Session 08
Jun 4, 2025 at 05:00 PM UTC
Lecture 09: Classification I
Jun 5, 2025 at 05:30 PM UTC
Math Hour 09
Jun 6, 2025 at 02:00 PM UTC
Problem Session 09
Jun 6, 2025 at 05:00 PM UTC
Lecture 10: Classification II
Jun 10, 2025 at 05:30 PM UTC
Math Hour 10
Jun 11, 2025 at 02:00 PM UTC
Problem Session 10
Jun 11, 2025 at 05:00 PM UTC
Lecture 11: Ensemble Learning I
Jun 12, 2025 at 05:30 PM UTC
Math Hour 11
Jun 13, 2025 at 02:00 PM UTC
Problem Session 11
Jun 13, 2025 at 05:00 PM UTC
Cancelled: Lecture 12 (Ensemble Learning II)
Jun 17, 2025 at 05:30 PM UTC
Math Hour 12
Jun 18, 2025 at 02:00 PM UTC
Problem Session 12
Jun 18, 2025 at 05:00 PM UTC
Lecture 13: Introduction to Neural Networks
Jun 19, 2025 at 05:30 PM UTC
Math Hour 13
Jun 20, 2025 at 02:00 PM UTC
Problem Session 13
Jun 20, 2025 at 05:00 PM UTC
DS Technical Interview Prep Overview
Jul 7, 2025 at 08:00 PM UTC
DS Practice Interview, Option A (Link in Slack Channel)
Jul 10, 2025 at 04:00 PM UTC
DS Practice Interview, Option B (Link in Slack Channel)
Jul 11, 2025 at 12:00 AM UTC
DS Practice Interview, Option C (Link in Slack Channel)
Jul 11, 2025 at 01:00 PM UTC
Project Showcase and Commencement
Jul 15, 2025 at 04:00 PM UTC
DS Phase II Office Hour
Jul 16, 2025 at 06:00 PM UTC
DS Practice Interview, Option A
Jul 17, 2025 at 04:00 PM UTC
DS Practice Interview, Option B
Jul 18, 2025 at 12:00 AM UTC
DS Practice Interview, Option C
Jul 18, 2025 at 01:00 PM UTC
DS Phase II Office Hour
Jul 23, 2025 at 06:00 PM UTC
DS Practice Interview, Option A
Jul 24, 2025 at 04:00 PM UTC
DS Practice Interview, Option B
Jul 25, 2025 at 12:00 AM UTC
DS Practice Interview, Option C
Jul 25, 2025 at 01:00 PM UTC
DS Phase II Office Hour
Jul 30, 2025 at 06:00 PM UTC
DS Practice Interview, Option A
Jul 31, 2025 at 04:00 PM UTC
DS Practice Interview, Option B
Aug 1, 2025 at 12:00 AM UTC
DS Practice Interview, Option C
Aug 1, 2025 at 01:00 PM UTC
DS Phase II Office Hour
Aug 6, 2025 at 06:00 PM UTC
DS Practice Interview, Option A
Aug 7, 2025 at 04:00 PM UTC
DS Practice Interview, Option B
Aug 8, 2025 at 12:00 AM UTC
DS Practice Interview, Option C
Aug 8, 2025 at 01:00 PM UTC
DS Phase II Office Hour
Aug 13, 2025 at 06:00 PM UTC
DS Practice Interview, Option A (Link in Slack Channel)
Aug 14, 2025 at 04:00 PM UTC
DS Practice Interview, Option B (Link in Slack Channel)
Aug 15, 2025 at 12:00 AM UTC
DS Practice Interview, Option C (Link in Slack Channel)
Aug 15, 2025 at 01:00 PM UTC
DS Phase II Office Hour
Aug 20, 2025 at 06:00 PM UTC
Project/Homework Deadlines
May 8, 2025
03:59 AM UTC
Last chance to switch bootcamps
Email Amalya Lehmann at amalya@erdosinstitute.org if you would like to switch to a different bootcamp.
May 15, 2025
03:59 AM UTC
Watch 3 Previous Top Projects
Consult the project database and watch at least 3 previous top projects from Erdős alumni.
May 15, 2025
03:59 AM UTC
Watch video about Project Formation
This should help answer any questions you may have going into project formation.
May 16, 2025
08:00 PM UTC
Project Pitch Hour
An opportunity to meet other Erdős Fellows, form teams, and propose topics.
May 20, 2025
03:59 AM UTC
Last day to defer enrollment to a future cohort
Contact Amalya Lehmann (amalya@erdosinstitute.org) if you would like to unenroll from this cohort and defer to a future cohort.
May 20, 2025
03:59 AM UTC
Finalized Teams with Preliminary Project Ideas
Teams need to be finalized by this point. If you proposed or created a project, you must have others in your group. If you did not propose or create a project, you must join an open group.
May 24, 2025
03:59 AM UTC
Data gathering and defining stakeholders + KPIs
Find the dataset you will be working with. Describe the dataset and the problem you are looking to solve (1 page max). List the stakeholders of the project and company key performance indicators (KPIs) (bullet points).
May 31, 2025
03:59 AM UTC
Data cleaning + preprocessing
Look for missing values and duplicates. Basic data manipulation & preliminary feature engineering.
Jun 7, 2025
03:59 AM UTC
Written proposal of modeling approach [Checkpoint]
Describe your planned modeling approach, based on the exploratory data analysis from the last two weeks (< 1 page, bullet points).
Jun 14, 2025
03:59 AM UTC
Preliminary Results
Results with visualizations and/or metrics. List of successes and pitfalls.
Jun 21, 2025
03:59 AM UTC
Clean your repository
Clean up your repository so that an outsider can easily follow your work. Convert notebooks into scripts where possible. Confirm that the whole pipeline from data ingestion all the way to prediction or inference works without fuss.
Jun 28, 2025
03:59 AM UTC
Final Projects Due
Final Projects must be submitted by this deadline in order to receive a certificate of completion.


