Data Science Boot Camp
Spring 2025
Jan 23, 2025 - Apr 30, 2025
Registration Deadlines
Jan 29, 2025: All Erdős Spring 2025 Career Launch Cohort or Alumni Club members who are not participating in the UX Research or Deep Learning Boot Camps
Category
Launch, Core Program, Boot Camp, Projects, Certificates
Overview
The Erdős Institute's signature Data Science Boot Camp has been running since May 2018 thanks to the generous support of our sponsors, members, and partners.
Our goal is to provide you with the skills and mentorship necessary to produce a portfolio-worthy data science project.
We will learn the fundamentals of data science, including data collection, data cleaning, exploratory data analysis, inferential statistics, supervised and unsupervised machine learning techniques, and the basics of neural networks.
Each week of the course we will have a live lecture, a problem session, and an optional "math hour" and office hour.
To receive a Data Science certificate, you must complete a portfolio-worthy project in collaboration with a team of your peers.

Click here to be invited to the Slack organization: The Erdős Institute
Click here to access the Slack cohort channel: #slack-cohort-channel
Click here to access the Slack program channel: #slack-program-channel
Click here to download the Events & Deadlines .ics calendar file
Organizers, Instructors, and Advisors
Steven Gubkin, PhD
Lead Instructor
Office Hours:
W 11am - 12pm and by appt.
Email:
Preferred Contact:
Slack
Please feel free to message me on Slack with any questions!
Alec Clott, PhD
Head of Data Science Projects
Office Hours:
By appt. only
Email:
Preferred Contact:
Slack
Participants are welcome to reach out to me via Slack or email. I normally work standard EST hours (9am-5pm), but I can always find time to meet via Zoom after work too. Let me know how I can help!
Objectives
The goal of our Data Science Boot Camp is to provide you with the skills and mentorship necessary to produce a portfolio-worthy data science/machine learning project, while also providing valuable career development support and connecting you with potential employers.
Project Examples
TEAM 16
Predicting Lead Contamination in NY School Drinking Water
Ranadeep Roy, Cami Goray, Hana Lang

Lead is a toxic metal, and in children especially, lead exposure can have severe health consequences: even small amounts of lead have the potential to affect memory, behavior, and learning ability. Despite this, numerous schools across New York State have at least one drinking water outlet with lead levels testing above 5 ppb. In this project, we aim to predict the presence of lead contamination in school drinking water, and to better understand the role of demographic, socioeconomic, infrastructural, and geographic features in elevated lead levels.
TEAM 33
Tuning Up Music Highway
James O'Quinn, Yang Mo, John Hurtado Cadavid, Ruixuan Ding, Chilambwe Natasha Wapamenshi

Known as the most dangerous highway in Tennessee, Music Highway, the stretch of Interstate 40 between Memphis and Nashville, could use a serious tuning up. This project investigates the effectiveness and cost-efficiency of potential physical safety interventions along its Madison and Henderson County segments, with the goal of reducing crash severity. We used a data-driven geospatial modeling approach to assess whether adding specific safety features to targeted segments predicts statistically significant changes in crash injury outcomes.
First Steps/Prerequisites
- Clone the GitHub repo locally
- Install the conda environment
- Run a Jupyter Notebook using that conda environment
- Base level familiarity with Python
- Differential calculus. Ideally you also know some multivariate differential calculus and linear algebra.
- Basic statistics and probability
Program Content
Course materials are available on GitHub.
Textbook/Notes
Note: our video player does not support playback speed options. A third-party browser extension can add this; for example, video-speed-controller works for Chrome. If you would prefer to avoid a browser extension, you can also change the playback speed manually in the JavaScript console: Speed up any HTML5 video player!
DSBC Orientation
Live Lectures
Cohort Orientation.
Lecture 2 part 2
Live Lectures
Linear Regression, kNN regression, Data Splits
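As a rough illustration of two of these topics, the sketch below holds out a test set and scores a k-nearest-neighbors regressor on it. The function names, the toy data, and the choice of k are assumptions made for this example and are not taken from the course notebooks.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split off a held-out test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def knn_predict(train, x, k=3):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical 1-D toy data lying exactly on the line y = 2x.
data = [(float(x), 2.0 * x) for x in range(20)]
train, test = train_test_split(data)

# Mean squared error is computed on the held-out points only.
mse = sum((knn_predict(train, x) - y) ** 2 for x, y in test) / len(test)
print(f"train={len(train)} test={len(test)} test MSE={mse:.3f}")
```

The model never sees the test rows while predicting, which is the point of the split: the reported error estimates performance on new data rather than on data the model has memorized.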
Math Hour 3
Math Hour
We give a geometrically motivated derivation of singular value decomposition. We see principal component analysis as an application.
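One way to see the connection between the two ideas is to compute principal components directly from the SVD of the centered data matrix. The following is a minimal NumPy sketch on hypothetical random data (the data and variable names are assumptions, not course material); it cross-checks the result against the eigenvalues of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 points stretched by a fixed linear map.
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]])

# PCA works on centered data.
Xc = X - X.mean(axis=0)

# Thin SVD: Xc = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                           # rows are principal directions
explained_variance = s**2 / (len(X) - 1)  # variance along each direction
scores = Xc @ Vt.T                        # data in principal coordinates

# Cross-check: these variances are the eigenvalues of the sample covariance.
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
print(np.allclose(explained_variance, eigvals))
```

Working from the SVD avoids forming the covariance matrix explicitly, which is generally the better-conditioned route.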
Lecture 5
Live Lectures
Hypothesis testing, confidence intervals, F-test for nested models.
Lecture 2 part 1
Live Lectures
We describe a parametric supervised learning framework. We also review normally distributed random variables. This covers notebooks 0 and 1 of lecture 1.
Math Hour 2
Math Hour
We show that MLE parameters are the least squares parameters. We derive the normal equations using both linear algebra and differential calculus.
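The claim can be checked numerically. Under the usual assumption of independent Gaussian noise with fixed variance, the negative log-likelihood equals the sum of squared residuals up to constants, so the MLE solves the normal equations. The sketch below uses hypothetical simulated data (the coefficients and noise level are assumptions) and compares the normal-equations solution with NumPy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(-2, 2, size=n)
# Hypothetical data-generating line with Gaussian noise.
y = 1.5 + 0.8 * x + rng.normal(scale=0.3, size=n)

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), x])

# Normal equations: (X^T X) beta = X^T y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from the library least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Geometric reading: the residual is orthogonal to every column of X.
residual = y - X @ beta_normal
print(beta_normal, beta_lstsq, X.T @ residual)
```

The orthogonality of the residual to the columns of X is the linear algebra derivation in one line; setting the gradient of the squared error to zero gives the same equations.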
Lecture 4
Live Lectures
Bias/Variance Decomposition, Regularization, Principal Component Analysis, Feature Selection Approaches
Math Hour 5
Math Hour
We explain why the F-statistic follows the F-distribution when comparing nested linear models (under the assumption that the reduced model is the data generating process).
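For reference, the statistic in question is usually written as follows. The notation is an assumption introduced here: $\mathrm{RSS}_0$ and $p_0$ are the residual sum of squares and parameter count of the reduced model, $\mathrm{RSS}_1$ and $p_1$ those of the full model, and $n$ is the number of observations. The distributional claim also relies on independent Gaussian errors.

```latex
F \;=\; \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)\,/\,(p_1 - p_0)}
             {\mathrm{RSS}_1\,/\,(n - p_1)}
  \;\sim\; F_{\,p_1 - p_0,\; n - p_1}
```

A large value of $F$ means the extra parameters of the full model reduce the residual sum of squares by more than noise alone would explain.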
Math Hour 1
Math Hour
We discuss MLE and MAP with simple examples (estimating parameters for Bernoulli and Normal distributions). We also discuss the Bessel corrected variance estimator algebraically and geometrically.
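A small simulation can make the Bessel correction concrete. The sketch below draws many small samples from a normal distribution with known variance and averages two estimators: the MLE, which divides by n, and the corrected one, which divides by n - 1. The sample size, trial count, and seed are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(0)
true_variance = 4.0   # samples come from Normal(0, 2), so the variance is 4
n, trials = 5, 5000

biased, corrected = [], []
for _ in range(trials):
    sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
    biased.append(statistics.pvariance(sample))    # divides by n (the MLE)
    corrected.append(statistics.variance(sample))  # divides by n - 1

mean_biased = statistics.fmean(biased)
mean_corrected = statistics.fmean(corrected)
print(f"divide by n:     {mean_biased:.2f}")
print(f"divide by n - 1: {mean_corrected:.2f}")
```

On average the divide-by-n estimator lands near (n - 1)/n times the true variance, which is 3.2 in this setup, while the corrected estimator lands near 4.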
Lecture 3
Live Lectures
Data leakage, Categorical Variables, Feature Transformations, Scaling, Pipelines, Linear Regression Diagnostic Plots.
Math Hour 4
Math Hour
Regularization as MAP estimates with priors on the parameters, Ridge regression using "pseudo-observations", Ridge regression as a "smoothed" version of PCA.
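The pseudo-observation view can be verified numerically: appending one extra row per feature, scaled by the square root of the penalty and paired with a target of zero, turns ridge regression into ordinary least squares on the augmented data. This is a sketch on hypothetical simulated data, assuming no intercept term and a penalty named `lam`; neither detail comes from the course notes.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, lam = 50, 3, 2.0
X = rng.normal(size=(n, p))
# Hypothetical coefficients and noise level.
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

# Closed-form ridge solution: (X^T X + lam I) beta = X^T y.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Pseudo-observations: p extra rows sqrt(lam) * e_j, each with target 0.
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])

# Ordinary least squares on the augmented data.
beta_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

print(np.allclose(beta_ridge, beta_aug))
```

The extra rows add lam times the squared norm of the coefficients to the squared error, which is exactly the ridge penalty.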
Live Lecture 6
Live Lectures
Finishing Linear Regression Inference, Bootstrapping, Model Specification Testing.
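As a minimal sketch of bootstrapping, the code below builds a percentile confidence interval for a sample mean by resampling with replacement. The function name, the data values, and the choice of the percentile method are assumptions for illustration; the lecture may use a different bootstrap variant.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.fmean, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    # Recompute the statistic on resamples drawn with replacement.
    stats = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical measurements.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
lower, upper = bootstrap_ci(sample)
print(f"mean={statistics.fmean(sample):.2f}  95% CI=({lower:.2f}, {upper:.2f})")
```

Passing a different `stat`, such as `statistics.median`, reuses the same procedure for statistics that lack a convenient closed-form standard error.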
Project/Homework Instructions
Project Pitch Hour
Project Pitch Hour
Short presentations from participants looking to attract more members to their project.
Schedule
Orientation & Setup Week
Phase 1 - Instruction and Project Completion
Project Review & Judging
Phase 2 - Intense Interview Prep & Career Connections
Jan 23, 2025 at 07:00 PM UTC: DS Bootcamp computer setup day
Jan 28, 2025 at 05:00 PM UTC: Lecture 01: Introduction, Computer Setup, Q/A
Jan 29, 2025 at 03:00 PM UTC: Math Hour 01
Jan 29, 2025 at 04:00 PM UTC: Office Hour 01
Jan 30, 2025 at 07:00 PM UTC: Problem Session 01
Feb 4, 2025 at 05:00 PM UTC: Lecture 02: Regression I
Feb 5, 2025 at 03:00 PM UTC: Math Hour 02
Feb 5, 2025 at 04:00 PM UTC: Office Hour 02
Feb 6, 2025 at 07:00 PM UTC: Problem Session 02
Feb 11, 2025 at 05:00 PM UTC: Lecture 03: Regression II
Feb 12, 2025 at 03:00 PM UTC: Math Hour 03
Feb 12, 2025 at 04:00 PM UTC: Office Hour 03
Feb 13, 2025 at 07:00 PM UTC: Problem Session 03
Feb 17, 2025 at 10:00 PM UTC: Project Pitch Hour
Feb 18, 2025 at 05:00 PM UTC: Lecture 04: Regression III
Feb 19, 2025 at 03:00 PM UTC: Math Hour 04
Feb 19, 2025 at 04:00 PM UTC: Office Hour 04
Feb 20, 2025 at 07:00 PM UTC: Problem Session 04
Feb 25, 2025 at 05:00 PM UTC: Lecture 05: Inference I
Feb 26, 2025 at 03:00 PM UTC: Math Hour 05
Feb 26, 2025 at 04:00 PM UTC: Office Hour 05
Feb 27, 2025 at 07:00 PM UTC: Problem Session 05
Mar 4, 2025 at 05:00 PM UTC: Lecture 06: Inference II
Mar 5, 2025 at 03:00 PM UTC: Math Hour 06
Mar 5, 2025 at 04:00 PM UTC: Office Hour 06
Mar 6, 2025 at 07:00 PM UTC: Problem Session 06
Mar 11, 2025 at 04:00 PM UTC: Lecture 07: Time Series
Mar 12, 2025 at 02:00 PM UTC: Math Hour 07
Mar 12, 2025 at 03:00 PM UTC: Office Hour 07
Mar 13, 2025 at 06:00 PM UTC: Problem Session 07
Mar 18, 2025 at 04:00 PM UTC: Lecture 08: Classification I
Mar 19, 2025 at 02:00 PM UTC: Math Hour 08
Mar 19, 2025 at 03:00 PM UTC: Office Hour 08
Mar 20, 2025 at 06:00 PM UTC: Problem Session 08
Mar 25, 2025 at 04:00 PM UTC: Lecture 09: Classification II
Mar 26, 2025 at 02:00 PM UTC: Math Hour 09
Mar 26, 2025 at 03:00 PM UTC: Office Hour 09
Mar 27, 2025 at 06:00 PM UTC: Problem Session 09
Apr 1, 2025 at 04:00 PM UTC: Lecture 10: Ensemble Learning I
Apr 2, 2025 at 02:00 PM UTC: Math Hour 10
Apr 2, 2025 at 03:00 PM UTC: Office Hour 10
Apr 3, 2025 at 06:00 PM UTC: Problem Session 10
Apr 8, 2025 at 04:00 PM UTC: Lecture 11: Ensemble Learning II
Apr 9, 2025 at 02:00 PM UTC: Math Hour 11
Apr 9, 2025 at 03:00 PM UTC: Office Hour 11
Apr 10, 2025 at 06:00 PM UTC: Problem Session 11
Apr 15, 2025 at 04:00 PM UTC: Lecture 12: Introduction to Neural Networks
Apr 16, 2025 at 02:00 PM UTC: Math Hour 12
Apr 16, 2025 at 03:00 PM UTC: Office Hour 12
Apr 17, 2025 at 06:00 PM UTC: Problem Session 12
Apr 30, 2025 at 04:00 PM UTC: Commencement and Project Showcase
Project/Homework Deadlines
Feb 7, 2025
04:59 PM UTC
Watch video about Project Formation
This should help answer any questions you may have going into project formation.
Feb 7, 2025
04:59 PM UTC
Watch 3 Previous Top Projects
Consult the project database and watch at least 3 previous top projects from Erdős alumni.
Feb 17, 2025
10:00 PM UTC
Project Pitch Hour
An opportunity to meet other Erdős Fellows, form teams, and propose topics.
Feb 21, 2025
04:59 PM UTC
Finalized Teams with Preliminary Project Ideas
Teams must be finalized by this point. If you proposed or created a project, you must have other members in your group. If you did not propose or create a project, you must join an open group.
Feb 21, 2025
04:59 PM UTC
Data gathering and defining stakeholders + KPIs
Find the dataset you will be working with. Describe the dataset and the problem you are looking to solve (1 page max). List the stakeholders of the project and company key performance indicators (KPIs) (bullet points).
Mar 7, 2025
04:59 PM UTC
Exploratory data analysis + visualizations [Checkpoint]
Distributions of variables, looking for outliers, etc. Descriptive statistics.
Mar 7, 2025
04:59 PM UTC
Data cleaning + preprocessing
Look for missing values and duplicates. Basic data manipulation & preliminary feature engineering.
Mar 22, 2025
03:59 AM UTC
Written proposal of modeling approach [Checkpoint]
Describe your planned modeling approach, based on the exploratory data analysis from the last two weeks (< 1 page, bullet points).
Mar 29, 2025
03:59 AM UTC
Machine learning models or equivalent [Checkpoint]
Results with visualizations and/or metrics. List of successes and pitfalls.
Apr 22, 2025
03:59 AM UTC
Final Projects Due
Final Projects must be submitted by this deadline in order to receive a certificate of completion.



