Data Science Boot Camp
Spring 2024
Feb 7, 2024 - May 1, 2024
To access the program content, you must first create an account and member profile and be logged in.
Registration Deadlines
Feb 8, 2024: All interested participants
Category
Launch, Core Program, Boot Camp, Projects, Certificates
Overview
The Erdős Institute's signature Data Science Boot Camp has been running since May 2018 thanks to the generous support of our sponsors, members, and partners. Due to its popularity, we now offer our boot camp online three times per year in two different formats: a 1-month-long intensive boot camp each May and a semester-long version each Spring & Fall.

Click here to be invited to the Slack organization: The Erdős Institute
Click here to access the Slack cohort channel: #slack-cohort-channel
Click here to access the Slack program channel: #slack-program-channel
Click here to download the Events & Deadlines .ics calendar file
Organizers, Instructors, and Advisors
Steven Gubkin, PhD
Lead Instructor
Office Hours:
Tu: 11am - 12pm ET, and by appt.
Email:
Preferred Contact:
Slack
Please feel free to message me on Slack with any questions!
Matthew Osborne, PhD
Alumni Advisor
Office Hours:
By appointment only
Email:
Preferred Contact:
Slack
Don't hesitate to contact me with any questions or concerns.
Alec Clott, PhD
Head of Data Science Projects
Office Hours:
Wed. 12-12:30pm EST, and by appt.
Email:
Preferred Contact:
Slack
Participants are welcome to reach out to me via Slack or email. I normally work standard EST hours (9am-5pm), but I can always find time to meet via Zoom after work too. Let me know how I can help!
Objectives
The goal of our Data Science Boot Camp is to provide you with the skills and mentorship necessary to produce a portfolio-worthy data science/machine learning project, while also providing you with valuable career development support and connecting you with potential employers.
Project Examples
TEAM 16
Predicting Lead Contamination in NY School Drinking Water
Ranadeep Roy, Cami Goray, Hana Lang

Lead is a toxic metal, and in children especially, lead exposure can have severe health consequences: even small amounts of lead have the potential to affect memory, behavior, and learning ability. Despite this, numerous schools across New York State have at least one drinking water outlet with lead levels testing above 5 ppb. In this project, we aim to predict the presence of lead contamination in school drinking water and to better understand the role of demographic, socioeconomic, infrastructural, and geographic features in elevated lead levels.
TEAM 33
Tuning Up Music Highway
James O'Quinn, Yang Mo, John Hurtado Cadavid, Ruixuan Ding, Chilambwe Natasha Wapamenshi

Known as the most dangerous highway in Tennessee, Music Highway, the stretch of Interstate 40 between Memphis and Nashville, could use a serious tuning up. This project investigates the effectiveness and cost-efficiency of potential physical safety interventions along its Madison and Henderson County segments, with the goal of reducing crash severity. We used a data-driven geospatial modeling approach to assess whether adding specific safety features to targeted segments predicts statistically significant changes in crash injury outcomes.
First Steps/Prerequisites
Program Content
Course materials are available on GitHub.
Textbook/Notes
Note: our video player does not support playback speed options. You can find a third-party browser extension that will allow you to modify video playback speed. For example, this one works for Chrome: video-speed-controller. If you would prefer to avoid a browser extension, you can also modify the playback speed manually in the JavaScript console: Speed up any HTML5 video player!
Lecture 1: Orientation
Live Lectures
In this video we welcome you to our data science content.
Lecture 3: Regression I
Live Lectures
We discuss:
1. A parametric supervised learning framework.
2. Single and Multiple Linear Regression.
3. Data Splits
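As a rough illustration of items 2 and 3, the sketch below fits a single-feature linear regression by least squares on a training split and scores it on a held-out test split. It uses only the Python standard library; the synthetic data and the helper names (`train_test_split`, `fit_simple_linear_regression`, `mean_squared_error`) are assumptions made for this example and are not taken from the lecture materials.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_simple_linear_regression(rows):
    """Return the least-squares (slope, intercept) for a list of (x, y) pairs."""
    n = len(rows)
    x_mean = sum(x for x, _ in rows) / n
    y_mean = sum(y for _, y in rows) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in rows)
    sxx = sum((x - x_mean) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def mean_squared_error(rows, slope, intercept):
    """Average squared prediction error on a list of (x, y) pairs."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in rows) / len(rows)


# Synthetic data (made up for illustration): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
data = [(i / 10, 2 * (i / 10) + 1 + rng.gauss(0, 0.5)) for i in range(100)]

# Fit on the training split only; the test split is used purely for evaluation.
train, test = train_test_split(data)
slope, intercept = fit_simple_linear_regression(train)
test_mse = mean_squared_error(test, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MSE={test_mse:.3f}")
```

Keeping the test rows out of the fitting step is what makes the test error an honest estimate of performance on unseen data.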
Math Hour 4
Math Hour
We give several perspectives on regularization techniques including:
1. Ridge and Lasso as MAP estimators.
2. Ridge as OLS with "pseudo-observations"
3. Ridge as a "smooth" version of PCA regression.
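The "pseudo-observations" view in item 2 can be checked numerically. The sketch below, under the simplifying assumptions of two features and no intercept, shows that ridge regression with penalty `lam` gives the same coefficients as ordinary least squares on a dataset augmented with one extra row per feature (value `sqrt(lam)` in that feature, target 0). The data values and the helper `solve_normal_equations` are hypothetical and exist only for this illustration.

```python
import math


def solve_normal_equations(X, y, ridge_penalty=0.0):
    """Solve (X^T X + penalty * I) b = X^T y for two features, no intercept."""
    a11 = sum(r[0] * r[0] for r in X) + ridge_penalty
    a12 = sum(r[0] * r[1] for r in X)
    a22 = sum(r[1] * r[1] for r in X) + ridge_penalty
    c1 = sum(r[0] * t for r, t in zip(X, y))
    c2 = sum(r[1] * t for r, t in zip(X, y))
    det = a11 * a22 - a12 * a12
    # Cramer's rule for the 2x2 symmetric system.
    return ((c1 * a22 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det)


# Small made-up dataset.
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
y = [3.1, 2.9, 7.2, 6.8]
lam = 2.0

ols_plain = solve_normal_equations(X, y)
ridge = solve_normal_equations(X, y, ridge_penalty=lam)

# Pseudo-observations: one extra row per feature, each with target 0.
X_augmented = X + [(math.sqrt(lam), 0.0), (0.0, math.sqrt(lam))]
y_augmented = y + [0.0, 0.0]
ols_augmented = solve_normal_equations(X_augmented, y_augmented)

print("OLS:              ", ols_plain)
print("Ridge:            ", ridge)
print("OLS on augmented: ", ols_augmented)
```

The extra rows add `lam` to the diagonal of X^T X and nothing to X^T y, which is exactly the ridge normal equation.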
Live: Math Hour
01 Week Spring 2024: Math Hour
In the first week of Math Hour we:
(1) Introduce a Framework for Parametric Supervised Machine Learning
(2) Discuss Maximum Likelihood Estimation (MLE)
(3) Discuss Maximum A Posteriori Estimation (MAP)
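To make the difference between items (2) and (3) concrete, here is a minimal sketch for a coin-flip (Bernoulli) model. The MLE is the observed frequency of heads, while the MAP estimate is the posterior mode under a Beta prior. The choice of model, the prior parameters, and the function names are assumptions for this example rather than the session's own material.

```python
def bernoulli_mle(heads, flips):
    """Maximum likelihood estimate of the probability of heads."""
    return heads / flips


def bernoulli_map(heads, flips, alpha, beta):
    """Posterior mode under a Beta(alpha, beta) prior.

    Valid when heads + alpha > 1 and (flips - heads) + beta > 1,
    so that the posterior mode lies strictly inside (0, 1).
    """
    return (heads + alpha - 1) / (flips + alpha + beta - 2)


heads, flips = 7, 10

mle = bernoulli_mle(heads, flips)
map_informative = bernoulli_map(heads, flips, alpha=5, beta=5)  # prior centered on 0.5
map_uniform = bernoulli_map(heads, flips, alpha=1, beta=1)      # flat prior

print(f"MLE:                    {mle:.3f}")
print(f"MAP, Beta(5, 5) prior:  {map_informative:.3f}")
print(f"MAP, Beta(1, 1) prior:  {map_uniform:.3f}")
```

The Beta(5, 5) prior pulls the estimate toward 0.5, while the flat Beta(1, 1) prior leaves the MAP estimate equal to the MLE.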
Lecture 2: Data Collection
Live Lectures
We cover 4 methods of collecting data:
1. Data Source Websites
2. Web scraping with BeautifulSoup
3. APIs
4. Using SQLAlchemy to interact with SQL databases.
Lecture 5: Regression III
Live Lectures
We discuss:
1. The Bias/Variance Tradeoff
2. Regularization
3. Principal Component Analysis
4. Feature Selection Approaches
Math Hour 2
Math Hour
1. We give a geometric interpretation of Bessel's correction.
2. We derive MLE estimates for simple linear regression.
3. We interpret multiple linear regression as orthogonal projection.
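Bessel's correction (item 1) can also be seen empirically. The simulation below is a small sketch, not the geometric argument from the session: it draws many small samples from a normal distribution with known variance and compares the average of the divide-by-n variance with the average of the divide-by-(n - 1) variance. The sample size, number of trials, and seed are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(0)
true_variance = 4.0  # the population has standard deviation 2
sample_size = 5
n_trials = 5000

biased, corrected = [], []
for _ in range(n_trials):
    sample = [rng.gauss(0.0, 2.0) for _ in range(sample_size)]
    biased.append(statistics.pvariance(sample))    # divides by n
    corrected.append(statistics.variance(sample))  # divides by n - 1

mean_biased = statistics.fmean(biased)
mean_corrected = statistics.fmean(corrected)

# In expectation the divide-by-n estimator equals (n - 1) / n times the
# true variance, while the corrected estimator is unbiased.
print(f"true variance:        {true_variance:.2f}")
print(f"mean, divide by n:    {mean_biased:.2f}")
print(f"mean, divide by n-1:  {mean_corrected:.2f}")
```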
Lecture 4: Regression II
Live Lectures
We discuss categorical variables, non-linear transformations, scaling data, pipelines, and residual plots.
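Two of these preprocessing steps can be sketched without any third-party library: one-hot encoding of a categorical column (dropping one category as the baseline) and standardizing a numeric column using statistics computed on the training data only. The helper names and the toy values below are assumptions for this example; the lecture itself may rely on library implementations.

```python
import statistics


def one_hot(values, drop_first=True):
    """Encode a categorical column as 0/1 indicator columns.

    Returns the list of kept categories and the encoded rows. With
    drop_first=True the alphabetically first category is the baseline.
    """
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    return kept, [[1 if v == c else 0 for c in kept] for v in values]


def fit_standard_scaler(column):
    """Return the mean and population standard deviation of a column."""
    return statistics.mean(column), statistics.pstdev(column)


def standardize(column, mean, std):
    """Shift and scale a column using previously fitted statistics."""
    return [(v - mean) / std for v in column]


colors = ["red", "green", "blue", "green"]
kept, encoded = one_hot(colors)
print(kept, encoded)

train_sizes = [10.0, 20.0, 30.0]
test_sizes = [40.0]

# Fit the scaler on the training data only, then reuse it on the test data.
mean, std = fit_standard_scaler(train_sizes)
train_scaled = standardize(train_sizes, mean, std)
test_scaled = standardize(test_sizes, mean, std)
print(train_scaled, test_scaled)
```

Reusing the training mean and standard deviation on the test data avoids leaking information from the test set into the preprocessing step.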
Lecture 6: Time Series I
Live Lectures
Introduction to time series, including adjustments to cross validation, dates and times in Python, baseline models, rolling average forecasts, and exponential smoothing forecasts.
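The forecasting ideas listed here can be sketched in a few lines of standard-library Python. The example below shows a naive baseline, a rolling average forecast, a simple exponential smoothing forecast, and an expanding-window split that never trains on observations from the future. The series values, parameter choices, and function names are hypothetical and chosen only for illustration.

```python
def naive_forecast(series):
    """Baseline: predict that the next value equals the last observed value."""
    return series[-1]


def rolling_average_forecast(series, window):
    """Predict the next value as the mean of the last `window` observations."""
    return sum(series[-window:]) / window


def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing; the final level is the one-step forecast."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def expanding_window_splits(n_obs, n_splits, horizon):
    """Yield (train_indices, test_indices) with every test block after its training block."""
    for k in range(n_splits, 0, -1):
        test_start = n_obs - k * horizon
        yield list(range(test_start)), list(range(test_start, test_start + horizon))


series = [10.0, 12.0, 13.0, 12.0, 15.0, 16.0, 18.0, 17.0]

naive = naive_forecast(series)
rolling = rolling_average_forecast(series, window=3)
smoothed = exponential_smoothing_forecast(series, alpha=0.5)
splits = list(expanding_window_splits(len(series), n_splits=3, horizon=2))

print(naive, rolling, smoothed)
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Unlike shuffled cross-validation, each split here keeps the time order intact, so the model is always evaluated on data that comes after everything it was trained on.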
Project/Homework Instructions
Project Pitches and Resources
Project Pitch Hour
This is the recording of the March 1st 4:30 PM Project Pitch Hour.
Jim Schwoebel introduces databoard. If you are interested in generating your own synthetic dataset for your project, please contact Jim and Roman on Slack.
Emiliano Santarnecchi introduces his research and commercialization interests in Neuromodulation and Neurostimulation. He is open for project conversations and guidance in these and related areas. https://gordon.mgh.harvard.edu/research/precision-neuroscience-neuromodulation-program/
Then Erdős Spring 2024 participants pitched their project ideas.
Schedule
Phase 1: Instruction and Project Completion
Project Review & Judging
Phase 2: Intense Interview Prep & Career Connections
Feb 5, 2024 at 08:00 PM UTC - Lecture 1: Introduction
Feb 6, 2024 at 03:00 PM UTC - Math Hour (at 10) / Office Hour (at 11)
Feb 8, 2024 at 08:00 PM UTC - Problem Solving Session 1
Feb 12, 2024 at 08:00 PM UTC - Lecture 2: Data Collection
Feb 13, 2024 at 03:00 PM UTC - Math Hour (at 10) / Office Hour (at 11)
Feb 15, 2024 at 08:00 PM UTC - Problem Solving Session 2
Feb 19, 2024 at 08:00 PM UTC - Lecture 3: Regression I
Feb 20, 2024 at 03:00 PM UTC - Math Hour (at 10) / Office Hour (at 11)
Feb 22, 2024 at 08:00 PM UTC - Problem Solving Session 3
Feb 26, 2024 at 08:00 PM UTC - Lecture 4: Regression II
Feb 27, 2024 at 03:00 PM UTC - Math Hour (at 10) / Office Hour (at 11)
Feb 29, 2024 at 08:00 PM UTC - Problem Solving Session 4
Mar 1, 2024 at 09:30 PM UTC - Project Pitch Hour
Mar 4, 2024 at 08:00 PM UTC - Lecture 5: Regression III
Mar 5, 2024 at 03:00 PM UTC - Math Hour (at 10) / Office Hour (at 11)
Mar 7, 2024 at 08:00 PM UTC - Problem Solving Session 5
Mar 11, 2024 at 07:00 PM UTC - Lecture 6: Time Series I
Mar 12, 2024 at 02:00 PM UTC - Math Hour (at 10) / Office Hour (at 11)
Mar 14, 2024 at 07:00 PM UTC - Problem Solving Session 6
Mar 18, 2024 at 07:00 PM UTC - Lecture 7: Time Series II
Mar 19, 2024 at 02:00 PM UTC - Math Hour (at 10) / Office Hour (at 11)
Mar 21, 2024 at 07:00 PM UTC - Problem Solving Session 7
Mar 25, 2024 at 07:00 PM UTC - Lecture 8: Classification I
Mar 26, 2024 at 02:00 PM UTC - Math Hour (at 10) / Office Hour (at 11)
Mar 28, 2024 at 07:00 PM UTC - Problem Solving Session 8
Apr 1, 2024 at 07:00 PM UTC - Lecture 9: Classification II
Apr 2, 2024 at 02:00 PM UTC - Math Hour (at 10) / Office Hour (at 11)
Apr 4, 2024 at 07:00 PM UTC - Problem Solving Session 9
Apr 9, 2024 at 02:00 PM UTC - Math Hour (at 10) / Office Hour (at 11)
Apr 10, 2024 at 07:00 PM UTC - Lecture 10: Ensemble Learning I
Apr 11, 2024 at 07:00 PM UTC - Problem Solving Session 10
Apr 15, 2024 at 07:00 PM UTC - Lecture 11: Ensemble Learning II
Apr 16, 2024 at 02:00 PM UTC - Math Hour (at 10) / Office Hour (at 11)
Apr 18, 2024 at 07:00 PM UTC - Problem Solving Session 11
Apr 22, 2024 at 07:00 PM UTC - Lecture 12: Neural Networks
Apr 23, 2024 at 02:00 PM UTC - Math Hour (at 10) / Office Hour (at 11)
Apr 25, 2024 at 07:00 PM UTC - Problem Solving Session 12
May 1, 2024 at 04:00 PM UTC - Erdős Spring Final Project Showcase
Project/Homework Deadlines
Feb 20, 2024
04:59 AM UTC
Watch 3 Previous Top Projects
Consult the project database and watch at least 3 previous top projects from Erdős alumni.
Mar 1, 2024
04:59 AM UTC
Watch video about Project Formation
This should help answer any questions you may have going into project formation.
Mar 1, 2024
09:30 PM UTC
Project Pitch Hour
Click here for the Zoom link to join the Project Pitch Hour session, an opportunity to meet other Erdős Fellows, form teams, and propose topics.
Mar 9, 2024
04:59 AM UTC
Submit Team Proposal or Idea to Project Formation Page
If you want to propose a project, or have an idea for a project, submit it by this date.
Mar 12, 2024
03:59 AM UTC
Finalized Teams with Preliminary Project Ideas
Teams need to be finalized by this point. If you proposed or created a project, you must have others in your group. If you did not propose or create a project, you must join an open group.
Mar 19, 2024
03:59 AM UTC
Data gathering and defining stakeholders + KPIs
Find the dataset you will be working with. Describe the dataset and the problem you are looking to solve (1 page max). List the stakeholders of the project and company key performance indicators (KPIs) (bullet points).
Mar 26, 2024
03:59 AM UTC
Data cleaning + preprocessing
Look for missing values and duplicates. Basic data manipulation & preliminary feature engineering.
Apr 2, 2024
03:59 AM UTC
Exploratory data analysis + visualizations [Checkpoint]
Distributions of variables, looking for outliers, etc. Descriptive statistics.
Apr 9, 2024
03:59 AM UTC
Written proposal of modeling approach [Checkpoint]
Test linearity assumptions. Apply dimensionality reduction (if necessary). Describe your planned modeling approach, based on the exploratory data analysis from the last two weeks (< 1 page, bullet points).
Apr 16, 2024
03:59 AM UTC
Machine learning models or equivalent [Checkpoint]
Results with visualizations and/or metrics. List of successes and pitfalls.
Apr 27, 2024
03:59 AM UTC
Final project due
Please read the submission instructions on the link below.




