Data Science Boot Camp
Fall 2024
Sep 5, 2024 - Dec 13, 2024
Registration Deadlines
Sep 6, 2024 - All Erdős Fall 2024 Career Launch Cohort or Alumni Club members who are not participating in either the UX Research or Deep Learning Boot Camps
Category
Launch, Core Program, Boot Camp, Projects, Certificates
Overview
The Erdős Institute's signature Data Science Boot Camp has been running since May 2018 thanks to the generous support of our sponsors, members, and partners. Due to its popularity, we now offer the boot camp online three times per year in two formats: a month-long intensive boot camp each May and a semester-long version each Spring and Fall.

Click here to be invited to the Slack organization: The Erdős Institute
Click here to access the Slack cohort channel: #slack-cohort-channel
Click here to access the Slack program channel: #slack-program-channel
Click here to download the Events & Deadlines .ics calendar file
Organizers, Instructors, and Advisors
Steven Gubkin, PhD
Lead Instructor
Office Hours: W 11am - 12pm and by appt.
Preferred Contact: Slack
Please feel free to message me on Slack with any questions!
Alec Clott, PhD
Head of Data Science Projects
Office Hours: By appt. only
Preferred Contact: Slack
Participants are welcome to reach out to me via Slack or email. I normally work standard hours (9am-5pm Eastern Time), but I can always find time to meet via Zoom after work too. Let me know how I can help!
Objectives
The goal of our Data Science Boot Camp is to provide you with the skills and mentorship necessary to produce a portfolio-worthy data science/machine learning project, while also providing valuable career development support and connecting you with potential employers.
Project Examples
TEAM 33
Tuning Up Music Highway
James O'Quinn, Yang Mo, John Hurtado Cadavid, Ruixuan Ding, Chilambwe Natasha Wapamenshi

Known as the most dangerous highway in Tennessee, Music Highway, the stretch of Interstate 40 between Memphis and Nashville, could use a serious tuning up. This project investigates the effectiveness and cost-efficiency of potential physical safety interventions along its Madison and Henderson County segments, with the goal of reducing crash severity. We used a data-driven geospatial modeling approach to assess whether adding specific safety features to targeted segments predicts statistically significant changes in crash injury outcomes.
TEAM 29
Who Regulates the Regulators?
Jared Able, Joshua Jackson, Zachary Brennan, Alexandria Wheeler, Nicholas Geiser

With recent major cuts to governmental regulatory agencies in the US, we investigate whether those cuts are justified. In particular, we analyze the efficacy of RGGI, a state-level cap-and-trade program designed to regulate CO2 emissions from power plants. Using synthetic controls, we answer the counterfactual question: "How would CO2 emissions look in a world where RGGI was never enacted?"
First Steps/Prerequisites
- Clone the GitHub repo locally.
- Install the conda environment.
- Run a Jupyter Notebook using that conda environment.
- Base-level familiarity with Python.
- Differential calculus. Ideally you also know some multivariate differential calculus and linear algebra.
- Basic statistics and probability.
Program Content
Course materials are available on GitHub.
Textbook/Notes
Note: our video player does not support playback speed options. You can use a third-party browser extension to modify video playback speed. For example, this one works for Chrome: video-speed-controller. If you would prefer to avoid a browser extension, you can also modify the playback speed manually in the JavaScript console: Speed up any HTML5 video player!
Live Lecture 2: Regression I
Live Lectures
We discussed a general supervised machine learning framework, simple linear regression, multiple linear regression, and data splits for predictive modeling. Note that we delayed kNN to next week.
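As a rough illustration of the ideas in this lecture, here is a minimal sketch of simple linear regression evaluated on a held-out test split. It is not taken from the course materials: the synthetic data, the 80/20 split, and the helper names `fit_simple_linear_regression` and `mean_squared_error` are all assumptions made for the example.

```python
import random


def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing squared error for y ~ b0 + b1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def mean_squared_error(xs, ys, intercept, slope):
    """Average squared prediction error of the fitted line on (xs, ys)."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (an assumption for illustration): y = 2 + 3x + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

# Shuffle the indices, then hold out the last 20% as a test set.
indices = list(range(len(xs)))
rng.shuffle(indices)
split = int(0.8 * len(indices))
train_idx, test_idx = indices[:split], indices[split:]
x_train = [xs[i] for i in train_idx]
y_train = [ys[i] for i in train_idx]
x_test = [xs[i] for i in test_idx]
y_test = [ys[i] for i in test_idx]

# Fit on the training split only; the test split is used purely for evaluation.
intercept, slope = fit_simple_linear_regression(x_train, y_train)
test_mse = mean_squared_error(x_test, y_test, intercept, slope)
print(f"intercept={intercept:.3f}, slope={slope:.3f}, test MSE={test_mse:.3f}")
```

Fitting on the training split and scoring on the test split gives an estimate of how the model performs on data it has not seen.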
Math Hour 3
Math Hour
Singular Value Decomposition and Principal Component Analysis
Live Lecture 5: Inference I
Live Lectures
We covered hypothesis testing (null and alternative hypotheses, test size and power, p-values), confidence intervals, and part of linear regression inference. We will finish linear regression inference next week.
Alec's Lost Introduction
Live Lectures
Due to a technical glitch, the audio for the video Alec made to introduce himself and relevant information about projects didn't work in the orientation lecture. You can watch it now!
Math Hour 2
Math Hour
We derive the maximum likelihood estimates of the multiple linear regression model parameters. This shows that OLS is the MLE for linear regression. We derive the normal equation two ways: with linear algebra and with calculus.
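For reference, the two routes to the normal equation can be sketched as follows. The notation is an assumption and may differ from the video: X is the n-by-p design matrix, y the response vector, and the errors are taken to be independent Gaussian with variance \sigma^2.

```latex
% Assumed model: y = X\beta + \varepsilon, with \varepsilon \sim N(0, \sigma^2 I_n)
\ell(\beta, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\lVert y - X\beta \rVert^2

% For fixed \sigma^2, maximizing \ell over \beta is the same as minimizing the sum of squares
\hat\beta = \arg\min_\beta \lVert y - X\beta \rVert^2

% Calculus route: set the gradient to zero
\nabla_\beta \lVert y - X\beta \rVert^2 = -2 X^\top (y - X\beta) = 0
\quad\Longrightarrow\quad X^\top X \hat\beta = X^\top y

% Linear algebra route: the residual is orthogonal to the column space of X
X^\top (y - X\hat\beta) = 0
\quad\Longrightarrow\quad X^\top X \hat\beta = X^\top y
```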
Live Lecture 4: Regression III
Live Lectures
Bias/variance tradeoff, regularization, principal component analysis, and feature selection approaches.
Math Hour 5
Math Hour
We prove that the F statistic for comparing a full to reduced linear model is indeed F distributed under the null hypothesis that the reduced model is correct. Note: correction included at the end!
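For reference, the statistic in question is usually written as below. The symbols are assumptions and may not match the notation in the video: RSS_0 and RSS_1 are the residual sums of squares of the reduced and full models, p_0 < p_1 are their parameter counts, and n is the number of observations.

```latex
% Reduced model (p_0 parameters) nested in the full model (p_1 parameters), n observations
F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/(p_1 - p_0)}{\mathrm{RSS}_1/(n - p_1)}
\;\sim\; F_{p_1 - p_0,\; n - p_1}
\quad \text{under } H_0 \text{ (reduced model correct, Gaussian errors)}
```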
Math Hour 1
Math Hour
We discuss Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) Estimation of model parameters. We address estimating the parameters of binomial and normal distributions from a sample.
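To make the distinction concrete, here is a minimal sketch of the closed-form estimates. It assumes a Beta(alpha, beta) prior for the binomial success probability, which is a common conjugate choice but not necessarily the prior used in the video; the function names are hypothetical.

```python
def binomial_mle(successes, trials):
    """MLE of the success probability: the observed proportion."""
    return successes / trials


def binomial_map(successes, trials, alpha, beta):
    """MAP estimate under an assumed Beta(alpha, beta) prior.

    This is the mode of the Beta(alpha + successes, beta + failures)
    posterior, valid when both posterior parameters exceed 1.
    """
    return (successes + alpha - 1) / (trials + alpha + beta - 2)


def normal_mle(sample):
    """MLE of (mean, variance) for a normal sample; the variance divides by n, not n - 1."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / n
    return mean, variance


mle = binomial_mle(7, 10)                  # 7 successes in 10 trials
map_estimate = binomial_map(7, 10, 2, 2)   # Beta(2, 2) prior pulls the estimate toward 0.5
flat_prior = binomial_map(7, 10, 1, 1)     # Beta(1, 1) is uniform, so MAP equals the MLE
mean, variance = normal_mle([1.0, 2.0, 3.0, 4.0])
print(mle, map_estimate, flat_prior, mean, variance)
```

With a uniform prior the MAP estimate coincides with the MLE; a more informative prior shrinks the estimate toward the prior mode.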
Live Lecture 3: Regression II
Live Lectures
k-Nearest Neighbors, data leakage, categorical features, non-linear transformations, pipelines, linear regression diagnostic plots.
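As a small illustration of the first topic only, here is a minimal sketch of k-nearest neighbors regression on a single numeric feature. The function name `knn_predict` and the toy data are assumptions for the example, not part of the course materials.

```python
def knn_predict(train_x, train_y, query, k):
    """Predict by averaging the targets of the k training points closest to query."""
    by_distance = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
    neighbors = by_distance[:k]
    return sum(y for _, y in neighbors) / k


# Toy training data (an assumption for illustration).
train_x = [1.0, 2.0, 3.0, 10.0]
train_y = [1.0, 2.0, 3.0, 10.0]

# The three nearest neighbors of 2.1 are x = 2.0, 3.0, and 1.0.
print(knn_predict(train_x, train_y, 2.1, k=3))
# With k = 1 the prediction is simply the target of the single closest point.
print(knn_predict(train_x, train_y, 9.0, k=1))
```

Small k follows the training data closely, while larger k averages over more neighbors and gives smoother predictions.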
Math Hour 4
Math Hour
Regularization as MAP estimation, ridge regression using "pseudo-observations", and understanding ridge regression as shrinking the OLS solution along principal component axes in a data-adaptive manner.
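The pseudo-observation idea can be checked numerically in the simplest case. The sketch below assumes a single feature and no intercept, which is a simplification chosen for the example: appending one extra data point with x = sqrt(lambda) and y = 0 and then running ordinary least squares reproduces the ridge estimate. With p features, the analogous construction appends the p rows of sqrt(lambda) times the identity matrix, each with response 0.

```python
import math


def ols_through_origin(xs, ys):
    """Least-squares slope for y ~ b * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def ridge_through_origin(xs, ys, lam):
    """Ridge slope for y ~ b * x with penalty lam * b**2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


# Toy data and penalty strength (assumptions for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.8]
lam = 2.0

# Append one pseudo-observation: x = sqrt(lam), y = 0.
xs_augmented = xs + [math.sqrt(lam)]
ys_augmented = ys + [0.0]

ols_beta = ols_through_origin(xs, ys)
ridge_beta = ridge_through_origin(xs, ys, lam)
pseudo_beta = ols_through_origin(xs_augmented, ys_augmented)

print(ols_beta, ridge_beta, pseudo_beta)
```

The pseudo-observation adds lambda to the sum of squared x values without changing the cross term, which is exactly what the ridge penalty does, so the ridge slope is shrunk toward zero relative to the OLS slope.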
Math Hour 5 Correction
Math Hour
Close to the end of Math Hour 5 I used the wrong definition of the F-Statistic. Essentially I used the squared cosine of an angle instead of the squared tangent. I correct that in this video.
Project/Homework Instructions
F24 Project Pitch Hour
Project Pitch Hour
Participants pitch their project ideas and then get into breakout rooms to discuss the ideas they are interested in.
Schedule
Phase 1 - Instruction and Project Completion
Project Review & Judging
Phase 2 - Intense Interview Prep & Career Connections
Sep 5, 2024 at 06:00 PM UTC - DS Bootcamp computer setup day
Sep 10, 2024 at 04:00 PM UTC - Lecture 1: Introduction, Computer Setup, Q/A
Sep 11, 2024 at 02:00 PM UTC - Math Hour 1
Sep 11, 2024 at 03:00 PM UTC - Office Hour 1
Sep 12, 2024 at 06:00 PM UTC - Problem Session 1
Sep 17, 2024 at 04:00 PM UTC - Lecture 2: Regression I
Sep 18, 2024 at 02:00 PM UTC - Math Hour 2
Sep 18, 2024 at 03:00 PM UTC - Office Hour 2
Sep 19, 2024 at 06:00 PM UTC - Problem Session 2
Sep 24, 2024 at 04:00 PM UTC - Lecture 3: Regression II
Sep 25, 2024 at 02:00 PM UTC - Math Hour 3
Sep 25, 2024 at 03:00 PM UTC - Office Hour 3
Sep 26, 2024 at 06:00 PM UTC - Problem Session 3
Sep 30, 2024 at 08:30 PM UTC - Project Pitch Hour
Oct 1, 2024 at 04:00 PM UTC - Lecture 4: Regression III
Oct 2, 2024 at 02:00 PM UTC - Math Hour 4
Oct 2, 2024 at 03:00 PM UTC - Office Hour 4
Oct 3, 2024 at 06:00 PM UTC - Problem Session 4
Oct 8, 2024 at 04:00 PM UTC - Lecture 5: Inference I
Oct 9, 2024 at 02:00 PM UTC - Math Hour 5
Oct 9, 2024 at 03:00 PM UTC - Office Hour 5
Oct 10, 2024 at 06:00 PM UTC - Problem Session 5
Oct 15, 2024 at 04:00 PM UTC - Lecture 6: Inference II
Oct 16, 2024 at 02:00 PM UTC - Math Hour 6
Oct 16, 2024 at 03:00 PM UTC - Office Hour 6
Oct 17, 2024 at 06:00 PM UTC - Problem Session 6
Oct 22, 2024 at 04:00 PM UTC - Lecture 7: Time Series
Oct 24, 2024 at 06:00 PM UTC - Problem Session 7
Oct 25, 2024 at 03:00 PM UTC - Office Hour 7
Oct 29, 2024 at 04:00 PM UTC - Lecture 8: Classification I
Oct 30, 2024 at 02:00 PM UTC - Math Hour 8
Oct 30, 2024 at 03:00 PM UTC - Office Hour 8
Oct 31, 2024 at 06:00 PM UTC - Problem Session 8
Nov 5, 2024 at 05:00 PM UTC - Lecture 9: Classification II
Nov 6, 2024 at 03:00 PM UTC - Math Hour 9
Nov 6, 2024 at 04:00 PM UTC - Office Hour 9
Nov 7, 2024 at 07:00 PM UTC - Problem Session 9
Nov 12, 2024 at 05:00 PM UTC - Lecture 10: Ensemble Learning I
Nov 13, 2024 at 04:00 PM UTC - Office Hour 10
Nov 14, 2024 at 07:00 PM UTC - Problem Session 10
Nov 15, 2024 at 03:00 PM UTC - Math Hour 10
Nov 19, 2024 at 05:00 PM UTC - Lecture 11: Ensemble Learning II
Nov 20, 2024 at 03:00 PM UTC - Math Hour 11
Nov 20, 2024 at 04:00 PM UTC - Office Hour 11
Nov 21, 2024 at 07:00 PM UTC - Problem Session 11
Nov 26, 2024 at 05:00 PM UTC - Lecture 12: Introduction to Neural Networks
Nov 27, 2024 at 03:00 PM UTC - Math Hour 12
Nov 27, 2024 at 04:00 PM UTC - Office Hour 12
Dec 11, 2024 at 05:00 PM UTC - Commencement and Project Showcase
Project/Homework Deadlines
Sep 21, 2024
03:59 AM UTC
Watch video about Project Formation
This should help answer any questions you may have going into project formation.
Sep 21, 2024
03:59 AM UTC
Watch 3 Previous Top Projects
Consult the project database and watch at least 3 previous top projects from Erdős alumni.
Sep 30, 2024
08:30 PM UTC
Project Pitch Hour
An opportunity to meet other Erdős Fellows, form teams, and propose topics.
Oct 5, 2024
03:59 AM UTC
Data gathering and defining stakeholders + KPIs
Find the dataset you will be working with. Describe the dataset and the problem you are looking to solve (1 page max). List the stakeholders of the project and company key performance indicators (KPIs) (bullet points).
Oct 5, 2024
03:59 AM UTC
Finalized Teams with Preliminary Project Ideas
Teams need to be finalized by this point. If you proposed or created a project, you must have others in your group. If you did not propose or create a project, you must join an open group.
Oct 19, 2024
03:59 AM UTC
Exploratory data analysis + visualizations [Checkpoint]
Distributions of variables, looking for outliers, etc. Descriptive statistics.
Oct 19, 2024
03:59 AM UTC
Data cleaning + preprocessing
Look for missing values and duplicates. Basic data manipulation & preliminary feature engineering.
Nov 2, 2024
03:59 AM UTC
Written proposal of modeling approach [Checkpoint]
Describe your planned modeling approach, based on the exploratory data analysis from the last two weeks (< 1 page, bullet points).
Nov 9, 2024
04:59 AM UTC
Machine learning models or equivalent [Checkpoint]
Results with visualizations and/or metrics. List of successes and pitfalls.
Dec 3, 2024
04:59 AM UTC
Final Projects Due
Final Projects must be submitted by this deadline in order to receive a certificate of completion.



