Data Science Boot Camp

Summer 2025

May 8, 2025 - Aug 15, 2025

To access the program content, you must first create an account and member profile and be logged in.

Lecture 01: Orientation / Computer Setup Day

Next Event

Registration Deadlines

May 8, 2025

-

Summer 2025 Cohort participants

Category

Launch, Core Program, Boot Camp, Projects, Certificates

Overview

In this bootcamp, we will develop the skills needed to complete a data science project from start to finish. This includes defining a problem in quantitative terms, identifying key performance indicators (KPIs), acquiring and cleaning data, exploring patterns and trends, and transforming raw data into meaningful variables. We will then build models for prediction and inference, focusing primarily on supervised learning methods for regression and classification.

Slack

Click here to be invited to the Slack organization: The Erdős Institute

Click here to access the Slack cohort channel: #slack-cohort-channel

Click here to access the Slack program channel: #slack-program-channel

Click here to download the Events & Deadlines .ics calendar file

Organizers, Instructors, and Advisors

Steven Gubkin, PhD

Lead Instructor

Office Hours: By appt. only
Email:
Preferred Contact: Slack

Please feel free to message me on Slack with any questions!

Alec Clott, PhD

Head of Data Science Projects

Office Hours: By appt. only
Email:
Preferred Contact: Slack

Participants are welcome to reach out to me via Slack or email. I normally work standard EST hours (9am-5pm), but I can always find time to meet folks via Zoom after work too. Let me know how I can help!

Objectives

The goal of our Data Science Boot Camp is to provide you with the skills and mentorship necessary to produce a portfolio-worthy data science/machine learning project, while also providing you with valuable career development support and connecting you with potential employers.

Project Examples

TEAM 16

Predicting Lead Contamination in NY School Drinking Water

Ranadeep Roy, Cami Goray, Hana Lang

Lead is a toxic metal, and in children especially, lead exposure can have severe health consequences -- even small amounts of lead have the potential to affect memory, behavior, and learning ability. Despite this, numerous schools across New York State have at least one drinking water outlet with lead levels testing above 5 ppb. In this project, we aim to predict the presence of lead contamination in school drinking water, and to better understand the role of demographic, socioeconomic, infrastructural, and geographic features in elevated lead levels.

TEAM 33

Tuning Up Music Highway

James O'Quinn, Yang Mo, John Hurtado Cadavid, Ruixuan Ding, Chilambwe Natasha Wapamenshi

Known as the most dangerous highway in Tennessee, Music Highway, the stretch of Interstate 40 between Memphis and Nashville, could use a serious tuning up. This project investigates the effectiveness and cost-efficiency of potential physical safety interventions along its Madison and Henderson County segments, with the goal of reducing crash severity. We used a data-driven geospatial modeling approach to assess whether adding specific safety features to targeted segments predicts statistically significant changes in crash injury outcomes.

First Steps/Prerequisites

Course Orientation / Computer Setup Day
Our first meeting is Thursday, May 8, from 1:30 PM to 3:00 PM ET. I will give a brief orientation to the course. The remainder of the time will be spent on one simple goal: cloning the repo, installing the conda environment, and using that conda environment to run a Jupyter notebook. It is impossible to participate in the course without these abilities, so it is important to attend this session. If you can already do these things, please show up to help the other participants!
 
Detailed instructions (created by teaching assistant Ness Mayker Chen) can be found at this link.
 
We will test your ability to do these things by having you submit a "secret code". You will obtain this code by successfully running the notebook
 
computer_setup_day/find_secret_code.ipynb
 
When you have obtained the code, enter it in the textbox at https://www.erdosinstitute.org/ds-boot-camp-prep
 
If you can do these things independently, please show up to help your colleagues!
If you cannot do these things independently, please show up to get help from your colleagues!
 
Prerequisites
 
In addition to these computer setup steps there are also some content prerequisites:
  1. Base level familiarity with Python
  2. Differential calculus. Ideally you also know some multivariate differential calculus and linear algebra.
  3. Basic statistics and probability

Program Content

Course materials are available on GitHub through the following link:

Request Access to GitHub

Textbook/Notes

Note: our video player does not support playback speed options. You can find a third-party browser extension which will allow you to modify video playback speed. For example, this one works for Chrome: video-speed-controller. If you would prefer to avoid a browser extension, you can also modify the playback speed manually in the JavaScript console: Speed up any HTML5 video player!

Live Lecture 07: Time Series I

Live Lectures

Baseline models for time series, rolling averages, exponential smoothing models.

Slides
Transcript
Code
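
As a rough illustration of the baselines named in this lecture, here is a minimal sketch in plain Python. The function names and the toy series are our own assumptions for illustration; this is not the lecture code, and initializing the smoothed level to the first observation is just one common choice.

```python
def naive_forecast(series):
    """Baseline: forecast the next value as the last observed value."""
    return series[-1]


def rolling_average(series, window):
    """Trailing rolling mean, one value per complete window."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def simple_exponential_smoothing(series, alpha):
    """Smoothed levels: level_t = alpha * y_t + (1 - alpha) * level_{t-1}.

    The first level is set to the first observation (an assumption).
    """
    levels = [series[0]]
    for y in series[1:]:
        levels.append(alpha * y + (1 - alpha) * levels[-1])
    return levels


toy_series = [10, 12, 14, 16]  # hypothetical data
print(naive_forecast(toy_series))                     # 16
print(rolling_average(toy_series, window=2))          # [11.0, 13.0, 15.0]
print(simple_exponential_smoothing(toy_series, 0.5))  # [10, 11.0, 12.5, 14.25]
```

A larger alpha makes the smoothed level track recent observations more closely, while a smaller alpha averages over a longer history.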

Live Lecture 09: Classification I

Live Lectures

Stratified splits, kNN classification, logistic regression

Slides
Transcript
Code
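
To make two of these ideas concrete, here is a small sketch of a stratified train/test split and a kNN classifier in plain Python. The function names, the default parameters, and the toy data are hypothetical, and in practice a library implementation would likely be used instead.

```python
import random
from collections import Counter, defaultdict
from math import dist


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with roughly equal class proportions in both."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class, then carve off a test share
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (0.5, 0.5)))  # a
print(knn_predict(X, y, (5.5, 5.5)))  # b
```

Splitting within each class is what keeps a rare class from vanishing from the test set by chance.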

Math Hour 10

Math Hour

Definition of multivariate normal distributions, showing that the LDA classifier (with equal class priors) assigns each point to the class whose mean vector is nearest in the Mahalanobis metric, MLE estimates of mean and covariance.

Slides
Transcript
Code
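
The nearest-class-mean view of LDA can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: two-dimensional inputs, a shared covariance matrix that is already known, and equal class priors. The function names and numbers are hypothetical.

```python
def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance in 2-D for covariance [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # (dx, dy) @ inverse(cov) @ (dx, dy), using the closed-form 2x2 inverse.
    return (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det


def lda_predict(x, class_means, cov):
    """Assign x to the class whose mean is nearest in the Mahalanobis metric."""
    return min(class_means, key=lambda label: mahalanobis_sq(x, class_means[label], cov))


# Hypothetical setup: large variance along the first axis, small along the second.
cov = [[16.0, 0.0], [0.0, 1.0]]
means = {"A": (0.0, 0.0), "B": (4.0, 3.0)}
x = (4.0, 0.0)

print(mahalanobis_sq(x, means["A"], cov))  # 1.0
print(mahalanobis_sq(x, means["B"], cov))  # 9.0
print(lda_predict(x, means, cov))          # A
```

Note that in the ordinary Euclidean metric the point (4, 0) is closer to the mean of class B (distance 3) than to the mean of class A (distance 4). The covariance rescales the axes, so the prediction changes.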

Live Lecture 12: Ensembles II

Live Lectures

AdaBoost, Gradient Boosting, XGBoost. We ensemble estimators which are each individually prone to underfitting by adding new estimators trained on the (pseudo)residuals of the old ensemble.

Slides
Transcript
Code
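
The residual-fitting idea can be sketched with a toy gradient boosting regressor. This is a minimal illustration, not the lecture code: it assumes squared-error loss (so the pseudo-residuals are just the ordinary residuals), one-dimensional inputs with at least two distinct values, and decision stumps as the weak learners. All names and defaults are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-split regressor on 1-D inputs under squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # excluding the max keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


# Hypothetical toy data: y = x^2 on six points.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 4, 9, 16, 25, 36]
model = gradient_boost(xs, ys)
print([round(model(x), 2) for x in xs])
```

Each stump on its own underfits badly, but every round shrinks the training residuals, which is the sense in which the ensemble corrects the weakness of its members.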

Math Hour 07

Math Hour

We derive confidence intervals for linear regression parameters and conditional means, as well as prediction intervals for new observations.

Slides
Transcript
Code

Math Hour 09

Math Hour

This math hour was focused on Generalized Linear Models, with the three primary examples being linear regression, logistic regression, and Poisson regression.

Slides
Transcript
Code

Live Lecture 11: Ensembles I

Live Lectures

Decision trees, random forests, bagging/pasting, voter models.

Slides
Transcript
Code

Math Hour 12

Math Hour

Symmetric pointwise positive definite kernels and feature maps.

Transcript
Code

Live Lecture 08: Time Series II

Live Lectures

Stationarity and autocorrelation, autoregressive models, moving average models, SARIMA.

Slides
Transcript
Code
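
As a small illustration of autocorrelation, the sketch below computes the sample autocorrelation of a series at a given lag in plain Python. The function name and the toy series are hypothetical, and the normalization shown (dividing by the lag-0 sum of squares) is one standard convention.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation at `lag`, normalized by the lag-0 sum of squares."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((y - mean) ** 2 for y in series)
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


# Hypothetical toy series that flips sign at every step.
alternating = [1, -1, 1, -1, 1, -1]
print(autocorrelation(alternating, 0))  # 1.0
print(autocorrelation(alternating, 1))  # negative: neighbors have opposite signs
print(autocorrelation(alternating, 2))  # positive: values two steps apart agree
```

Plotting these values against the lag gives the autocorrelation plot often used when choosing the orders of autoregressive and moving average models.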

Live Lecture 10: Classification II

Live Lectures

Classification metrics, cross-entropy loss, Bayes-based classifiers (LDA/QDA/NB), support vector machines

Slides
Transcript
Code
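
For a concrete feel for the metrics and the loss, here is a minimal sketch for binary labels in plain Python. The function names and example values are our own assumptions. The precision/recall function assumes at least one predicted positive and one actual positive, and probabilities are clipped to avoid taking the log of zero.

```python
from math import log


def precision_recall(y_true, y_pred):
    """Precision and recall for 0/1 labels, with 1 as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp), tp / (tp + fn)


def binary_cross_entropy(y_true, probs, eps=1e-12):
    """Mean negative log-likelihood of 0/1 labels under predicted P(y = 1)."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip so log() stays finite
        total -= t * log(p) + (1 - t) * log(1 - p)
    return total / len(y_true)


# Hypothetical labels and predictions: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]
print(precision_recall(y_true, y_pred))

# A classifier that always predicts probability 0.5 has loss log(2).
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
```

Precision and recall depend only on the hard predictions, while cross-entropy also rewards well-calibrated probabilities.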

Math Hour 11

Math Hour

Optimization basics (Lagrangian duality, KKT conditions) with application to the linear SVC problem.

Transcript
Code

Live Lecture 13: Neural Networks

Live Lectures

Feed Forward Neural Networks, Convolution layers with application to image classification, Recurrent Neural Networks with application to text classification.

Slides
Transcript
Code

Project/Homework Instructions

Project/Team Formation
Project Submission
Projects README

Schedule

Orientation & Setup Week: May 8 - May 12, 2025
Phase 1 - Instruction and Project Completion: May 13 - Jul 2, 2025
Project Review & Judging: Jul 3 - Jul 9, 2025
Phase 2 - Intense Interview Prep & Career Connections for Certificate Holders: Jul 10 - Aug 15, 2025

May 8, 2025 at 05:30 PM UTC - Lecture 01: Orientation / Computer Setup Day
May 13, 2025 at 05:30 PM UTC - Lecture 02: Regression I
May 14, 2025 at 02:00 PM UTC - Math Hour 02
May 14, 2025 at 05:00 PM UTC - Problem Session 02
May 15, 2025 at 05:30 PM UTC - Lecture 03: Regression II
May 16, 2025 at 02:00 PM UTC - Math Hour 03
May 16, 2025 at 05:00 PM UTC - Problem Session 03
May 16, 2025 at 08:00 PM UTC - Project Pitch Hour
May 20, 2025 at 05:30 PM UTC - Lecture 04: Regression III
May 21, 2025 at 02:00 PM UTC - Math Hour 04
May 21, 2025 at 05:00 PM UTC - Problem Session 04
May 22, 2025 at 05:30 PM UTC - Lecture 05: Inference I
May 23, 2025 at 02:00 PM UTC - Math Hour 05
May 23, 2025 at 05:00 PM UTC - Problem Session 05
May 27, 2025 at 05:30 PM UTC - Lecture 06: Inference II
May 28, 2025 at 02:00 PM UTC - Math Hour 06
May 28, 2025 at 05:00 PM UTC - Problem Session 06
May 29, 2025 at 05:30 PM UTC - Lecture 07: Time Series I
May 30, 2025 at 02:00 PM UTC - Math Hour 07
May 30, 2025 at 05:00 PM UTC - Problem Session 07
Jun 3, 2025 at 05:30 PM UTC - Lecture 08: Time Series II
Jun 4, 2025 at 02:00 PM UTC - Cancelled: Math Hour 08
Jun 4, 2025 at 05:00 PM UTC - Problem Session 08
Jun 5, 2025 at 05:30 PM UTC - Lecture 09: Classification I
Jun 6, 2025 at 02:00 PM UTC - Math Hour 09
Jun 6, 2025 at 05:00 PM UTC - Problem Session 09
Jun 10, 2025 at 05:30 PM UTC - Lecture 10: Classification II
Jun 11, 2025 at 02:00 PM UTC - Math Hour 10
Jun 11, 2025 at 05:00 PM UTC - Problem Session 10
Jun 12, 2025 at 05:30 PM UTC - Lecture 11: Ensemble Learning I
Jun 13, 2025 at 02:00 PM UTC - Math Hour 11
Jun 13, 2025 at 05:00 PM UTC - Problem Session 11
Jun 17, 2025 at 05:30 PM UTC - Cancelled: Lecture 12 (Ensemble Learning II)
Jun 18, 2025 at 02:00 PM UTC - Math Hour 12
Jun 18, 2025 at 05:00 PM UTC - Problem Session 12
Jun 19, 2025 at 05:30 PM UTC - Lecture 13: Introduction to Neural Networks
Jun 20, 2025 at 02:00 PM UTC - Math Hour 13
Jun 20, 2025 at 05:00 PM UTC - Problem Session 13
Jul 7, 2025 at 08:00 PM UTC - DS Technical Interview Prep Overview
Jul 10, 2025 at 04:00 PM UTC - DS Practice Interview, Option A (Link in Slack Channel)
Jul 11, 2025 at 12:00 AM UTC - DS Practice Interview, Option B (Link in Slack Channel)
Jul 11, 2025 at 01:00 PM UTC - DS Practice Interview, Option C (Link in Slack Channel)
Jul 15, 2025 at 04:00 PM UTC - Project Showcase and Commencement
Jul 16, 2025 at 06:00 PM UTC - DS Phase II Office Hour
Jul 17, 2025 at 04:00 PM UTC - DS Practice Interview, Option A
Jul 18, 2025 at 12:00 AM UTC - DS Practice Interview, Option B
Jul 18, 2025 at 01:00 PM UTC - DS Practice Interview, Option C
Jul 23, 2025 at 06:00 PM UTC - DS Phase II Office Hour
Jul 24, 2025 at 04:00 PM UTC - DS Practice Interview, Option A
Jul 25, 2025 at 12:00 AM UTC - DS Practice Interview, Option B
Jul 25, 2025 at 01:00 PM UTC - DS Practice Interview, Option C
Jul 30, 2025 at 06:00 PM UTC - DS Phase II Office Hour
Jul 31, 2025 at 04:00 PM UTC - DS Practice Interview, Option A
Aug 1, 2025 at 12:00 AM UTC - DS Practice Interview, Option B
Aug 1, 2025 at 01:00 PM UTC - DS Practice Interview, Option C
Aug 6, 2025 at 06:00 PM UTC - DS Phase II Office Hour
Aug 7, 2025 at 04:00 PM UTC - DS Practice Interview, Option A
Aug 8, 2025 at 12:00 AM UTC - DS Practice Interview, Option B
Aug 8, 2025 at 01:00 PM UTC - DS Practice Interview, Option C
Aug 13, 2025 at 06:00 PM UTC - DS Phase II Office Hour
Aug 14, 2025 at 04:00 PM UTC - DS Practice Interview, Option A (Link in Slack Channel)
Aug 15, 2025 at 12:00 AM UTC - DS Practice Interview, Option B (Link in Slack Channel)
Aug 15, 2025 at 01:00 PM UTC - DS Practice Interview, Option C (Link in Slack Channel)
Aug 20, 2025 at 06:00 PM UTC - DS Phase II Office Hour

Project/Homework Deadlines

May 8, 2025

03:59 AM UTC

Last chance to switch bootcamps

Email Amalya Lehmann at amalya@erdosinstitute.org if you would like to switch to a different bootcamp.

May 15, 2025

03:59 AM UTC

Watch 3 Previous Top Projects

Consult the project database and watch at least 3 previous top projects from Erdős alumni.

May 15, 2025

03:59 AM UTC

Watch video about Project Formation

This should help answer any questions you may have going into project formation.

May 16, 2025

08:00 PM UTC

Project Pitch Hour

Opportunity to meet with other Erdős Fellows and form teams and propose topics.

May 20, 2025

03:59 AM UTC

Last day to defer enrollment to a future cohort

Contact Amalya Lehmann (amalya@erdosinstitute.org) if you would like to unenroll from this cohort and defer to a future cohort.

May 20, 2025

03:59 AM UTC

Finalized Teams with Preliminary Project Ideas

Teams need to be finalized by this point. If you proposed or created a project, you must have others in your group. If you did not propose or create a project, you must join an open group.

May 24, 2025

03:59 AM UTC

Data gathering and defining stakeholders + KPIs

Find the dataset you will be working with. Describe the dataset and the problem you are looking to solve (1 page max). List the stakeholders of the project and company key performance indicators (KPIs) (bullet points).

May 31, 2025

03:59 AM UTC

Data cleaning + preprocessing

Look for missing values and duplicates. Basic data manipulation & preliminary feature engineering.
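
As a rough sketch of these two checks, the example below counts missing entries per column and drops exact duplicate rows using only the standard library. The function names and sample rows are hypothetical, and treating `None` and the empty string as "missing" is an assumption. In a real project you would more likely reach for pandas (for example `DataFrame.isna` and `DataFrame.drop_duplicates`).

```python
def count_missing(rows):
    """Count missing (None or empty-string) entries per column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (value is None or value == "")
    return counts


def drop_duplicates(rows):
    """Keep the first occurrence of each distinct row, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent, hashable row signature
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical records: the third row repeats the first, the second has a gap.
rows = [
    {"id": 1, "score": 0.5},
    {"id": 2, "score": None},
    {"id": 1, "score": 0.5},
]
print(count_missing(rows))         # {'id': 0, 'score': 1}
print(len(drop_duplicates(rows)))  # 2
```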

Jun 7, 2025

03:59 AM UTC

Written proposal of modeling approach [Checkpoint]

Describe your planned modeling approach, based on the exploratory data analysis from the last two weeks (< 1 page, bullet points).

Jun 14, 2025

03:59 AM UTC

Preliminary Results

Results with visualizations and/or metrics. List of successes and pitfalls.

Jun 21, 2025

03:59 AM UTC

Clean your repository

Clean up your repository so that an outsider can easily follow your work. Convert notebooks into scripts where possible. Confirm that the whole pipeline from data ingestion all the way to prediction or inference works without fuss.

Jun 28, 2025

03:59 AM UTC

Final Projects Due

Final Projects must be submitted by this deadline in order to receive a certificate of completion.

©2017-2025 by The Erdős Institute.
