Data Science Boot Camp

Spring 2024

Feb 7, 2024 - May 1, 2024

Next Event: Lecture 1: Introduction

Registration Deadlines

Feb 8, 2024: All interested participants

Category

Launch, Core Program, Boot Camp, Projects, Certificates

Overview

The Erdős Institute's signature Data Science Boot Camp has been running since May 2018 thanks to the generous support of our sponsors, members, and partners. Due to its popularity, we now offer the boot camp online three times per year in two formats: a one-month intensive boot camp each May and a semester-long version each Spring and Fall.

Slack

Click here to be invited to the Slack organization: The Erdős Institute

Click here to access the Slack cohort channel: #slack-cohort-channel

Click here to access the Slack program channel: #slack-program-channel

Click here to download the Events & Deadlines .ics calendar file

Organizers, Instructors, and Advisors

Steven Gubkin, PhD

Lead Instructor

Office Hours:

Tu: 11am - 12pm ET, and by appt.

Email:

Preferred Contact:

Slack

Please feel free to message me on Slack with any questions!

Matthew Osborne, PhD

Alumni Advisor

Office Hours:

By appointment only

Email:

Preferred Contact:

Slack

Don't hesitate to contact me with any questions or concerns.

Alec Clott, PhD

Head of Data Science Projects

Office Hours:

Wed. 12-12:30pm EST, and by appt.

Email:

Preferred Contact:

Slack

Participants are welcome to reach out to me via Slack or email. I normally work standard EST hours (9am-5pm), but I can always find time to meet over Zoom after work too. Let me know how I can help!

Objectives

The goal of our Data Science Boot Camp is to provide you with the skills and mentorship necessary to produce a portfolio-worthy data science/machine learning project, while also providing valuable career development support and connecting you with potential employers.

Project Examples

TEAM 16

Predicting Lead Contamination in NY School Drinking Water

Ranadeep Roy, Cami Goray, Hana Lang

Lead is a toxic metal, and in children especially, lead exposure can have severe health consequences: even small amounts of lead have the potential to affect memory, behavior, and learning ability. Despite this, numerous schools across New York State have at least one drinking water outlet with lead levels testing above 5 ppb. In this project, we aim to predict the presence of lead contamination in school drinking water and to better understand the role of demographic, socioeconomic, infrastructural, and geographic features in elevated lead levels.

TEAM 33

Tuning Up Music Highway

James O'Quinn, Yang Mo, John Hurtado Cadavid, Ruixuan Ding, Chilambwe Natasha Wapamenshi

Known as the most dangerous highway in Tennessee, Music Highway, the stretch of Interstate 40 between Memphis and Nashville, could use a serious tuning up. This project investigates the effectiveness and cost-efficiency of potential physical safety interventions along its Madison and Henderson County segments, with the goal of reducing crash severity. We used a data-driven geospatial modeling approach to assess whether adding specific safety features to targeted segments predicts statistically significant changes in crash injury outcomes.

First Steps/Prerequisites

Participants should have a base-level familiarity with Python. Participants should also be familiar with some basic math concepts. Finally, you will also need to have your laptop or desktop computer set up for the course. If you are new to Python, need a quick math refresher, or if you need help setting up your computer, then please follow the link below.

Program Content

Course materials are available on GitHub through the following link:

Request Access to GitHub

Textbook/Notes

Note: our video player does not support playback speed options. A third-party browser extension can change the playback speed; for example, video-speed-controller works for Chrome. If you would prefer to avoid a browser extension, you can also set the playback speed manually in the JavaScript console: Speed up any HTML5 video player!

Math Hour 6

Math Hour

We discuss how the impulse response of an AR(p) process is governed by the roots of the characteristic polynomial.

Slides
Transcript
Code
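
As a small illustration of the Math Hour 6 topic, the sketch below computes the impulse response of an AR(p) process by recursion and, for p = 2, compares it with the closed form built from the characteristic roots. The function names, the coefficient values, and the convention that the characteristic polynomial is z^2 - phi1*z - phi2 (so the response decays when both roots lie inside the unit circle) are assumptions made for this example and are not taken from the session materials.

```python
import cmath


def impulse_response(phi, n):
    """First n impulse-response values h_0..h_{n-1} of an AR(p) process.

    phi holds the AR coefficients (phi_1, ..., phi_p); the response obeys
    h_0 = 1 and h_k = phi_1*h_{k-1} + ... + phi_p*h_{k-p}.
    """
    h = []
    for k in range(n):
        if k == 0:
            h.append(1.0)
        else:
            h.append(sum(phi[j] * h[k - 1 - j] for j in range(min(len(phi), k))))
    return h


def ar2_roots(phi1, phi2):
    """Roots of z^2 - phi1*z - phi2 (assumed convention for this sketch)."""
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)
    return (phi1 + disc) / 2, (phi1 - disc) / 2


def ar2_closed_form(phi1, phi2, n):
    """Impulse response of an AR(2) process written in terms of its roots.

    Valid only when the two roots are distinct.
    """
    r1, r2 = ar2_roots(phi1, phi2)
    return [((r1 ** (k + 1) - r2 ** (k + 1)) / (r1 - r2)).real for k in range(n)]


# Hypothetical coefficients chosen so both roots lie inside the unit circle.
recursive = impulse_response([0.5, 0.3], 8)
closed = ar2_closed_form(0.5, 0.3, 8)
largest_gap = max(abs(a - b) for a, b in zip(recursive, closed))
print(largest_gap)
```

The two computations agree up to floating-point error, which is the sense in which the roots govern the response: each root contributes a geometric term, and the larger root in modulus sets the decay rate.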

Lecture 8: Classification I

Live Lectures

We discuss stratified data splits, the kNN classification algorithm, logistic regression, the confusion matrix, and diagnostic curves.

Slides
Transcript
Code
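
To make two of the Lecture 8 ideas concrete, here is a minimal standard-library sketch of a kNN classifier and a confusion matrix. The toy data, the function names, and the choice of Euclidean distance with k = 3 are assumptions for illustration; the course notebooks may use a library implementation instead.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbors."""
    neighbors = sorted(
        (math.dist(point, x), label) for point, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Hypothetical two-class toy data: class 0 near the origin, class 1 near (5, 5).
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(0.5, 0.5), (5.5, 5.5), (1, 1), (3.2, 3.2)]
test_y = [0, 1, 0, 0]

predictions = [knn_predict(train_X, train_y, x) for x in test_X]
matrix = confusion_matrix(test_y, predictions, labels=[0, 1])
print(matrix)
```

In this toy run the last test point is closer to the class 1 cluster than to its own class, so it shows up as the single off-diagonal count in the matrix.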

Math Hour 10

Math Hour

We prove a version of Mercer's theorem for finite spaces and make some progress on an infinite generalization.

Slides
Transcript
Code
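
For orientation, one standard way to state the finite case is sketched below. This is the usual linear-algebra formulation, offered as an assumption about what "a version of Mercer's theorem for finite spaces" might look like; the statement proved in the session may differ in notation or generality.

```latex
Let $X = \{x_1, \dots, x_n\}$ and let $K \in \mathbb{R}^{n \times n}$ be symmetric
with entries $K_{ij} = k(x_i, x_j)$. If $K$ is positive semidefinite, write
$K = \sum_{m=1}^{n} \lambda_m v_m v_m^{\top}$ with $\lambda_m \ge 0$ and
orthonormal eigenvectors $v_m$. Define the feature map
\[
  \varphi(x_i) = \bigl(\sqrt{\lambda_1}\,(v_1)_i, \dots, \sqrt{\lambda_n}\,(v_n)_i\bigr)
  \in \mathbb{R}^n .
\]
Then
\[
  \langle \varphi(x_i), \varphi(x_j) \rangle
  = \sum_{m=1}^{n} \lambda_m (v_m)_i (v_m)_j
  = K_{ij} .
\]
```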

Lecture 12: Neural Networks

Live Lectures

We discuss feed-forward neural networks, convolution layers for image data, recurrent neural networks for sequential data, and how to save and load model checkpoints.

Slides
Transcript
Code

Lecture 7: Time Series II

Live Lectures

We discuss stationarity, autocorrelation, and all of the components of SARIMA: seasonal autoregressive integrated moving average models.

Slides
Transcript
Code
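
As a quick illustration of the autocorrelation idea from Lecture 7, the sketch below computes the sample autocorrelation of a series at a given lag using the common estimator that divides the lagged covariance sum by the total sum of squared deviations. The function name and the alternating toy series are assumptions for this example.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    total = sum((x - mean) ** 2 for x in series)
    lagged = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return lagged / total


# Hypothetical series that flips sign at every step.
alternating = [1.0, -1.0] * 10

r0 = autocorrelation(alternating, 0)
r1 = autocorrelation(alternating, 1)
r2 = autocorrelation(alternating, 2)
print(r0, r1, r2)
```

For the alternating series the lag-1 value is strongly negative and the lag-2 value strongly positive, the kind of pattern an autocorrelation plot makes visible when choosing model orders.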

Math Hour 9

Math Hour

We discuss LDA and QDA.

Slides
Transcript
Code

Lecture 10: Ensembles I

Live Lectures

We discuss decision trees, random forest classifiers, some new regression algorithms which are related to some of our classification techniques, and bagging/pasting.

Slides
Transcript
Code
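
The difference between bagging and pasting comes down to how each base model's training subset is drawn: bagging samples with replacement, pasting without. The sketch below shows only that sampling step; the function names, subset sizes, and seed are assumptions for illustration, and fitting a model on each subset is left out.

```python
import random


def bagging_sample(data, size, rng):
    """Draw a subset WITH replacement (a bootstrap sample)."""
    return [rng.choice(data) for _ in range(size)]


def pasting_sample(data, size, rng):
    """Draw a subset WITHOUT replacement."""
    return rng.sample(data, size)


rng = random.Random(0)   # fixed seed so the run is repeatable
data = list(range(10))   # stand-in for row indices of a training set

bag = bagging_sample(data, 10, rng)
paste = pasting_sample(data, 6, rng)
print(bag)
print(paste)
```

A bagging sample may contain the same row several times and leave other rows out entirely, while a pasting sample never repeats a row.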

Welcome!

01 Week Fall 2023: Introduction (prerecorded)

In this video we welcome you to our data science content.

Slides
Code

Math Hour 7

Math Hour

We discuss how to fit logistic regression models.

Slides
Transcript
Code
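
One common way to fit a logistic regression model is gradient descent on the average negative log-likelihood. The sketch below does this for a single feature using only the standard library. The learning rate, step count, toy data, and function names are assumptions for illustration, and the session may have covered a different fitting method (for example, Newton's method).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals (prediction minus label) drive both gradient components.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Hypothetical data: larger x mostly means class 1, with some overlap
# so that the fitted slope stays finite.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

w, b = fit_logistic(xs, ys)
p_high = sigmoid(w * 2.0 + b)
p_low = sigmoid(w * -2.0 + b)
print(w, b, p_high, p_low)
```

The overlap in the toy data matters: on perfectly separable data the likelihood has no finite maximizer, and the slope would keep growing with more steps.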

Lecture 9: Classification II

Live Lectures

We discuss Bayes' theorem based classifiers (LDA, QDA, GNB) and support vector machines. We also motivate the cross entropy loss function.

Slides
Transcript
Code
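
The cross entropy loss mentioned above can be read as the average negative log-likelihood of the true labels under the predicted probabilities. The sketch below computes the binary version; the function name and the example probabilities are assumptions for illustration.

```python
import math


def binary_cross_entropy(y_true, p_pred):
    """Average negative log-likelihood of 0/1 labels.

    p_pred holds predicted probabilities of class 1, strictly between 0 and 1.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0]
uninformative = binary_cross_entropy(labels, [0.5, 0.5])
confident = binary_cross_entropy(labels, [0.9, 0.1])
hesitant = binary_cross_entropy(labels, [0.6, 0.4])
print(uninformative, confident, hesitant)
```

Predicting 0.5 everywhere gives a loss of log 2, and the loss falls as the predicted probabilities move toward the correct labels.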

Lecture 11: Ensembles II

Live Lectures

We discuss voter models, AdaBoost, gradient boosting, and XGBoost.

Slides
Transcript
Code

A Broad Overview

02 Week: Data Collection (prerecorded)

In this video we give a bird's-eye view of what we will cover in our data science content.

Slides
Code

Project/Homework Instructions

Project/Team Formation
Project Submission
Projects README

How To Form Projects

Presentation Tips and Tricks (prerecorded)

This video shows you how to navigate the team formation process on the Erdős website.

Slides
Transcript

Project Pitches and Resources

Project Pitch Hour

This is the recording of the May 1st 4:30 PM Project Pitch Hour.

Jim Schwoebel introduces databoard. If you are interested in generating your own synthetic dataset for your project, please contact Jim and Roman on Slack.

Emiliano Santarnecchi introduces his research and commercialization interests in Neuromodulation and Neurostimulation. He is open for project conversations and guidance in these and related areas. https://gordon.mgh.harvard.edu/research/precision-neuroscience-neuromodulation-program/

Then Erdős Spring 2024 participants pitched their project ideas.

Slides
Transcript
Code

Corporate Sponsored Project: Aware

NLP Project

Jason Morgan, VP of Aware, discusses the problem and possible solutions to get started (March 1, 2024 at 3pm). Use the Slides button for the project description and the Code button for the dataset.

Transcript

Schedule

Click on any date for more details

Orientation & Setup

Phase 1: Instruction and Project Completion

Project Review & Judging

Phase 2: Intense Interview Prep & Career Connections

Lecture 1: Introduction

Feb 5, 2024 at 08:00 PM UTC

EVENT

Lecture 2: Data Collection

Feb 12, 2024 at 08:00 PM UTC

EVENT

Lecture 3: Regression I

Feb 19, 2024 at 08:00 PM UTC

EVENT

Lecture 4: Regression II

Feb 26, 2024 at 08:00 PM UTC

EVENT

Project Pitch Hour

Mar 1, 2024 at 09:30 PM UTC

EVENT

Problem Solving Session 5

Mar 7, 2024 at 08:00 PM UTC

EVENT

Problem Solving Session 6

Mar 14, 2024 at 07:00 PM UTC

EVENT

Problem Solving Session 7

Mar 21, 2024 at 07:00 PM UTC

EVENT

Problem Solving Session 8

Mar 28, 2024 at 07:00 PM UTC

EVENT

Problem Solving Session 9

Apr 4, 2024 at 07:00 PM UTC

EVENT

Problem Solving Session 10

Apr 11, 2024 at 07:00 PM UTC

EVENT

Problem Solving Session 11

Apr 18, 2024 at 07:00 PM UTC

EVENT

Problem Solving Session 12

Apr 25, 2024 at 07:00 PM UTC

EVENT

Math Hour (at 10) / Office Hour (at 11)

Feb 6, 2024 at 03:00 PM UTC

EVENT

Math Hour (at 10) / Office Hour (at 11)

Feb 13, 2024 at 03:00 PM UTC

EVENT

Math Hour (at 10) / Office Hour (at 11)

Feb 20, 2024 at 03:00 PM UTC

EVENT

Math Hour (at 10) / Office Hour (at 11)

Feb 27, 2024 at 03:00 PM UTC

EVENT

Lecture 5: Regression III

Mar 4, 2024 at 08:00 PM UTC

EVENT

Lecture 6: Time Series I

Mar 11, 2024 at 07:00 PM UTC

EVENT

Lecture 7: Time Series II

Mar 18, 2024 at 07:00 PM UTC

EVENT

Lecture 8: Classification I

Mar 25, 2024 at 07:00 PM UTC

EVENT

Lecture 9: Classification II

Apr 1, 2024 at 07:00 PM UTC

EVENT

Math Hour (at 10) / Office Hour (at 11)

Apr 9, 2024 at 02:00 PM UTC

EVENT

Lecture 11: Ensemble Learning II

Apr 15, 2024 at 07:00 PM UTC

EVENT

Lecture 12: Neural Networks

Apr 22, 2024 at 07:00 PM UTC

EVENT

Erdős Spring Final Project Showcase

May 1, 2024 at 04:00 PM UTC

EVENT

Problem Solving Session 1

Feb 8, 2024 at 08:00 PM UTC

EVENT

Problem Solving Session 2

Feb 15, 2024 at 08:00 PM UTC

EVENT

Problem Solving Session 3

Feb 22, 2024 at 08:00 PM UTC

EVENT

Problem Solving Session 4

Feb 29, 2024 at 08:00 PM UTC

EVENT

Math Hour (at 10) / Office Hour (at 11)

Mar 5, 2024 at 03:00 PM UTC

EVENT

Math Hour (at 10) / Office Hour (at 11)

Mar 12, 2024 at 02:00 PM UTC

EVENT

Math Hour (at 10) / Office Hour (at 11)

Mar 19, 2024 at 02:00 PM UTC

EVENT

Math Hour (at 10) / Office Hour (at 11)

Mar 26, 2024 at 02:00 PM UTC

EVENT

Math Hour (at 10) / Office Hour (at 11)

Apr 2, 2024 at 02:00 PM UTC

EVENT

Lecture 10: Ensemble Learning I

Apr 10, 2024 at 07:00 PM UTC

EVENT

Math Hour (at 10) / Office Hour (at 11)

Apr 16, 2024 at 02:00 PM UTC

EVENT

Math Hour (at 10) / Office Hour (at 11)

Apr 23, 2024 at 02:00 PM UTC

EVENT

Project/Homework Deadlines

Feb 20, 2024

04:59 AM UTC

Watch 3 Previous Top Projects

Consult the project database, and watch at least 3 previous top projects from Erdős alumni.

Mar 1, 2024

04:59 AM UTC

Watch video about Project Formation

This should help answer any questions you may have going into project formation.

Mar 1, 2024

09:30 PM UTC

Project Pitch Hour

Click here for the Zoom link to join the Project Pitch Hour session, an opportunity to meet other Erdős Fellows, form teams, and propose topics.

Mar 9, 2024

04:59 AM UTC

Submit Team Proposal or Idea to Project Formation Page

If you want to propose a project, or have an idea for a project, submit it by this date.

Mar 12, 2024

03:59 AM UTC

Finalized Teams with Preliminary Project Ideas

Teams need to be finalized by this point. If you proposed or created a project, you must have others in your group. If you did not propose or create a project, you must join an open group.

Mar 19, 2024

03:59 AM UTC

Data gathering and defining stakeholders + KPIs

Find the dataset you will be working with. Describe the dataset and the problem you are looking to solve (1 page max). List the stakeholders of the project and company key performance indicators (KPIs) (bullet points).

Mar 26, 2024

03:59 AM UTC

Data cleaning + preprocessing

Look for missing values and duplicates. Basic data manipulation & preliminary feature engineering.

Apr 2, 2024

03:59 AM UTC

Exploratory data analysis + visualizations [Checkpoint]

Distributions of variables, looking for outliers, etc. Descriptive statistics.

Apr 9, 2024

03:59 AM UTC

Written proposal of modeling approach [Checkpoint]

Test linearity assumptions. Apply dimensionality reduction (if necessary). Describe your planned modeling approach, based on the exploratory data analysis from the last two weeks (< 1 page, bullet points).

Apr 16, 2024

03:59 AM UTC

Machine learning models or equivalent [Checkpoint]

Results with visualizations and/or metrics. List of successes and pitfalls.

Apr 27, 2024

03:59 AM UTC

Final project due

Please read the submission instructions on the link below.

©2017-2025 by The Erdős Institute.
