Data Science Boot Camp

Fall 2023

Sep 7, 2023 - Dec 7, 2023




To access the program content, you must first create an account and member profile and be logged in.


Registration Deadlines

Oct 1, 2023 - Academics from Member Institutions/Departments

Sep 30, 2023 - Academics from Non-Member Institutions paying the $500 cohort membership fee

Sep 30, 2023 - Academics from Non-Member Institutions applying for Corporate Sponsored Fellowships

Category: Launch

Overview

The Erdős Institute's signature Data Science Boot Camp has been running since May 2018 thanks to the generous support of our sponsors, members, and partners. Due to its popularity, we now offer the boot camp online twice per year in two formats: a month-long intensive boot camp each May and a semester-long version each Fall.

Slack

Click here to be invited to the Slack organization: The Erdős Institute

Click here to access the Slack cohort channel: #slack-cohort-channel

Click here to access the Slack program channel: #slack-program-channel


Click here to download the Events & Deadlines .ics calendar file

Organizers, Instructors, and Advisors


Alec Clott, PhD

Head of Data Science Projects

Office Hours: Wed. 12-12:30pm EST, and by appt.

Email:

Preferred Contact: Slack

Participants are welcome to reach out to me via Slack or email. I normally work standard EST hours (9am-5pm), but I can always find time to meet via Zoom after work. Let me know how I can help!


Matthew Osborne, PhD

Lead Instructor, Senior Operations Analyst

Office Hours: By appointment only

Email:

Preferred Contact: Slack

Don't hesitate to contact me with any questions or concerns.


Steven Gubkin, PhD

Lead Teaching Assistant

Office Hours: Tu: 11am - 12pm ET

Email:

Preferred Contact: Slack

Please feel free to message me on Slack with any questions! I will also be running a “math hour” every Wednesday from 11am - 12pm ET which will explore the mathematical underpinnings of the techniques covered in the previous lecture.

Objectives

The goal of our Data Science Boot Camp is to provide you with the skills and mentorship necessary to produce a portfolio-worthy data science/machine learning project, while also providing valuable career development support and connecting you with potential employers.

Project Examples

TEAM 33

Tuning Up Music Highway

James O'Quinn, Yang Mo, John Hurtado Cadavid, Ruixuan Ding, Chilambwe Natasha Wapamenshi


Known as the most dangerous highway in Tennessee, Music Highway, the stretch of Interstate 40 between Memphis and Nashville, could use a serious tuning up. This project investigates the effectiveness and cost-efficiency of potential physical safety interventions along its Madison and Henderson County segments, with the goal of reducing crash severity. We used a data-driven geospatial modeling approach to assess whether adding specific safety features to targeted segments predicts statistically significant changes in crash injury outcomes.

TEAM 29

Who Regulates the Regulators?

Jared Able, Joshua Jackson, Zachary Brennan, Alexandria Wheeler, Nicholas Geiser


With recent major cuts to government regulatory agencies in the US, we investigate whether those cuts are justified. In particular, we analyze the efficacy of RGGI, a state-level cap-and-trade program designed to regulate CO2 emissions from power plants. Using synthetic controls, we answer the counterfactual question: "How would CO2 emissions look in a world where RGGI was never enacted?"

First Steps/Prerequisites

Participants should have a base-level familiarity with Python. Participants should also be familiar with some basic math concepts. Finally, you will also need to have your laptop or desktop computer set up for the course. If you are new to Python, need a quick math refresher, or if you need help setting up your computer, then please follow the link below.

Program Content


Course materials are available on GitHub through the following link:

Request Access to GitHub


Textbook/Notes

Note: our video player does not support playback speed options. A third-party browser extension can let you change the playback speed; for example, this one works for Chrome: video-speed-controller. If you would prefer to avoid a browser extension, you can also change the playback speed manually in the JavaScript console: Speed up any HTML5 video player!

Introduction

Cleaning

An introduction to our Cleaning section of notebooks.

Slides
Code

Imputation

Cleaning

When you are missing data, try imputing!

Slides
Code
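As a rough illustration of what imputing means, here is a minimal sketch that fills missing entries in a single column with the mean or median of the observed values. It is not taken from the course notebooks; the `impute` helper and the `ages` data are hypothetical, and only the Python standard library is used.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Return a copy of `values` with None entries replaced.

    Hypothetical helper for illustration: missing entries (None) are
    filled with the mean or median of the observed entries.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


# Made-up example column with two missing entries.
ages = [22, None, 35, 41, None, 30]
imputed_ages = impute(ages, strategy="median")
print(imputed_ages)  # [22, 32.5, 35, 41, 32.5, 30]
```

The fill value is computed from the observed entries only. In a modeling setting it would typically be computed on the training data and then reused on held-out data.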

Plotting Tips

Presentation Tips and Tricks

Some tips on making good presentation plots from Matt.

Slides
Code

Fall 2023 Live Lecture 3 - Supervised Learning Intro & Regression I

Fall 2023 Live Lecture

We introduce the concept of supervised learning and begin our regression unit.

Slides
Code

Scaling Data

Cleaning

Sometimes your features are on very different scales; here we show you how to rescale them.

Slides
Code
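To make the idea of rescaling concrete, here is a minimal sketch of two common approaches: standardization (subtract the mean, divide by the standard deviation) and min-max scaling to the range [0, 1]. It is an illustration only, not the course's code; the helper names and the sample columns are assumptions, and only the Python standard library is used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up columns on very different scales.
ages = [20, 30, 40, 50]
incomes = [30000, 45000, 60000, 75000]

scaled_ages = standardize(ages)
scaled_incomes = standardize(incomes)
print(scaled_ages)
print(scaled_incomes)
print(min_max_scale(ages))
```

After standardization the two columns are directly comparable, even though the raw incomes are three orders of magnitude larger than the raw ages.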

More Advanced Pipelines

Cleaning

We demonstrate some more advanced pipeline techniques in sklearn.

Slides
Code

Fall 2023 Live Lecture 1 - Introduction

Fall 2023 Live Lecture

The recording of our first live lecture from Fall 2023.

Code

Fall 2023 Live Lecture 4 - Regression II

Fall 2023 Live Lecture

We continue our linear regression content and take a detour into data cleaning.

Slides
Code

Basic Pipelines

Cleaning

Pipelines are a nice way to put all modeling steps into one neat package. Here we introduce the most basic pipeline creation methods.

Slides
Code

General Presentation Tips

Presentation Tips and Tricks

Some general tips for making a good presentation from Matt.

Slides
Code

Fall 2023 Live Lecture 2 - Data Collection

Fall 2023 Live Lecture

The recording of our second live lecture from Fall 2023. We discuss data collection methods.

Slides
Code

Fall 2023 Live Lecture 5 - Regression III

Fall 2023 Live Lecture

We wrap up our linear regression content.

Slides
Code

Project/Homework Instructions


Project/Team Formation
Project Submission
Projects README

How To Form Projects

Instructional

This video shows you how to navigate the team formation process on the Erdős website.

Slides
Transcript

Schedule


Phase 1: Instruction and Project Completion

Project Review & Judging

Phase 2: Intense Interview Prep & Career Connections

Sep 7, 2023 at 09:30 PM UTC - Lecture 1: Introduction
Sep 11, 2023 at 03:00 PM UTC - Problem Session 1
Sep 12, 2023 at 03:00 PM UTC - Office Hours 1
Sep 12, 2023 at 03:00 PM UTC - Study Group
Sep 13, 2023 at 03:00 PM UTC - Math Hour 1
Sep 14, 2023 at 09:30 PM UTC - Lecture 2: Data Collection
Sep 18, 2023 at 03:00 PM UTC - Problem Session 2
Sep 19, 2023 at 03:00 PM UTC - Office Hours 2
Sep 20, 2023 at 03:00 PM UTC - Math Hour 2
Sep 21, 2023 at 09:30 PM UTC - Lecture 3: Supervised Learning and Regression I
Sep 25, 2023 at 03:00 PM UTC - Problem Session 3
Sep 26, 2023 at 03:00 PM UTC - Office Hours 3
Sep 27, 2023 at 03:00 PM UTC - Math Hour 3
Sep 28, 2023 at 09:30 PM UTC - Lecture 4: Regression II
Oct 2, 2023 at 03:00 PM UTC - Problem Session 4
Oct 3, 2023 at 03:00 PM UTC - Office Hours 4
Oct 4, 2023 at 03:00 PM UTC - Math Hour 4
Oct 5, 2023 at 09:30 PM UTC - Lecture 5: Regression III & Time Series I
Oct 9, 2023 at 03:00 PM UTC - Problem Session 5
Oct 10, 2023 at 03:00 PM UTC - Office Hours 5
Oct 11, 2023 at 03:00 PM UTC - Math Hour 5
Oct 12, 2023 at 09:30 PM UTC - Lecture 6: Time Series II
Oct 16, 2023 at 03:00 PM UTC - Problem Session 6
Oct 17, 2023 at 03:00 PM UTC - Office Hours 6
Oct 18, 2023 at 03:00 PM UTC - Math Hour 6
Oct 19, 2023 at 09:30 PM UTC - Lecture 7: Time Series II
Oct 23, 2023 at 03:00 PM UTC - Problem Session 7
Oct 24, 2023 at 03:00 PM UTC - Office Hours 7
Oct 25, 2023 at 03:00 PM UTC - Math Hour 7
Oct 26, 2023 at 09:30 PM UTC - Lecture 8: Classification I
Oct 30, 2023 at 03:00 PM UTC - Problem Session 8
Oct 31, 2023 at 03:00 PM UTC - Office Hours 8
Nov 1, 2023 at 03:00 PM UTC - Math Hour 8
Nov 2, 2023 at 09:30 PM UTC - Lecture 9: Classification II
Nov 6, 2023 at 04:00 PM UTC - Problem Session 9
Nov 7, 2023 at 04:00 PM UTC - Office Hours 9
Nov 8, 2023 at 04:00 PM UTC - Math Hour 9
Nov 9, 2023 at 10:30 PM UTC - Lecture 10: Classification III
Nov 13, 2023 at 04:00 PM UTC - Problem Session 10
Nov 16, 2023 at 10:30 PM UTC - Lecture 11: Ensemble Learning
Nov 27, 2023 at 04:00 PM UTC - Problem Session 11
Nov 28, 2023 at 04:00 PM UTC - Office Hours 11
Nov 29, 2023 at 04:00 PM UTC - Math Hour 11
Nov 30, 2023 at 10:30 PM UTC - Lecture 12: Neural Networks
Dec 7, 2023 at 04:55 PM UTC - Project Showcase and Commencement

Project/Homework Deadlines

Sep 23, 2023

03:59 AM UTC

Watch 3 Previous Top Projects

Consult the project database and watch at least 3 previous top projects from Erdős alumni.

Oct 6, 2023

03:59 AM UTC

Watch video about Project Formation

This should help answer any questions you may have going into project formation.

Oct 6, 2023

08:30 PM UTC

Project Pitch Hour

An opportunity to meet other Erdős Fellows, form teams, and propose topics.

Oct 12, 2023

03:59 AM UTC

Submit Team Proposal or Idea to Project Formation Page

If you want to propose a project, or have an idea for a project, submit it by this date.

Oct 14, 2023

03:59 AM UTC

Finalized Teams with Preliminary Project Ideas

Teams need to be finalized by this point. If you proposed or created a project, you must have others in your group. If you did not propose or create a project, you must join an open group.

Oct 21, 2023

03:59 AM UTC

Data gathering and defining stakeholders + KPIs

Find the dataset you will be working with. Describe the dataset and the problem you are looking to solve (1 page max). List the stakeholders of the project and company key performance indicators (KPIs) (bullet points).

Oct 28, 2023

03:59 AM UTC

Data cleaning + preprocessing

Look for missing values and duplicates. Basic data manipulation & preliminary feature engineering.

Nov 4, 2023

03:59 AM UTC

Exploratory data analysis + visualizations [Checkpoint]

Distributions of variables, looking for outliers, etc. Descriptive statistics.

Nov 10, 2023

04:59 PM UTC

Written proposal of modeling approach [Checkpoint]

Test linearity assumptions. Apply dimensionality reduction (if necessary). Describe your planned modeling approach, based on the exploratory data analysis from the last two weeks (< 1 page, bullet points).

Nov 16, 2023

04:59 AM UTC

Machine learning models or equivalent [Checkpoint]

Results with visualizations and/or metrics. List of successes and pitfalls.

Dec 2, 2023

04:59 AM UTC

Final project due

Please read the submission instructions at the link below.

©2017-2025 by The Erdős Institute.
