Data Science Boot Camp

Fall 2024

Sep 5, 2024 to Dec 13, 2024

This program is included with Fall 2024 Career Launch Cohort Enrollment and Erdős Institute Alumni Club Membership at no additional cost.


To access the program content, you must first create an account and member profile and be logged in.


Registration Deadlines

Sep 6, 2024: All Erdős Fall 2024 Career Launch Cohort or Alumni Club members who are not participating in the UX Research or Deep Learning Boot Camps

Category

Launch, Core Program, Boot Camp, Projects, Certificates

Overview

The Erdős Institute's signature Data Science Boot Camp has been running since May 2018 thanks to the generous support of our sponsors, members, and partners. Due to its popularity, we now offer our boot camp online three times per year in two different formats: a month-long intensive boot camp each May and a semester-long version each Spring and Fall.

Slack

Click here to be invited to the Slack organization: The Erdős Institute

Click here to access the Slack cohort channel: #slack-cohort-channel

Click here to access the Slack program channel: #slack-program-channel


Click here to download the Events & Deadlines .ics calendar file

Organizers, Instructors, and Advisors


Steven Gubkin, PhD

Lead Instructor

Office Hours: Wednesdays 11am-12pm and by appointment

Email:

Preferred Contact: Slack

Please feel free to message me on Slack with any questions!


Alec Clott, PhD

Head of Data Science Projects

Office Hours: By appointment only

Email:

Preferred Contact: Slack

Participants are welcome to reach out to me via Slack or email. I normally work standard Eastern Time hours (9am-5pm), but I can always find time to meet via Zoom after work too. Let me know how I can help!

Objectives

The goal of our Data Science Boot Camp is to provide you with the skills and mentorship necessary to produce a portfolio-worthy data science/machine learning project, while also providing you with valuable career development support and connecting you with potential employers.

Project Examples

TEAM 16

Predicting Lead Contamination in NY School Drinking Water

Ranadeep Roy, Cami Goray, Hana Lang

GitHub URL

Lead is a toxic metal, and lead exposure can have severe health consequences, especially in children: even small amounts of lead have the potential to affect memory, behavior, and learning ability. Despite this, numerous schools across New York State have at least one drinking water outlet with lead levels testing above 5 ppb. In this project, we aim to predict the presence of lead contamination in school drinking water and to better understand the role of demographic, socioeconomic, infrastructural, and geographic features in elevated lead levels.

TEAM 33

Tuning Up Music Highway

James O'Quinn, Yang Mo, John Hurtado Cadavid, Ruixuan Ding, Chilambwe Natasha Wapamenshi

GitHub URL

Known as the most dangerous highway in Tennessee, Music Highway, the stretch of Interstate 40 between Memphis and Nashville, could use a serious tuning up. This project investigates the effectiveness and cost-efficiency of potential physical safety interventions along its Madison and Henderson County segments, with the goal of reducing crash severity. We used a data-driven geospatial modeling approach to assess whether adding specific safety features to targeted segments predicts statistically significant changes in crash injury outcomes.

First Steps/Prerequisites

Computer Setup Day/First Steps
There are some computer setup steps you need to complete before the first lecture. We will meet on Zoom on September 5, 2024 to make sure that we have all done the following:
  1. Cloned the GitHub repo locally.
  2. Installed the conda environment.
  3. Run a Jupyter Notebook using that conda environment.
Detailed instructions (created by teaching assistant Ness Mayker Chen) can be found at this link.
 
We will test your ability to do these things by having you submit a "secret code". You will obtain this code by successfully running the notebook:
 
computer_setup_day/find_secret_code.ipynb
 
When you have obtained the code, enter it in the text box at https://www.erdosinstitute.org/ds-boot-camp-prep
 
If you can do these things independently please show up to help your colleagues!
If you cannot do these things independently please show up to get help from your colleagues!
 
Prerequisites
 
In addition to these computer setup steps there are also some content prerequisites:
  1. Basic familiarity with Python.
  2. Differential calculus. Ideally you also know some multivariate differential calculus and linear algebra.
  3. Basic statistics and probability.

Program Content


Course materials are available on GitHub through the following link:

Request Access to GitHub


Textbook/Notes

Note: our video player does not support playback speed options. You can use a third-party browser extension to change the playback speed; for example, video-speed-controller works for Chrome. If you would prefer to avoid a browser extension, you can also change the playback speed manually in the JavaScript console: Speed up any HTML5 video player!

Live Lecture 12: Neural Networks

Live Lectures

We cover what a feed-forward neural network is, how convolution layers work, and how simple recurrent neural networks work.

Slides
Transcript
Code
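To make the three lecture topics concrete, here is a minimal pure-Python sketch of their forward passes: a feed-forward network built from affine layers and ReLU activations, a "valid" 1D convolution (implemented as cross-correlation, which is what deep learning libraries usually call convolution), and a scalar Elman-style recurrent cell. The function names, the list-of-rows weight format, and the tanh activation in the RNN are assumptions for illustration, not the lecture's code; real models would use NumPy or a deep learning framework.

```python
import math

def relu(v):
    """Apply ReLU elementwise to a list of numbers."""
    return [max(0.0, x) for x in v]

def dense(W, b, x):
    """Affine layer W @ x + b, with W given as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i for row, b_i in zip(W, b)]

def feed_forward(layers, x):
    """Apply each (W, b) layer in order, with ReLU after every layer except the last."""
    for i, (W, b) in enumerate(layers):
        x = dense(W, b, x)
        if i < len(layers) - 1:
            x = relu(x)
    return x

def conv1d(signal, kernel):
    """'Valid' 1D convolution as used in deep learning (cross-correlation, no kernel flip)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) for i in range(len(signal) - k + 1)]

def rnn(xs, w_x, w_h, b, h0=0.0):
    """Scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b); returns every hidden state."""
    h, hs = h0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        hs.append(h)
    return hs

# Tiny hand-set example: 2 inputs -> 2 hidden ReLU units -> 1 linear output.
layers = [([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
print(feed_forward(layers, [2.0, 1.0]))  # [2.5]
```

The convolution reuses the same small kernel at every position and the RNN reuses the same weights at every time step. That weight sharing is the main structural difference from a dense layer.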

Organizing your work

Technical Support

We present two options for organizing the work you do in this course:

1. Copying the notebooks you are working on to a new folder.
2. Creating a local branch of the repo where you do your work.

Transcript
Code
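Option 1 can be automated. The following is a minimal sketch, assuming you keep a personal folder outside the course notebooks' folder. The function name `copy_notebooks` and the folder names in the usage comment are hypothetical. Existing copies are skipped, so pulling new course material and re-running the script will not overwrite your edits.

```python
import shutil
from pathlib import Path

def copy_notebooks(src_dir, dest_dir):
    """Copy every .ipynb under src_dir into dest_dir, mirroring the subfolder layout.

    Files that already exist in dest_dir are skipped, so re-running this after
    pulling new course material never overwrites your edited copies.
    """
    src, dest = Path(src_dir).resolve(), Path(dest_dir).resolve()
    copied = []
    # sorted() materializes the file list before any new files are written.
    for nb in sorted(src.rglob("*.ipynb")):
        # Skip Jupyter autosave checkpoints and anything already inside dest_dir.
        if ".ipynb_checkpoints" in nb.parts or dest in nb.parents:
            continue
        target = dest / nb.relative_to(src)
        if target.exists():
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(nb, target)
        copied.append(target)
    return copied

# Hypothetical usage (folder names are placeholders for your own layout):
# copy_notebooks("erdos_ds_repo/lectures", "my_work/lectures")
```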

Web Scraping with BeautifulSoup

Data Collection (prerecorded)

We give a brief introduction to web scraping with BeautifulSoup.

Slides
Code
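The core scraping pattern is: parse the HTML, find the tags you care about, and pull out attributes and text. BeautifulSoup is a third-party package, so this dependency-free sketch swaps in the standard library's `html.parser.HTMLParser` and collects every link's `href` and visible text. With BeautifulSoup installed, the rough equivalent is `[(a.get("href"), a.get_text(strip=True)) for a in BeautifulSoup(html, "html.parser").find_all("a", href=True)]`. The class and function names in the sketch are illustrative.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag we are currently inside, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def extract_links(html):
    """Return a list of (href, text) pairs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Text inside nested tags (for example `<a href="/a">First <b>link</b></a>`) is joined into a single string. Anchors without an `href` are ignored.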

Summary and Conclusion

Data Collection (prerecorded)

We sure have learned a lot of ways to collect data with Python. Let's summarize and draw some final conclusions on this topic.

Slides
Code

Math Hour 12

Math Hour

We sketch a constructive proof of a Universal Approximation Theorem for ReLU networks with 2 hidden layers. We also briefly indicate how to get a similar result non-constructively using only 1 hidden layer.

Slides
Transcript
Code
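The flavor of a constructive approximation argument can be seen in one dimension. This is a simpler illustration than the two-hidden-layer construction discussed in the Math Hour. With a single hidden layer, the piecewise-linear interpolant of f on a grid x_0 < ... < x_n can be written as g(x) = f(x_0) + Σ_i c_i ReLU(x - x_i). Here c_0 is the first segment's slope and each later c_i is the change in slope at knot x_i. Refining the grid drives the error to zero for continuous f. The function names are hypothetical.

```python
import math

def relu(t):
    return max(0.0, t)

def fit_relu_interpolant(f, a, b, n):
    """Return (knots, coeffs, bias) so that
    g(x) = bias + sum_i coeffs[i] * relu(x - knots[i])
    equals the piecewise-linear interpolant of f on n equal subintervals of [a, b]."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    ys = [f(x) for x in xs]
    slopes = [(ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k]) for k in range(n)]
    # The first hidden unit carries the first slope; each later unit adds a slope change.
    coeffs = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n)]
    return xs[:-1], coeffs, ys[0]

def relu_net(knots, coeffs, bias, x):
    """One hidden layer: unit i computes relu(x - knots[i]); the output layer is linear."""
    return bias + sum(c * relu(x - t) for c, t in zip(coeffs, knots))

knots, coeffs, bias = fit_relu_interpolant(math.sin, 0.0, math.pi, 50)
print(relu_net(knots, coeffs, bias, 1.0), math.sin(1.0))  # close, not identical
```

For a twice-differentiable f, the error of linear interpolation on [a, b] is at most h²/8 · max|f''|, where h is the grid spacing. This makes the "more hidden units, smaller error" trade-off explicit.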

A Broad Overview

Data Collection (prerecorded)

In this video we give a bird's-eye view of what we will cover in our data science content.

Slides
Code

Python and APIs

Data Collection (prerecorded)

How can we use Python to collect data from APIs?

Slides
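A typical API workflow has three steps: build a URL with query parameters, send a GET request, and decode the JSON response into rows. Below is a standard-library-only sketch of that workflow. The endpoint and the `{"results": [...]}` response shape are hypothetical placeholders, since every API defines its own. In practice many people use the `requests` package instead (`requests.get(url, params=..., timeout=...).json()`).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def build_url(base_url, params):
    """Append URL-encoded query parameters to an endpoint."""
    return f"{base_url}?{urlencode(params)}"

def get_json(base_url, params, timeout=10):
    """GET an endpoint and decode its JSON body (urlopen raises HTTPError on 4xx/5xx)."""
    req = Request(build_url(base_url, params), headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

def records_to_rows(payload, fields):
    """Flatten a payload of the assumed form {"results": [{...}, ...]} into tuples."""
    return [tuple(item.get(f) for f in fields) for item in payload.get("results", [])]

# Hypothetical usage (not run here; the endpoint and field names are placeholders):
# payload = get_json("https://api.example.com/v1/items", {"q": "lead", "page": 1})
# rows = records_to_rows(payload, ["id", "name"])
```

Real APIs often also require an API key, paginate their results, and rate-limit requests, so check the provider's documentation before looping over many pages.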

A Supervised Learning Framework

02 Week: Regression I (prerecorded)

We introduce a statistical framework for supervised learning problems.

Slides
Code
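One common way to write down a supervised learning framework is sketched here. The notation is an assumption and may differ from the slides. Observations (x_i, y_i), i = 1, ..., n, are drawn independently from the joint distribution of (X, Y). The goal is a model \hat f with small expected loss, but only its average loss on the training data can be computed directly.

```latex
\begin{aligned}
Y &= f(X) + \varepsilon, \qquad \mathbb{E}[\varepsilon \mid X] = 0, \\
R(\hat f) &= \mathbb{E}\big[ L\big(Y, \hat f(X)\big) \big] \quad \text{(expected risk)}, \\
\hat R_n(\hat f) &= \frac{1}{n} \sum_{i=1}^{n} L\big(y_i, \hat f(x_i)\big) \quad \text{(empirical risk)}.
\end{aligned}
```

For squared-error loss, L(y, \hat y) = (y - \hat y)^2, the expected risk is minimized by the conditional mean f(x) = E[Y | X = x]. The gap between R and \hat R_n is why held-out data is needed to estimate how well a fitted model generalizes.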

How to clone the GitHub Repo

Technical Support

This video will walk you through cloning the GitHub repo. It also addresses how to troubleshoot some common pitfalls.

Transcript
Code

Data Source Websites

Data Collection (prerecorded)

We cover a plethora of data source websites you can use.

Slides
Code

Data in Databases

Data Collection (prerecorded)

Your data is stuck in a database. Can you get it out? Learn how in this video.

Slides
Code
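Getting data out of a SQL database from Python usually involves three steps: open a connection, run a parameterized query, and collect the rows along with their column names. This sketch uses the standard library's `sqlite3` module with a made-up in-memory table (`readings`, with toy values) standing in for a real database file. The table, its columns, and the helper `query_rows` are hypothetical. With pandas, `pandas.read_sql_query(sql, conn, params=...)` returns the same result as a DataFrame.

```python
import sqlite3

def query_rows(conn, sql, params=()):
    """Run a parameterized query; return (column names, list of row tuples)."""
    cur = conn.execute(sql, params)
    columns = [desc[0] for desc in cur.description]
    return columns, cur.fetchall()

# Toy, made-up rows in an in-memory database, standing in for a real database file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (site TEXT, region TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("s1", "west", 3.0), ("s2", "west", 12.5), ("s3", "east", 7.1)],
)
# "?" placeholders let sqlite3 handle quoting, which avoids SQL injection.
columns, rows = query_rows(
    conn,
    "SELECT region, COUNT(*) AS n_over FROM readings "
    "WHERE value > ? GROUP BY region ORDER BY region",
    (5,),
)
conn.close()
print(columns, rows)  # ['region', 'n_over'] [('east', 1), ('west', 1)]
```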

Data Splits

02 Week: Regression I (prerecorded)

We introduce data splits, including:
1. Train/test splits
2. Validation sets
3. k-fold cross-validation

Slides
Code
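The three kinds of split differ in which rows are used for fitting and which for evaluation. This minimal sketch produces only index lists, so it works with any data container. It first holds out a test set and then runs k-fold cross-validation on the remaining training rows, so the test rows are never seen during model selection. The function names are hypothetical. In practice, scikit-learn's `train_test_split` and `KFold` in `sklearn.model_selection` do the same job.

```python
import random

def train_test_split_indices(n, test_frac=0.2, seed=0):
    """Shuffle row indices 0..n-1 and hold out a fraction of them as the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_frac))
    return idx[n_test:], idx[:n_test]

def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) for each of k folds over n rows.
    Fold sizes differ by at most one when k does not divide n."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size

# Usage: hold out a test set once, then cross-validate on the training rows only.
train_idx, test_idx = train_test_split_indices(10, test_frac=0.2)
for fold_train, fold_val in kfold_indices(len(train_idx), k=4):
    # Map fold positions back to original row indices.
    rows_train = [train_idx[i] for i in fold_train]
    rows_val = [train_idx[i] for i in fold_val]
```

Each row lands in exactly one validation fold. The k validation scores are typically averaged to compare models, and the test set is used once at the end.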

Project/Homework Instructions


Project/Team Formation
Project Submission
Projects README

How To Form Projects

Presentation Tips and Tricks (prerecorded)

This video shows you how to navigate the team formation process on the Erdős website.

Slides
Transcript

F24 Project Pitch Hour

Project Pitch Hour

Participants pitch their project ideas and then get into breakout rooms to discuss the ideas they are interested in.

Slides
Transcript
Code

Schedule


Phase 1 - Instruction and Project Completion
Project Review & Judging
Phase 2 - Intense Interview Prep & Career Connections

Sep 5, 2024 at 06:00 PM UTC: DS Bootcamp computer setup day
Sep 10, 2024 at 04:00 PM UTC: Lecture 1: Introduction, Computer Setup, Q/A
Sep 11, 2024 at 02:00 PM UTC: Math Hour 1
Sep 11, 2024 at 03:00 PM UTC: Office Hour 1
Sep 12, 2024 at 06:00 PM UTC: Problem Session 1
Sep 17, 2024 at 04:00 PM UTC: Lecture 2: Regression I
Sep 18, 2024 at 02:00 PM UTC: Math Hour 2
Sep 18, 2024 at 03:00 PM UTC: Office Hour 2
Sep 19, 2024 at 06:00 PM UTC: Problem Session 2
Sep 24, 2024 at 04:00 PM UTC: Lecture 3: Regression II
Sep 25, 2024 at 02:00 PM UTC: Math Hour 3
Sep 25, 2024 at 03:00 PM UTC: Office Hour 3
Sep 26, 2024 at 06:00 PM UTC: Problem Session 3
Sep 30, 2024 at 08:30 PM UTC: Project Pitch Hour
Oct 1, 2024 at 04:00 PM UTC: Lecture 4: Regression III
Oct 2, 2024 at 02:00 PM UTC: Math Hour 4
Oct 2, 2024 at 03:00 PM UTC: Office Hour 4
Oct 3, 2024 at 06:00 PM UTC: Problem Session 4
Oct 8, 2024 at 04:00 PM UTC: Lecture 5: Inference I
Oct 9, 2024 at 02:00 PM UTC: Math Hour 5
Oct 9, 2024 at 03:00 PM UTC: Office Hour 5
Oct 10, 2024 at 06:00 PM UTC: Problem Session 5
Oct 15, 2024 at 04:00 PM UTC: Lecture 6: Inference II
Oct 16, 2024 at 02:00 PM UTC: Math Hour 6
Oct 16, 2024 at 03:00 PM UTC: Office Hour 6
Oct 17, 2024 at 06:00 PM UTC: Problem Session 6
Oct 22, 2024 at 04:00 PM UTC: Lecture 7: Time Series
Oct 24, 2024 at 06:00 PM UTC: Problem Session 7
Oct 25, 2024 at 03:00 PM UTC: Office Hour 7
Oct 29, 2024 at 04:00 PM UTC: Lecture 8: Classification I
Oct 30, 2024 at 02:00 PM UTC: Math Hour 8
Oct 30, 2024 at 03:00 PM UTC: Office Hour 8
Oct 31, 2024 at 06:00 PM UTC: Problem Session 8
Nov 5, 2024 at 05:00 PM UTC: Lecture 9: Classification II
Nov 6, 2024 at 03:00 PM UTC: Math Hour 9
Nov 6, 2024 at 04:00 PM UTC: Office Hour 9
Nov 7, 2024 at 07:00 PM UTC: Problem Session 9
Nov 12, 2024 at 05:00 PM UTC: Lecture 10: Ensemble Learning I
Nov 13, 2024 at 04:00 PM UTC: Office Hour 10
Nov 14, 2024 at 07:00 PM UTC: Problem Session 10
Nov 15, 2024 at 03:00 PM UTC: Math Hour 10
Nov 19, 2024 at 05:00 PM UTC: Lecture 11: Ensemble Learning II
Nov 20, 2024 at 03:00 PM UTC: Math Hour 11
Nov 20, 2024 at 04:00 PM UTC: Office Hour 11
Nov 21, 2024 at 07:00 PM UTC: Problem Session 11
Nov 26, 2024 at 05:00 PM UTC: Lecture 12: Introduction to Neural Networks
Nov 27, 2024 at 03:00 PM UTC: Math Hour 12
Nov 27, 2024 at 04:00 PM UTC: Office Hour 12
Dec 11, 2024 at 05:00 PM UTC: Commencement and Project Showcase

Project/Homework Deadlines

Sep 21, 2024

03:59 AM UTC

Watch the video about project formation

This should help answer any questions you may have going into project formation.

Sep 21, 2024

03:59 AM UTC

Watch 3 Previous Top Projects

Consult the project database and watch at least 3 previous top projects from Erdős alumni.

Sep 30, 2024

08:30 PM UTC

Project Pitch Hour

Opportunity to meet with other Erdős Fellows, form teams, and propose topics.

Oct 5, 2024

03:59 AM UTC

Data gathering and defining stakeholders + KPIs

Find the dataset you will be working with. Describe the dataset and the problem you are looking to solve (1 page max). List the stakeholders of the project and company key performance indicators (KPIs) (bullet points).

Oct 5, 2024

03:59 AM UTC

Finalized Teams with Preliminary Project Ideas

Teams need to be finalized by this point. If you proposed or created a project, you must have others in your group. If you did not propose or create a project, you must join an open group.

Oct 19, 2024

03:59 AM UTC

Exploratory data analysis + visualizations [Checkpoint]

Distributions of variables, looking for outliers, etc. Descriptive statistics.
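For a single numeric column, descriptive statistics and a first-pass outlier check take only a few lines. This is a standard-library sketch using `statistics.quantiles` (Python 3.8+). Outliers are flagged with Tukey's 1.5 × IQR fences, a common convention rather than a requirement of this checkpoint. The function names are hypothetical. With pandas, `DataFrame.describe()` gives a similar summary for every column at once.

```python
import statistics

def describe(values):
    """Count, mean, sample std, min, quartiles, and max (quartiles use the 'inclusive' method)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(describe(sample)["median"], iqr_outliers(sample))  # 5.0 [100]
```

A flagged value is a prompt to investigate, not something to delete automatically. It may be a data-entry error or a genuine extreme that matters to the problem.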

Oct 19, 2024

03:59 AM UTC

Data cleaning + preprocessing

Look for missing values and duplicates. Basic data manipulation & preliminary feature engineering.
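Two routine cleaning checks are counting missing values per column and removing exact duplicate rows. This standard-library sketch parses a well-formed CSV with a header row into dictionaries, treats an assumed set of markers (`""`, `NA`, `N/A`, `null`, `None`) as missing, and keeps the first copy of each duplicate row. The marker set and the helper names are assumptions to adapt to your dataset. The pandas equivalents are `df.isna().sum()` and `df.drop_duplicates()`.

```python
import csv
import io

# Assumed missing-value markers; adjust these to whatever your dataset actually uses.
MISSING_MARKERS = {"", "NA", "N/A", "null", "None"}

def load_rows(csv_text):
    """Parse CSV text into dicts, mapping missing markers (and short rows) to None."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            key: (None if value is None or value.strip() in MISSING_MARKERS else value.strip())
            for key, value in raw.items()
        })
    return rows

def missing_counts(rows):
    """Number of None values in each column."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + int(value is None)
    return counts

def drop_duplicates(rows):
    """Keep the first occurrence of each exact duplicate row, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Whether a missing value should be dropped, imputed, or kept as its own category depends on why it is missing, so record the counts before deciding.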

Nov 2, 2024

03:59 AM UTC

Written proposal of modeling approach [Checkpoint]

Describe your planned modeling approach, based on the exploratory data analysis from the last two weeks (< 1 page, bullet points).

Nov 9, 2024

04:59 AM UTC

Machine learning models or equivalent [Checkpoint]

Results with visualizations and/or metrics. List of successes and pitfalls.

Dec 3, 2024

04:59 AM UTC

Final Projects Due

Final Projects must be submitted by this deadline in order to receive a certificate of completion.

©2017-2025 by The Erdős Institute.
