Data Science Boot Camp

Spring 2023

May 9, 2023

-

Jun 8, 2023


Registration Deadlines

Mar 16, 2023

-

Academics from Member Institutions/Departments

Mar 16, 2023

-

Academics from Non-Member Institutions paying the $500 membership fee

Jan 16, 2023

-

Academics from Non-Member Institutions applying for Corporate Sponsored Fellowships

Category

Launch

Overview

The Erdős Institute's signature Data Science Boot Camp has been running since May 2018 thanks to the generous support of our sponsors, members, and partners. Due to its popularity, we now offer our boot camp online twice per year in two different formats: a one-month intensive boot camp each May and a semester-long version each Fall.

Organizers and Instructors

Objectives

The goal of our Data Science Boot Camp is to provide you with the skills and mentorship necessary to produce a portfolio-worthy data science/machine learning project, while also providing you with valuable career development support and connecting you with potential employers.

Those who successfully complete a team project will receive a digital certificate of completion with a sharable URL.

Project Examples

TEAM

Groundwater Forecasting

Riti Bahl, Meredith Sargent, Marcos Ortiz, Chelsea Gary, Anireju Dudun


Groundwater is a critical source of water for human survival. A significant percentage of both drinking and crop irrigation water is drawn from groundwater sources through wells. In the US, overuse of groundwater could have major implications for the future, and forecasting groundwater levels can be useful in understanding its impact. Building on historical data for four wells in Spokane, WA, together with surface water and weather data, we construct and evaluate machine learning models that forecast groundwater levels in the area.

TEAM

Correcting Racial Bias in Measurement of Blood Oxygen Saturation

Rohan Myers, Saad Khalid, Woojeong Kim, Brooks Miner, Jaychandran Padayasi


Fingertip pulse oximeters are the current standard for estimating blood oxygen saturation without a blood draw, both at home and in healthcare settings. However, pulse oximeters overestimate oxygen saturation, often resulting in ‘hidden hypoxemia’: a patient has hypoxemia (dangerously low oxygen saturation), but the oximeter returns a healthy oxygen value. Unfortunately, oximeter overestimation of oxygen saturation is exacerbated for patients with darker skin tones due to light-based oximeter technology. As a result, Black patients experience hidden hypoxemia at twice the rate of white patients. By combining pulse oximeter readings (SpO2) with additional patient data, we develop improved methods for estimating arterial blood oxygen saturation (SaO2) and identifying hidden hypoxemia. The predictions of our models are more accurate than pulse oximeter readings alone, and remove the systematic racial inequity inherent in the current medical practice of relying on oximeter readings alone.

First Steps/Prerequisites

Participants should have a base-level familiarity with Python. Participants should also be familiar with some basic math concepts. Finally, you will also need to have your laptop or desktop computer set up for the course. If you are new to Python, need a quick math refresher, or if you need help setting up your computer, then please follow the link below.

Slack

Slack Channel: #slack-channel

Program Content

You will find all of the course content below in our GitHub repository. If you see a 404 Error when trying to open this repository, first check that you are signed into your GitHub account and then check with our community manager that you have been added to our repositories. Because our repositories are private, you must first be added before you can access them.

Every lecture in the "lectures" folder of the repository comes with a pre-recorded lecture video which you can find below. Note that these videos are not presented in the order in which they should be viewed. To see the suggested viewing order read the README document for the lectures here, https://github.com/TheErdosInstitute/code-2023/tree/main/lectures.

Live Lecture Notebook Schedule
---------------------------------

5/8/2023: No jupyter notebooks covered

5/9/2023: All "data-collection" notebooks

5/10/2023:
- supervised-learning/2. A Supervised Learning Framework
- supervised-learning/3. Data Splits for Predictive Modeling
- supervised-learning/regression/1. Simple Linear Regression
- supervised-learning/regression/2. A First Predictive Modeling Project

5/11/2023:
- supervised-learning/regression/3. Multiple Linear Regression
- supervised-learning/regression/4. Categorical Variables and Interactions
- supervised-learning/regression/5. Polynomial Regression and Nonlinear Transformations
- cleaning/2. Scaling Data

5/15/2023:
- cleaning/3. Basic Pipelines
- supervised-learning/4. Bias-Variance Trade-Off
- supervised-learning/regression/6. Regularization
- supervised-learning/regression/9. Feature Selection Approaches
- supervised-learning/regression/8. Linear Regression Diagnostic Plots (if time)

5/16/2023:
- supervised-learning/time-series-forecasting/1. What are Time Series and Forecasting
- supervised-learning/time-series-forecasting/2. Adjustments for Time Series Data
- supervised-learning/time-series-forecasting/4. Baseline Forecasts
- supervised-learning/time-series-forecasting/5. Averaging and Smoothing (if time)

5/17/2023:
- supervised-learning/time-series-forecasting/5. Averaging and Smoothing (wrap-up)
- supervised-learning/time-series-forecasting/6. Stationarity and Autocorrelation
- supervised-learning/time-series-forecasting/7. ARIMA
- supervised-learning/time-series-forecasting/8. Next Steps for Time Series
- Will start on classification content if time permits

5/18/2023:
- supervised-learning/classification/2. k Nearest Neighbors Classifier
- supervised-learning/classification/3. The Confusion Matrix
- supervised-learning/classification/4. Logistic Regression
- supervised-learning/classification/5. Diagnostic Curves

5/22/2023:
- supervised-learning/classification/6. Bayes' Based Classifiers
- supervised-learning/classification/8. Support Vector Machines
- unsupervised-learning/dimension-reduction/1. Principal Components Analysis

5/23/2023:
- supervised-learning/classification/9. Decision Trees
- supervised-learning/ensemble-learning/1. What is Ensemble Learning
- supervised-learning/ensemble-learning/2. Random Forests
- supervised-learning/ensemble-learning/3. Bagging and Pasting

5/24/2023:
- supervised-learning/ensemble-learning/4. Boosting
- supervised-learning/ensemble-learning/5. AdaBoost
- supervised-learning/ensemble-learning/6. Gradient Boosting
- supervised-learning/ensemble-learning/7. XGBoost
- supervised-learning/ensemble-learning/8. Voter Models

5/25/2023:
- neural-networks/1. Perceptrons
- neural-networks/2. The MNIST Data Set
- neural-networks/3. Multilayer Neural Networks
- neural-networks/4. keras


Textbook/Notes

Welcome!

Introduction

In this video we welcome you to our data science content.

Slides
Code

Project/Homework Instructions

Erdős Project Instructions (Spring 2023)

The group project is a time to put everything you’ve learned to the test! You will work with your team to produce a portfolio-worthy project that you can use as a talking point with future employers.

 

General Information

In order to get an Erdős certificate, you must complete a data science project from start to finish.

 

Project Topics

Your project can be anything you would like, as long as you use Python. We want your project to be something you’re passionate about and can really dig into. We understand that open-ended projects can be difficult, so we’ve provided a few resources:


Project Help

A number of Project Mentors will be available for project help! Feel free to chat with them via Slack (#project-help) for advice.

 

Project Expectations

The goal is to complete a data science project that could be presented in a job interview.

 

Requirements (see more details below)

  • Have an annotated GitHub repository

  • Executive summary of your project results and implications

  • 5-min pre-recorded PowerPoint presentation detailing project process from start to finish

 

Timeline

The tasks for each week should be submitted to your Project Mentor before your weekly check-in. Depending on your project, some of the items listed below are more of a rough guideline. Consult your Project Mentor or Alec if you are unsure.

Project/Team Formation
Project Submission
Projects README

Schedule

Matt Osborne Office Hour

May 3, 2023 at 8:00:00 PM

EVENT

Data Science Boot Camp AM Problem Session 1

May 9, 2023 at 2:00:00 PM

EVENT

Data Science Boot Camp AM Problem Session 2

May 10, 2023 at 2:00:00 PM

EVENT

Data Science Boot Camp AM Problem Session 3

May 11, 2023 at 2:00:00 PM

EVENT

Matt Office Hour

May 12, 2023 at 3:00:00 PM

EVENT

Data Science Boot Camp AM Problem Session 4

May 15, 2023 at 2:00:00 PM

EVENT

Data Science Boot Camp AM Problem Session 5

May 16, 2023 at 2:00:00 PM

EVENT

Data Science Boot Camp AM Problem Session 6

May 17, 2023 at 2:00:00 PM

EVENT

Data Science Boot Camp AM Problem Session 7

May 18, 2023 at 2:00:00 PM

EVENT

Matt Office Hour

May 19, 2023 at 3:00:00 PM

EVENT

Data Science Boot Camp PM Problem Session 8

May 22, 2023 at 8:00:00 PM

EVENT

Data Science Boot Camp PM Problem Session 9

May 23, 2023 at 8:00:00 PM

EVENT

Data Science Boot Camp PM Problem Session 10

May 24, 2023 at 8:00:00 PM

EVENT

Data Science Boot Camp PM Problem Session 11

May 25, 2023 at 8:00:00 PM

EVENT

Matt Office Hour

May 26, 2023 at 7:00:00 PM

EVENT

Matt Office Hour

May 31, 2023 at 6:00:00 PM

EVENT

Erdős Final Project Showcase and Commencement

June 7, 2023 at 4:00:00 PM

EVENT

Matt Osborne Office Hour

May 5, 2023 at 3:00:00 PM

EVENT

Data Science Boot Camp PM Problem Session 1

May 9, 2023 at 8:00:00 PM

EVENT

Data Science Boot Camp PM Problem Session 2

May 10, 2023 at 8:00:00 PM

EVENT

Data Science Boot Camp PM Problem Session 3

May 11, 2023 at 8:00:00 PM

EVENT

Matt Office Hour

May 12, 2023 at 7:00:00 PM

EVENT

Data Science Boot Camp PM Problem Session 4

May 15, 2023 at 8:00:00 PM

EVENT

Data Science Boot Camp PM Problem Session 5

May 16, 2023 at 8:00:00 PM

EVENT

Data Science Boot Camp PM Problem Session 6

May 17, 2023 at 8:00:00 PM

EVENT

Data Science Boot Camp PM Problem Session 7

May 18, 2023 at 8:00:00 PM

EVENT

Matt Office Hour

May 19, 2023 at 7:00:00 PM

EVENT

Data Science Boot Camp Lecture 9

May 22, 2023 at 9:30:00 PM

EVENT

Data Science Boot Camp Lecture 10

May 23, 2023 at 9:30:00 PM

EVENT

Data Science Boot Camp Lecture 11

May 24, 2023 at 9:30:00 PM

EVENT

Data Science Boot Camp Lecture 12

May 25, 2023 at 9:30:00 PM

EVENT

Matt Office Hour

May 29, 2023 at 8:00:00 PM

EVENT

Matt Office Hour

June 1, 2023 at 2:00:00 PM

EVENT

Data Science Boot Camp Lecture 1

May 8, 2023 at 9:30:00 PM

EVENT

Data Science Boot Camp Lecture 2

May 9, 2023 at 9:30:00 PM

EVENT

Data Science Boot Camp Lecture 3

May 10, 2023 at 9:30:00 PM

EVENT

Data Science Boot Camp Lecture 4

May 11, 2023 at 9:30:00 PM

EVENT

Project Pitch Day (Live on Zoom)

May 12, 2023 at 8:30:00 PM

EVENT

Data Science Boot Camp Lecture 5

May 15, 2023 at 9:30:00 PM

EVENT

Data Science Boot Camp Lecture 6

May 16, 2023 at 9:30:00 PM

EVENT

Data Science Boot Camp Lecture 7

May 17, 2023 at 9:30:00 PM

EVENT

Data Science Boot Camp Lecture 8

May 18, 2023 at 9:30:00 PM

EVENT

Data Science Boot Camp AM Problem Session 8

May 22, 2023 at 2:00:00 PM

EVENT

Data Science Boot Camp AM Problem Session 9

May 23, 2023 at 2:00:00 PM

EVENT

Data Science Boot Camp AM Problem Session 10

May 24, 2023 at 2:00:00 PM

EVENT

Data Science Boot Camp AM Problem Session 11

May 25, 2023 at 2:00:00 PM

EVENT

Matt Office Hour

May 26, 2023 at 3:00:00 PM

EVENT

Matt Office Hour

May 31, 2023 at 2:00:00 PM

EVENT

Matt Office Hour

June 1, 2023 at 8:00:00 PM

EVENT

Please check your registration email for the program schedule and Zoom links.

Project/Homework Deadlines

May 12, 2023

8:30 PM

Project Pitch Day (Live on Zoom)

An opportunity to meet other Erdős Fellows, form teams, and propose topics.

May 13, 2023

3:59 AM

Submit Team Proposal to Project Formation Page

If you want to propose a project, or have an idea for a project, submit it by this date.

May 15, 2023

3:59 AM

Finalized Teams with Preliminary Project Idea

Teams need to be finalized by this point. If you proposed or created a project, you must have others in your group. If you did not propose or create a project, you must join an open group.

May 20, 2023

3:59 AM

Data gathering and defining stakeholders + KPIs

Find the dataset you will be working with. Describe the dataset and the problem you are looking to solve (1 page max). List the stakeholders of the project and company key performance indicators (KPIs) (bullet points).

May 20, 2023

3:59 AM

Data cleaning + preprocessing

Look for missing values and duplicates. Basic data manipulation & preliminary feature engineering.
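
As a rough illustration of this step, the sketch below counts missing values per column and drops exact duplicate rows using only the Python standard library. The sample records, column names, and helper functions are hypothetical and not taken from the course materials; in a real project you would more likely use pandas (for example `DataFrame.isna()` and `DataFrame.drop_duplicates()`).

```python
# Hypothetical sample records; None marks a missing value.
ROWS = [
    {"well_id": "A", "date": "2020-01-01", "level_ft": 12.5},
    {"well_id": "A", "date": "2020-01-01", "level_ft": 12.5},  # exact duplicate
    {"well_id": "B", "date": "2020-01-01", "level_ft": None},  # missing value
    {"well_id": "B", "date": "2020-01-02", "level_ft": 9.8},
]


def count_missing(rows):
    """Count missing (None) entries per column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


def drop_duplicates(rows):
    """Keep the first occurrence of each fully identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        # Sort by column name only, so values of mixed types are never compared.
        key = tuple(sorted(row.items(), key=lambda item: item[0]))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


if __name__ == "__main__":
    print("Missing per column:", count_missing(ROWS))
    print("Rows after de-duplication:", len(drop_duplicates(ROWS)))
```

Whether to drop, impute, or flag the missing rows depends on your dataset and is worth discussing with your Project Mentor.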

May 27, 2023

3:59 AM

Exploratory data analysis + visualizations [Checkpoint]

Distributions of variables, looking for outliers, etc. Descriptive statistics.
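
One possible starting point for this checkpoint is sketched below: basic descriptive statistics plus a simple outlier check based on the interquartile range (IQR). The function names, the sample values, and the conventional 1.5 × IQR threshold are illustrative assumptions, not requirements of the program.

```python
import statistics


def describe(values):
    """Return a few basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first/third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical measurements with one suspicious reading.
    readings = [10, 11, 12, 12, 13, 14, 95]
    print(describe(readings))
    print("Possible outliers:", iqr_outliers(readings))
```

A flagged value is only a candidate outlier; a histogram or box plot of the same variable helps decide whether it is a data error or a genuine extreme.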

May 27, 2023

3:59 AM

Written proposal of modeling approach [Checkpoint]

Test linearity assumptions. Apply dimensionality reduction (if necessary). Describe your planned modeling approach, based on the exploratory data analysis from the last two weeks (< 1 page, bullet points).

Jun 2, 2023

3:59 AM

Machine learning models or equivalent [Checkpoint]

Results with visualizations and/or metrics. List of successes and pitfalls.

Jun 3, 2023

4:00 PM

Final project due

Please read the submission instructions on the link below.

To access the program content, you must first create an account and member profile and be logged in.
