Data Science Boot Camp

Spring 2023

May 9, 2023 - Jun 8, 2023

Application/Registration Deadlines

Mar 16, 2023 - Academics from Member Institutions/Departments

Mar 16, 2023 - Academics from Non-Member Institutions paying the $500 membership fee

Jan 16, 2023 - Academics from Non-Member Institutions applying for Corporate Sponsored Fellowships


Overview

The Erdős Institute's signature Data Science Boot Camp has been running since May 2018 thanks to the generous support of our sponsors, members, and partners. Due to its popularity, we now offer our boot camp online twice per year in two different formats: a month-long intensive boot camp each May and a semester-long version each Fall.

Instructional Team


Matthew Osborne, PhD

Head of Boot Camps

Office Hours:

Fridays 11 AM - 12 PM ET, 3 - 4 PM ET

Email:

Preferred Contact:

Slack

Don't hesitate to contact me with any questions or concerns; I'm looking forward to this May's boot camp!


Alec Clott, PhD

Head of Data Science Projects

Office Hours:

TBD

Email:

Preferred Contact:

Slack

Participants are welcome to reach out to me via Slack or email. I normally work standard EST hours (9 am - 5 pm), but can always find time to meet folks via Zoom too. Let me know how I can help!

Objectives

The goal of our Data Science Boot Camp is to provide you with the skills and mentorship necessary to produce a portfolio-worthy data science/machine learning project, while also providing valuable career development support and connecting you with potential employers.

Those who successfully complete a team project will receive a digital certificate of completion with a shareable URL.

Project Examples

TEAM

Koala

David Wen, Preston Pozderac, Wendson Barbosa


Root Insurance Bidding Strategy Challenge: We propose a bidding strategy for online ad placements, based on customer demographics, that increases sales of our car insurance policies while minimizing cost and achieving at least 400 policies sold per 10,000 customers.

TEAM

Correcting Racial Bias in Measurement of Blood Oxygen Saturation

Rohan Myers, Saad Khalid, Woojeong Kim, Brooks Miner, Jaychandran Padayasi


Fingertip pulse oximeters are the current standard for estimating blood oxygen saturation without a blood draw, both at home and in healthcare settings. However, pulse oximeters overestimate oxygen saturation, often resulting in 'hidden hypoxemia': a patient has hypoxemia (dangerously low oxygen saturation), but the oximeter returns a healthy oxygen value. Unfortunately, oximeter overestimation of oxygen saturation is exacerbated for patients with darker skin tones due to light-based oximeter technology. This results in Black patients experiencing hidden hypoxemia at twice the rate of white patients. By combining pulse oximeter readings (SpO2) with additional patient data, we develop improved methods for estimating arterial blood oxygen saturation (SaO2) and identifying hidden hypoxemia. The predictions of our models are more accurate than pulse-oximeter readings alone, and remove the systematic racial inequity inherent in the current medical practice of using oximeter readings alone.
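The core idea, correcting a biased SpO2 reading toward true SaO2 using extra patient features, can be illustrated with a minimal regression sketch. This is a hypothetical toy model on synthetic data, not the team's actual model or data; the feature choice (heart rate) and the built-in 2-point overestimation are invented for illustration.

```python
import numpy as np

# Synthetic data with a built-in systematic oximeter overestimation of ~2 points.
rng = np.random.default_rng(0)
n = 200
spo2 = rng.uniform(85, 100, n)           # raw oximeter readings (%)
heart_rate = rng.uniform(55, 110, n)     # stand-in additional patient feature
sao2 = spo2 - 2.0 + 0.01 * (heart_rate - 80) + rng.normal(0, 0.5, n)

# Fit a linear correction from (SpO2, heart rate) to SaO2 by least squares.
X = np.column_stack([np.ones(n), spo2, heart_rate])   # design matrix with intercept
coef, *_ = np.linalg.lstsq(X, sao2, rcond=None)

def predict_sao2(spo2_val, hr_val):
    """Corrected SaO2 estimate from a raw SpO2 reading and heart rate."""
    return coef[0] + coef[1] * spo2_val + coef[2] * hr_val

# The corrected estimate tracks true SaO2 more closely than the raw reading.
corrected_err = np.mean(np.abs(X @ coef - sao2))
raw_err = np.mean(np.abs(spo2 - sao2))
```

In this toy setup the fitted correction roughly halves the error of the raw reading; the team's actual models presumably use richer clinical features and nonlinear methods.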

TEAM

NLPs

Frank Hidalgo, Joseph Szabo, Christopher Zhang, Sean Perez, Kun Jin


Acronym/abbreviation (short-form) disambiguation is one of the main challenges in using NLP methods to understand medical records. While this topic has long been studied, it remains a work in progress. Current strategies often rely on manually curated datasets of abbreviations to train classifiers. The main problem with that approach is that curated datasets are sparse and don't cover all short forms. In December 2020, a paper introduced a large dataset of short forms as one step in its model pre-training pipeline. The goal of our project is to build on their short-form disambiguation work and create a tool that disambiguates a medical short form using its context.

Example usage of our tool:

original_sentence = "The patient states that she has had dizziness, nausea, some heartburn, and some change in her vision. She is gravida 6, para 4, AB 2. She has no history of adverse reaction to anesthesia."

Here "AB" could stand for "abortion", "ankle-brachial", "blood group in ABO system", or "A, B lines in Kerley lines". In context, the tool resolves it to "abortion":

disambiguated_sentence = "The patient states that she has had dizziness, nausea, some heartburn, and some change in her vision. She is gravida 6, para 4, abortion 2. She has no history of adverse reaction to anesthesia."
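The context-based disambiguation idea can be sketched in a few lines. Real systems embed the sentence and candidate expansions with a pretrained language model; the keyword-overlap scorer and the context-cue lists below are invented stand-ins for illustration, not a curated clinical resource.

```python
# Candidate expansions for each short form (illustrative subset).
EXPANSIONS = {
    "AB": ["abortion", "ankle-brachial", "ABO blood group", "Kerley A/B lines"],
}

# Hypothetical context cues per expansion; a real model would learn these.
CONTEXT_CUES = {
    "abortion": {"gravida", "para", "pregnancy"},
    "ankle-brachial": {"index", "artery", "leg"},
    "ABO blood group": {"blood", "transfusion", "type"},
    "Kerley A/B lines": {"chest", "x-ray", "radiograph"},
}

def disambiguate(short_form, sentence):
    """Pick the expansion whose context cues best overlap the sentence."""
    words = set(sentence.lower().replace(",", " ").replace(".", " ").split())
    return max(EXPANSIONS[short_form],
               key=lambda exp: len(CONTEXT_CUES[exp] & words))

sentence = "She is gravida 6, para 4, AB 2."
```

Here "gravida" and "para" in the surrounding context point to the obstetric reading of "AB".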

TEAM

Lime

Yuchen Luo, Ritika Khurana, Aditya Chander, Taylor Mahler


We built a podcast recommendation engine that suggests episodes to a listener based on either a previous episode that they've heard or an episode description that they can input with freeform text entry.
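Matching a free-text query against episode descriptions can be sketched with bag-of-words cosine similarity. This is a minimal stand-in: the actual engine likely uses learned text embeddings, and the episode titles and descriptions below are invented examples.

```python
import math
from collections import Counter

# Invented episode catalog: title -> description.
EPISODES = {
    "Deep Sea Mysteries": "ocean creatures and deep sea exploration",
    "Startup Stories": "founders venture capital and startup growth",
    "Cosmic Questions": "black holes galaxies and space telescopes",
}

def vectorize(text):
    """Bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(query, k=1):
    """Rank episodes by similarity of their description to the query."""
    qv = vectorize(query)
    ranked = sorted(EPISODES,
                    key=lambda t: cosine(qv, vectorize(EPISODES[t])),
                    reverse=True)
    return ranked[:k]
```

The "previous episode" mode of the engine reduces to the same ranking with that episode's description as the query.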

TEAM

Supermassive Black Hole

Anna Brosowsky, Sayantan Khan, Nancy Wang, Ethan Zell, Yili Zhang


We built a movie finder app that allows a user to enter some details they remember about a movie (along with some optional filter info on the genre and release year) and then predicts what movie the user is thinking of. To solve this NLP problem, our tool uses an embed-and-rerank model. We have precomputed vectorizations of movie plot information for the approximately 34,000 movies in our dataset.

Our model’s first step is to vectorize the user’s query and do a fast comparison to find the 100 closest plot vectors. Then it reranks these top 100 closest plots, performing a more thorough comparison using a neural network that semantically compares the plot fragments with the original query. Finally, we output the 10 movies that appear at the top of this new ranking.
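The two-stage embed-and-rerank retrieval described above can be sketched as follows. The vectors here are random placeholders for the team's precomputed plot embeddings, and the optional reranker argument stands in for their neural semantic comparison; only the shortlist-then-rerank structure is taken from the description.

```python
import numpy as np

# Placeholder "precomputed" plot embeddings: 34,000 unit vectors.
rng = np.random.default_rng(1)
n_movies, dim = 34_000, 64
plot_vecs = rng.normal(size=(n_movies, dim))
plot_vecs /= np.linalg.norm(plot_vecs, axis=1, keepdims=True)

def retrieve(query_vec, shortlist=100, top_k=10, reranker=None):
    """Stage 1: fast cosine scan to shortlist candidates.
    Stage 2: rerank the shortlist with a more expensive scorer."""
    q = query_vec / np.linalg.norm(query_vec)
    sims = plot_vecs @ q                                  # cosine vs. every plot
    cand = np.argpartition(-sims, shortlist)[:shortlist]  # top-100 shortlist
    if reranker is None:
        scores = sims[cand]                # fall back to cosine order
    else:
        scores = np.array([reranker(q, plot_vecs[i]) for i in cand])
    order = cand[np.argsort(-scores)]      # best-first within the shortlist
    return order[:top_k]                   # indices of the 10 suggested movies
```

The point of the split is cost: the cheap dot-product scan touches all 34,000 plots, while the expensive semantic comparison runs on only 100 candidates.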