Data Science Boot Camp

Spring 2023

May 9, 2023 - Jun 8, 2023

Application/Registration Deadlines

Mar 16, 2023 - Academics from Member Institutions/Departments

Mar 16, 2023 - Academics from Non-Member Institutions paying the $500 membership fee

Jan 16, 2023 - Academics from Non-Member Institutions applying for Corporate Sponsored Fellowships

Application/Registration Link

Erdős Institute Members
General Public


Overview

The Erdős Institute's signature Data Science Boot Camp has been running since May 2018 thanks to the generous support of our sponsors, members, and partners. Due to its popularity, we now offer our boot camp online twice per year in two different formats: a month-long intensive boot camp each May and a semester-long version each Fall.

Instructional Team


Matthew Osborne, PhD

Head of Boot Camps

Office Hours: TBD

Email:

Preferred Contact: Slack

Don't hesitate to contact me with any questions or concerns. I'm looking forward to this May's boot camp!


Alec Clott, PhD

Head of Data Science Projects

Office Hours: TBD

Email:

Preferred Contact: Slack

Participants are welcome to reach out to me via Slack or email. I normally work standard hours (9am-5pm EST), but I can always find time to meet folks via Zoom too. Let me know how I can help!

Objectives

The goal of our Data Science Boot Camp is to provide you with the skills and mentorship necessary to produce a portfolio-worthy data science/machine learning project, while also providing you with valuable career development support and connecting you with potential employers.

Those who successfully complete a team project will receive a digital certificate of completion with a sharable URL.

Project Examples

TEAM

Lime

Yuchen Luo, Ritika Khurana, Aditya Chander, Taylor Mahler

github URL

We built a podcast recommendation engine that suggests episodes to a listener based on either a previous episode that they've heard or an episode description that they can input with freeform text entry.

TEAM

Supermassive Black Hole

Anna Brosowsky, Sayantan Khan, Nancy Wang, Ethan Zell, Yili Zhang

github URL

We built a movie finder app that allows a user to enter some details they remember about a movie (along with some optional filter info on the genre and release year) and then predicts what movie the user is thinking of. To solve this NLP problem, our tool uses an embed-and-rerank model. We have precomputed vectorizations of movie plot information for the approximately 34,000 movies in our dataset.

Our model’s first step is to vectorize the user’s query and do a fast comparison to find the 100 closest plot vectors. Then it reranks these top 100 closest plots, performing a more thorough comparison using a neural network that semantically compares the plot fragments with the original query. Finally, we output the 10 movies which show up at the top of this new ranking.
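The two-stage flow described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the team's implementation: the four plots are made-up toy data, the bag-of-words vectors stand in for whatever embedding the team precomputed, and the `rerank_score` function is a hypothetical word-and-bigram overlap score swapped in for the neural network reranker. All function names here are invented for the example.

```python
import math
import re
from collections import Counter

# Made-up toy data; the project described above used roughly 34,000 movie plots.
PLOTS = {
    "Space Rescue": "an astronaut is stranded on mars and must survive until rescue",
    "Ocean Heist": "a crew of thieves plans a heist on a sinking ocean liner",
    "Mars Colony": "settlers build a colony on mars and fight over water",
    "Lost Puppy": "a child searches the city for a lost puppy",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Stage-one vector: a sparse bag-of-words count (an assumed stand-in embedding)."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[token] for token, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bigrams(tokens):
    """Set of adjacent token pairs."""
    return set(zip(tokens, tokens[1:]))


def rerank_score(query, plot):
    """Stage-two score: Jaccard overlap of words and bigrams.

    Hypothetical stand-in for the neural network reranker, which is not shown here.
    """
    q, p = tokenize(query), tokenize(plot)
    q_feats = set(q) | bigrams(q)
    p_feats = set(p) | bigrams(p)
    union = q_feats | p_feats
    return len(q_feats & p_feats) / len(union) if union else 0.0


# Plot vectors are computed once up front, mirroring the precomputation described above.
PLOT_VECTORS = {title: embed(plot) for title, plot in PLOTS.items()}


def find_movies(query, n_candidates=100, n_results=10):
    """Embed the query, keep the closest plots, rerank them, return the top titles."""
    query_vec = embed(query)
    # Stage one: cheap vector comparison against every precomputed plot vector.
    candidates = sorted(
        PLOT_VECTORS,
        key=lambda title: cosine(query_vec, PLOT_VECTORS[title]),
        reverse=True,
    )[:n_candidates]
    # Stage two: more thorough comparison, run only on the shortlisted candidates.
    reranked = sorted(
        candidates,
        key=lambda title: rerank_score(query, PLOTS[title]),
        reverse=True,
    )
    return reranked[:n_results]
```

With this toy data, `find_movies("astronaut stranded on mars")` ranks "Space Rescue" first and "Mars Colony" second. The point of the split is cost: the cheap comparison runs against every plot, while the expensive scorer only ever sees the shortlist.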

First Steps/Prerequisites

Participants should have a base-level familiarity with Python. Participants should also be familiar with some basic math concepts. Finally, you will also need to have your laptop or desktop computer set up for the course.

If you are new to Python, need a quick math refresher, or if you need help setting up your computer, then please follow the link below.

To access the program schedule and content, you must first create an account and member profile and be logged in.

Program Content


Program Content

Textbook/Notes

Welcome!

Introduction

In this video we welcome you to our data science content.

Slides
Transcript
Code

Data Repositories

Data Collection

One source of data is public data repository sites. In this video we explain what those are and show a few examples.

Slides
Transcript
Code

Web Scraping with BeautifulSoup I

Data Collection

In part one of a two-part series, we uncover the hidden secrets of the World Wide Web with BeautifulSoup. The broth of the internet is HTML.

Slides
Transcript
Code

A Broad Overview

Introduction

In this video we give a bird's-eye view of what we will cover in our data science content.

Slides
Transcript
Code

Data Competition Sites

Data Collection

Data competition sites can be another source of data sets. In this video we discuss such sites and demonstrate pulling a data set from one.

Slides
Transcript
Code

Web Scraping with BeautifulSoup II

Data Collection

Soup's on! In part two of this series we wrap up web scraping with Python.

Slides
Transcript
Code

Data File Types

Data Collection

We quickly review some of the most common data file types you will encounter when working on a data-based project. We then show you how to load such data using Python.

Slides
Transcript
Code

Data in Databases

Data Collection

Your data is stuck in a database. Can you get it out? Learn how in this video.

Slides
Transcript
Code

Python and APIs

Data Collection

Let's cover that API in some Python wrapping paper.

Slides
Transcript
Code

Project/Homework Instructions

Projects README

Schedule

Click any date for more details

Please check your registration email for the program schedule and Zoom links.

Project/Homework Deadlines
