Data Science Boot Camp

Spring 2024

Feb 7, 2024 - May 1, 2024

Registration Deadlines

Feb 8, 2024 - All interested participants

Category

Launch, Core Program, Boot Camp, Projects, Certificates

Overview

The Erdős Institute's signature Data Science Boot Camp has been running since May 2018 thanks to the generous support of our sponsors, members, and partners. Due to its popularity, we now offer our boot camp online three times per year in two different formats: a 1-month-long intensive boot camp each May and a semester-long version each Spring and Fall.

Organizers, Instructors, and Advisors

Steven Gubkin, PhD

Lead Instructor

Office Hours:

Tu: 11am - 12pm ET, and by appt.

Email:

Preferred Contact:

Slack

Please feel free to message me on Slack with any questions!

Matthew Osborne, PhD

Alumni Advisor

Office Hours:

By appointment only

Email:

Preferred Contact:

Slack

Don't hesitate to contact me with any questions or concerns.

Alec Clott, PhD

Head of Data Science Projects

Office Hours:

Wed. 12-12:30pm EST, and by appt.

Email:

Preferred Contact:

Slack

Participants are welcome to reach out to me via Slack or email. I normally work standard EST hours (9am-5pm), but I can always find time to meet via Zoom after work too. Let me know how I can help!

Objectives

The goal of our Data Science Boot Camp is to provide you with the skills and mentorship necessary to produce a portfolio-worthy data science/machine learning project while also providing you with valuable career development support and connecting you with potential employers.

Slack

Slack Channel: #slack-channel

Project Examples

TEAM

Aware NLP Project III

Mohammad Nooranidoost, Baian Liu, Craig Franze, Mustafa Anıl Tokmak, Himanshu Raj, Peter Williams

This project involves the investigation and evaluation of different methodologies for retrieval for use in RAG (Retrieval-Augmented Generation) systems. In particular, this project investigates retrieval quality for information downloaded from employee subreddits. We investigated the impacts of using clustering, multi-vector indexing, and multi-querying in advanced retrieval methodologies against baseline naive retrieval.
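
To make the terminology concrete, here is a minimal sketch contrasting baseline naive retrieval with multi-querying. It is not the team's implementation: the toy corpus, the bag-of-words "embedding", and the function names `embed`, `cosine`, `retrieve`, and `multi_query_retrieve` are all hypothetical stand-ins chosen so the example runs with only the Python standard library.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for the downloaded subreddit posts.
DOCS = [
    "schedule changes were announced with no notice",
    "the new break room is great",
    "management changed our shift schedule again",
]

def embed(text):
    """Toy 'embedding' (an assumption for illustration): a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=2):
    """Baseline naive retrieval: rank every document against a single query."""
    q = embed(query)
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, embed(docs[i])), reverse=True)
    return ranked[:k]

def multi_query_retrieve(queries, docs, k=2):
    """Multi-querying: retrieve for several phrasings and merge, keeping first-seen order."""
    merged = []
    for query in queries:
        for i in retrieve(query, docs, k):
            if i not in merged:
                merged.append(i)
    return merged

print(retrieve("schedule changes", DOCS, k=1))
print(multi_query_retrieve(["schedule changes", "break room"], DOCS, k=1))
```

A real RAG system would typically swap the count vectors for learned embeddings and a vector index; the ranking-and-merging logic is the part this sketch is meant to illustrate.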

First Steps/Prerequisites

Participants should have a base-level familiarity with Python. Participants should also be familiar with some basic math concepts. Finally, you will also need to have your laptop or desktop computer set up for the course. If you are new to Python, need a quick math refresher, or if you need help setting up your computer, then please follow the link below.

Program Content

You will find all of the course content below in our GitHub repository. If you see a 404 Error when trying to open this repository, first check that you are signed into your GitHub account and then check with our community manager that you have been added to our repositories. Because our repositories are private, you must first be added before you can access them.

 

Every week has a collection of pre-recorded videos: one for each notebook in the repo.

 

Recordings of Monday's Live Lecture will be posted each week by Tuesday at 10:00am ET.

Program Content

Textbook/Notes

Lecture 1: Orientation

Live Lectures

In this video we welcome you to our data science content.

Slides
Transcript
Code

Project/Homework Instructions

Erdős Project Instructions (Spring 2024)

The group project is a time to put everything you’ve learned to the test! You will work with your team to produce a portfolio-worthy project that you can use as a talking point with future employers.

 

General Information

In order to get an Erdős certificate, you must complete a data science project from start to finish.

 

Project Topics

Your project can be anything you would like, as long as you use Python. We want your project to be something you’re passionate about and can really dig into. We understand that open-ended projects can be difficult, so we’ve provided a few resources:

Possible project list

General advice

Project Database (Past Project Examples)

 

Project Help

A number of Project Mentors are available for project help! Feel free to chat with them via Slack (#project-help) for advice.

 

Project Expectations

The goal is to complete a data science project that could be presented in a job interview.

 

3 Deliverable Requirements (see more details below)

Have an annotated GitHub repository

Executive summary of your project results and implications

5-min pre-recorded PowerPoint presentation detailing project process from start to finish

 

Timeline

The tasks for each week should be submitted to your Project Mentor before your weekly check-in. Some of the items listed below are more of a rough guideline, depending on your project. Consult your project mentor or Alec if you are unsure.

 

Questions about Project Formation:

Please watch the following video; it should help answer any questions you may have about project formation.

 

Project Pitch Hour:

Each session we will hold a "Project Pitch Hour" for everyone to join via Zoom. See the "Schedule" above. The Project Pitch Hour is an opportunity for anyone without a team to join on Zoom. Once on Zoom, we will give folks an opportunity to "pitch" a project and see who else is interested. In other words, if you join the Project Pitch Hour, be ready to fall into one of the following two camps:

 

1. You have a project idea you want to pitch, and are hoping to get other people to join.

2. You don't have a project idea, but you want to see what others are pitching and are hoping to join someone else's team.

 

If you are in camp (1), your pitch can last anywhere from 30 seconds to 2-3 minutes. There is no expectation to make a formal presentation or in-depth pitch. Your pitch could be as simple as "I want to do a project on sports, but don't know the methodology yet!" or "I've already identified the exact dataset I want to use, and a few ideas on methodology, but am looking for a team." In other words, it can be as basic or as detailed as you like.

 

If you are in camp (2), come with an open mind and be willing to ask questions! And of course, be willing to go in a direction you might not have considered previously!

Project/Team Formation
Project Submission
Projects README

How To Form Projects

Presentation Tips and Tricks (prerecorded)

This video shows you how to navigate the team formation process on the Erdős website.

Slides
Transcript

Project Pitches and Resources

Project Pitch Hour

This is the recording of the May 1st 4:30 PM Project Pitch Hour.

Jim Schwoebel introduces databoard. If you are interested in generating your own synthetic dataset for your project, then please contact Jim and Roman on Slack.

Emiliano Santarnecchi introduces his research and commercialization interests in Neuromodulation and Neurostimulation. He is open for project conversations and guidance in these and related areas. https://gordon.mgh.harvard.edu/research/precision-neuroscience-neuromodulation-program/

Then Erdős Spring 2024 participants pitched their project ideas.

Slides
Transcript
Code

Corporate Sponsored Project: Aware

NLP Project

Jason Morgan, VP of Aware, discusses the problem and possible solutions to get started (March 1, 2024 at 3pm). Use the Slides button for the project description and the Code button for the dataset.

Transcript

Schedule

Lecture 1: Introduction - February 5, 2024 at 8:00:00 PM
Lecture 2: Data Collection - February 12, 2024 at 8:00:00 PM
Lecture 3: Regression I - February 19, 2024 at 8:00:00 PM
Lecture 4: Regression II - February 26, 2024 at 8:00:00 PM
Project Pitch Hour - March 1, 2024 at 9:30:00 PM
Problem Solving Session 5 - March 7, 2024 at 8:00:00 PM
Problem Solving Session 6 - March 14, 2024 at 7:00:00 PM
Problem Solving Session 7 - March 21, 2024 at 7:00:00 PM
Problem Solving Session 8 - March 28, 2024 at 7:00:00 PM
Problem Solving Session 9 - April 4, 2024 at 7:00:00 PM
Problem Solving Session 10 - April 11, 2024 at 7:00:00 PM
Problem Solving Session 11 - April 18, 2024 at 7:00:00 PM
Problem Solving Session 12 - April 25, 2024 at 7:00:00 PM
Math Hour (at 10) / Office Hour (at 11) - February 6, 2024 at 3:00:00 PM
Math Hour (at 10) / Office Hour (at 11) - February 13, 2024 at 3:00:00 PM
Math Hour (at 10) / Office Hour (at 11) - February 20, 2024 at 3:00:00 PM
Math Hour (at 10) / Office Hour (at 11) - February 27, 2024 at 3:00:00 PM
Lecture 5: Regression III - March 4, 2024 at 8:00:00 PM
Lecture 6: Time Series I - March 11, 2024 at 7:00:00 PM
Lecture 7: Time Series II - March 18, 2024 at 7:00:00 PM
Lecture 8: Classification I - March 25, 2024 at 7:00:00 PM
Lecture 9: Classification II - April 1, 2024 at 7:00:00 PM
Math Hour (at 10) / Office Hour (at 11) - April 9, 2024 at 2:00:00 PM
Lecture 11: Ensemble Learning II - April 15, 2024 at 7:00:00 PM
Lecture 12: Neural Networks - April 22, 2024 at 7:00:00 PM
Erdős Spring Final Project Showcase - May 1, 2024 at 4:00:00 PM
Problem Solving Session 1 - February 8, 2024 at 8:00:00 PM
Problem Solving Session 2 - February 15, 2024 at 8:00:00 PM
Problem Solving Session 3 - February 22, 2024 at 8:00:00 PM
Problem Solving Session 4 - February 29, 2024 at 8:00:00 PM
Math Hour (at 10) / Office Hour (at 11) - March 5, 2024 at 3:00:00 PM
Math Hour (at 10) / Office Hour (at 11) - March 12, 2024 at 2:00:00 PM
Math Hour (at 10) / Office Hour (at 11) - March 19, 2024 at 2:00:00 PM
Math Hour (at 10) / Office Hour (at 11) - March 26, 2024 at 2:00:00 PM
Math Hour (at 10) / Office Hour (at 11) - April 2, 2024 at 2:00:00 PM
Lecture 10: Ensemble Learning I - April 10, 2024 at 7:00:00 PM
Math Hour (at 10) / Office Hour (at 11) - April 16, 2024 at 2:00:00 PM
Math Hour (at 10) / Office Hour (at 11) - April 23, 2024 at 2:00:00 PM

Please check your registration email for the program schedule and Zoom links.

Project/Homework Deadlines

Feb 20, 2024

4:59 AM

Watch 3 Previous Top Projects

Consult the project database, and watch at least 3 previous top projects from Erdős alumni.

Mar 1, 2024

4:59 AM

Watch video about Project Formation

This should help answer any questions you may have going into project formation.

Mar 1, 2024

9:30 PM

Project Pitch Hour

Click here for the Zoom link to join the Project Pitch Hour session, an opportunity to meet with other Erdős Fellows, form teams, and propose topics.

Mar 9, 2024

4:59 AM

Submit Team Proposal or Idea to Project Formation Page

If you want to propose a project, or have an idea for a project, submit it by this date.

Mar 12, 2024

3:59 AM

Finalized Teams with Preliminary Project Ideas

Teams need to be finalized by this point. If you proposed or created a project, you must have others in your group. If you did not propose or create a project, you must join an open group.

Mar 19, 2024

3:59 AM

Data gathering and defining stakeholders + KPIs

Find the dataset you will be working with. Describe the dataset and the problem you are looking to solve (1 page max). List the stakeholders of the project and company key performance indicators (KPIs) (bullet points).

Mar 26, 2024

3:59 AM

Data cleaning + preprocessing

Look for missing values and duplicates. Basic data manipulation & preliminary feature engineering.
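
As a rough illustration of this step, the sketch below counts missing values per column and drops exact duplicate rows using only the Python standard library. The toy records and the helper names `is_missing`, `missing_counts`, and `drop_duplicates` are hypothetical; with a real dataset you would likely reach for pandas instead (for example `DataFrame.isna` and `DataFrame.drop_duplicates`).

```python
from collections import Counter

# Hypothetical toy records; a real project would load its own dataset.
rows = [
    {"id": 1, "age": 34, "city": "Columbus"},
    {"id": 2, "age": None, "city": "Boston"},
    {"id": 3, "age": 29, "city": ""},
    {"id": 2, "age": None, "city": "Boston"},  # exact duplicate of the second row
]

def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing (an assumption)."""
    return value is None or (isinstance(value, str) and value.strip() == "")

def missing_counts(records):
    """Count missing values per column."""
    counts = Counter()
    for record in records:
        for column, value in record.items():
            if is_missing(value):
                counts[column] += 1
    return dict(counts)

def drop_duplicates(records):
    """Keep the first occurrence of each fully identical row."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items(), key=lambda kv: kv[0]))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

print(missing_counts(rows))
print(len(drop_duplicates(rows)))
```

What counts as "missing" (sentinel values such as -1 or "N/A", for instance) depends on the dataset, so `is_missing` is the piece most likely to need adapting.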

Apr 2, 2024

3:59 AM

Exploratory data analysis + visualizations [Checkpoint]

Distributions of variables, looking for outliers, etc. Descriptive statistics.
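
A minimal sketch of what this checkpoint might involve for a single numeric variable: descriptive statistics plus an outlier check. The sample values are made up, and the 1.5 x IQR rule used in the hypothetical `iqr_outliers` helper is one common convention rather than a program requirement.

```python
import statistics

# Hypothetical sample of one numeric variable.
values = [12, 15, 14, 13, 16, 15, 14, 95]

def describe(data):
    """Basic descriptive statistics for one variable."""
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),
        "min": min(data),
        "max": max(data),
    }

def iqr_outliers(data, k=1.5):
    """Flag points more than k * IQR below the first or above the third quartile."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

print(describe(values))
print(iqr_outliers(values))
```

Comparing the mean with the median is a quick skew check here: a single extreme point pulls the mean well above the median.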

Apr 9, 2024

3:59 AM

Written proposal of modeling approach [Checkpoint]

Test linearity assumptions. Apply dimensionality reduction (if necessary). Describe your planned modeling approach, based on the exploratory data analysis from the last two weeks (< 1 page, bullet points).
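
One simple way to probe a linearity assumption between a feature and a target is to fit a least-squares line and look at the correlation and the residuals. The sketch below does this by hand on made-up data; the function names `fit_line`, `residuals`, and `pearson_r` are hypothetical, and a real project would more likely use a plotting or modeling library for this check.

```python
import math

# Hypothetical feature and target values.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(x, y):
    """Observed minus fitted values; a visible pattern suggests non-linearity."""
    slope, intercept = fit_line(x, y)
    return [b - (slope * a + intercept) for a, b in zip(x, y)]

def pearson_r(x, y):
    """Pearson correlation coefficient, a measure of linear association."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

print(fit_line(xs, ys))
print(residuals(xs, ys))
print(pearson_r(xs, ys))
```

Residuals that curve or fan out as x grows are a hint that a transformation or a non-linear model may be worth listing in the proposal.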

Apr 16, 2024

3:59 AM

Machine learning models or equivalent [Checkpoint]

Results with visualizations and/or metrics. List of successes and pitfalls.

Apr 27, 2024

3:59 AM

Final project due

Please read the submission instructions on the link below.

