View Team Project Submissions for the Spring and Fall 2022 Data Science Boot Camps below:
Elizabeth Campolongo, Ranthony Edmonds, Chaya Norton
Special teams play can significantly impact the outcome of a game in the National Football League (NFL). The rising use of advanced metrics and data analytics in American football can help NFL analysts and coaches better understand which features influence special teams play, an area where analysis has been relatively limited to date. This project applied Topological Data Analysis (TDA) to develop a metric for quantifying special teams plays. In addition to the features provided by the NFL as part of the 2022 NFL Big Data Bowl Challenge, we engineered features such as the trajectory of the football and “kicker core distance,” a metric we designed to measure the pressure applied to a kicker, to understand their impact on play results.
Adam Kawash, Moeka Ono, Soumen Deb, Allison Londerée
The DaVinci Team of the Erdős Institute used advances in computer vision technology to train a machine learning model that classifies species of birds. We then applied this model in a prototype app, ChickID. In doing so, our project addresses two primary goals:
1) Develop an algorithm that identifies a bird's species from an image.
2) Ensure the model achieves a high degree of accuracy even on amateur-level images, so that identification is accessible.
Our product can be applied in both private and public settings to allow for fast and accurate identification.
Supermassive Black Hole
Anna Brosowsky, Sayantan Khan, Nancy Wang, Ethan Zell, Yili Zhang
We built a movie finder app that allows a user to enter some details they remember about a movie (along with some optional filter info on the genre and release year) and then predicts what movie the user is thinking of. To solve this NLP problem, our tool uses an embed-and-rerank model. We have precomputed vectorizations of movie plot information for the approximately 34,000 movies in our dataset.
Our model’s first step is to vectorize the user’s query and do a fast comparison to find the 100 closest plot vectors. Then it reranks these top 100 closest plots, performing a more thorough comparison using a neural network that semantically compares the plot fragments with the original query. Finally, we output the 10 movies which show up at the top of this new ranking.
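The two-stage flow described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the team's implementation: `embed` is a toy bag-of-words vectorizer standing in for the real plot embeddings, `rerank_score` is a simple bigram-overlap placeholder for the neural reranker, and the titles and plots are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy embedding (assumption): a sparse bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[token] for token, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rerank_score(query, plot):
    """Placeholder for the neural reranker (assumption):
    the fraction of query bigrams that also appear in the plot."""
    q, p = tokenize(query), tokenize(plot)
    q_bigrams = set(zip(q, q[1:]))
    p_bigrams = set(zip(p, p[1:]))
    return len(q_bigrams & p_bigrams) / len(q_bigrams) if q_bigrams else 0.0


def find_movies(query, plots, plot_vectors, n_candidates=100, n_results=10):
    """Stage 1: cheap vector search. Stage 2: slower, more careful reranking."""
    query_vector = embed(query)
    candidates = sorted(
        plots,
        key=lambda title: cosine(query_vector, plot_vectors[title]),
        reverse=True,
    )[:n_candidates]
    reranked = sorted(
        candidates,
        key=lambda title: rerank_score(query, plots[title]),
        reverse=True,
    )
    return reranked[:n_results]


# Hypothetical mini-dataset; plot vectors are precomputed once, as in the text.
PLOTS = {
    "Movie A": "a young wizard attends a school of magic and fights a dark lord",
    "Movie B": "a shark terrorizes a small beach town during the summer",
    "Movie C": "a retired wizard opens a school for young sharks",
}
PLOT_VECTORS = {title: embed(plot) for title, plot in PLOTS.items()}

results = find_movies(
    "wizard at a school of magic", PLOTS, PLOT_VECTORS, n_candidates=2, n_results=1
)
print(results)
```

The point of the split is cost: the first stage only compares precomputed vectors, so it can scan the whole catalog, while the more expensive comparison runs on a short candidate list.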
Bryan Reynolds, Kai Wei, Xiaoyu Liu, Estefany Nunez, Xiaozhou Feng
Our project classifies the artist of a painting and applies image style transfer techniques using convolutional neural networks (CNNs). A dataset containing the works of Vincent van Gogh, Claude Monet, Leonardo da Vinci, Rembrandt, Pablo Picasso, and Salvador Dali was created and cleaned. Five CNN models were trained on the data, resulting in classification accuracy scores ranging from 83-88%. Next, ensemble learning techniques were used to apply a voter algorithm using all five CNN models. The best accuracy score was achieved using a majority voter, which increased the model’s accuracy to ~90%. The style transfer model was created using a software package based on CNN techniques and fine-tuned on one famous painting from each artist.
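As a rough illustration of the majority voter, the sketch below combines the labels predicted by five models for each painting. The predictions are invented for the example, and the tie-breaking rule (the earliest model in the list wins) is an assumption; the source does not say how ties were handled.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the most models.

    Ties are broken in favor of the earliest model in the list
    (an assumption made for this sketch).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


# Hypothetical outputs of five CNNs, one row per painting.
model_predictions = [
    ["Monet", "Monet", "van Gogh", "Monet", "Picasso"],
    ["Dali", "Picasso", "Picasso", "Dali", "Dali"],
    ["Rembrandt", "da Vinci", "Rembrandt", "da Vinci", "Monet"],  # 2-2 tie
]

ensemble_labels = [majority_vote(row) for row in model_predictions]
print(ensemble_labels)
```

A vote like this helps when the individual models make different mistakes: a single wrong model is outvoted by the others.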
Two interactive web apps were developed, one for the artist classification model and another for the neural style transfer model.
Matthew Frick, Paul Jreidini, Matthew Heffernan
Timely identification of safety-critical events, such as gunshots, is of great importance to public safety stakeholders. However, existing systems deliver only limited value because they do not classify additional urban sounds. We classify environmental sounds to detect safety-critical events, in particular gunshots, and provide information on first response via siren detection. We also engineer general features for off-line classification tasks and demonstrate how this system can provide value to additional stakeholders in the film and television industry.
Chenyi Gu, Briana Stanfield, Dylan Bates, Kanishk Jain
The NHL Stanley Cup is the oldest existing trophy awarded to a professional sports franchise in North America, and it is often considered “the hardest trophy to win in professional sport.” Using only regular-season data, can we predict who will win the Stanley Cup?
We collected data on each team, as well as data on every player in over 20,000 games going back to 2005. Using this data, we built an ensemble model from logistic regression, AdaBoost, random forests, and a neural network, which was able to predict playoff data with up to 70% accuracy, above the theoretical threshold of 62% reported in the literature.
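The summary does not say how the four models were combined. One common option is soft voting, where each model's predicted probability is averaged and the result is thresholded. The sketch below assumes that approach; the probabilities, the equal weights, and the 0.5 cutoff are all hypothetical.

```python
def soft_vote(probabilities, weights=None):
    """Weighted average of per-model probabilities for the same outcome."""
    if not probabilities:
        raise ValueError("need at least one probability")
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probabilities)) / total


# Hypothetical probabilities that a given team wins a playoff matchup.
model_probabilities = {
    "logistic_regression": 0.62,
    "adaboost": 0.55,
    "random_forest": 0.48,
    "neural_network": 0.71,
}

ensemble_probability = soft_vote(list(model_probabilities.values()))
predicted_win = ensemble_probability >= 0.5  # assumed decision threshold
print(round(ensemble_probability, 2), predicted_win)
```

Here one model leans the other way, but the averaged probability still favors a win. Passing `weights` would let better-performing models count for more.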
Christopher Chia, Moeka Ono
Maps of forests allow us to know the locations of a variety of different tree cover types. However, forests change over time, and updating maps involves an expensive process of data collection.
We answer two questions: Can we instead predict tree cover types just from geographical features?
And can we identify the most essential feature to prioritize when collecting data?
We answer these questions with machine learning algorithms and topological data analysis.
Olivia McAuley, Dylan Bates
Wildfires damage the environment, lives, and property, and cost the US billions of dollars each year.
The goal of our project is to predict where wildfires will spread, providing important information to stakeholders and ultimately reducing these costs.
Stakeholders could use this information to allocate resources optimally and to direct first responders to where fire suppression and evacuation efforts should begin.
Anudeep Arora, Sam Landoulsi, Lalit Yadav
A delayed flight causes major financial losses to airline companies, airports, and travelers. For instance, a traveler incurs expenses for accommodation and food due to flight delays and/or missed connections, on top of lost time and time away from home. This can decrease air travel demand from an airline's existing and potential customers, because travelers bank on efficient service quality and performance. The impact of delays can translate into a productivity slowdown, indirectly affecting the economy and stunting GDP. Keeping this in mind, our aim is to build a classifier model to answer the following question:
● Given a 4-hour horizon, will a flight be delayed by more than 15 minutes?
● Our target audience is airline companies since, according to the Federal Aviation Administration, flight delays cost a total of $22 billion yearly.
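To make the classification target concrete, here is a minimal sketch of how the binary label could be derived. It rests on assumptions the summary does not confirm: that the delay is measured at departure, and that the 4-hour horizon means predictions are made for flights scheduled within the next four hours. The function names are hypothetical.

```python
from datetime import datetime, timedelta

DELAY_THRESHOLD = timedelta(minutes=15)   # from the question above
PREDICTION_HORIZON = timedelta(hours=4)   # one possible reading of "4-hour horizon"


def is_delayed(scheduled, actual, threshold=DELAY_THRESHOLD):
    """Label a flight 1 if it left more than `threshold` late, else 0."""
    return int(actual - scheduled > threshold)


def within_horizon(now, scheduled, horizon=PREDICTION_HORIZON):
    """True if the flight is scheduled between now and now + horizon."""
    return timedelta(0) <= scheduled - now <= horizon


scheduled = datetime(2022, 5, 1, 14, 0)
print(is_delayed(scheduled, datetime(2022, 5, 1, 14, 16)))    # 16 minutes late
print(is_delayed(scheduled, datetime(2022, 5, 1, 14, 15)))    # exactly 15 minutes
print(within_horizon(datetime(2022, 5, 1, 11, 0), scheduled))  # 3 hours ahead
```

Note the strict comparison: a flight that is exactly 15 minutes late is not labeled as delayed, matching the "more than 15 minutes" wording.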
Balin Fleming, Rouzbeh Modarresi-Yazdi, Akash Banerjee, Ethan Farber, Lauren Keyes, Abdullateef Shodunke
American Sign Language (ASL) is the first language of more than 250,000 people in the US and Canada. Despite the large population of people who use the language, automatic translation of ASL is not yet widespread. This is due in part to the challenge of obtaining high-quality data from which images can be properly translated.
The goal of our project is to achieve high-quality translation of ASL by using publicly available data and convolutional neural networks to accurately classify images. In the future, we hope to be able to recognize ASL in video capture.
We train on a large dataset of about 26,500 images. Our project has far-reaching uses in making global communication easier for multiple stakeholders.