Team Skylab: NHL Stanley Cup Predictions

written by

Elizabeth Campolongo

Thursday, February 16, 2023

Congratulations to Team Skylab for being a Top 5 Project of The Erdős Institute’s Spring 2022 Data Science Boot Camp with their project NHL Stanley Cup Predictions!

 

Composed of Assistant Professor of Mathematics Dylan Bates (Coker University), Physics Ph.D. Candidates Chenyi Gu (University of Tennessee) and Kanishk Jain (Emory University), and Briana Stanfield (Rutgers University, Neuroscience Ph.D. Candidate at the time, now at PsychoGenics), Team Skylab successfully created an NHL Stanley Cup Predictor. In fact, at the end of the boot camp (two weeks before the Stanley Cup match), their model predicted that the Colorado Avalanche would win the 2021-2022 Season Stanley Cup (16.9% likelihood), and they did! This is particularly notable, as underdogs win in hockey at a much higher rate than in other professional sports, which confounds many models attempting to predict outcomes. In creating their model, Team Skylab took a novel approach to compiling their dataset for predictions. Instead of the game- or team-level statistics that models generally rely on, Team Skylab focused on player data for the season. They used player ice time to weight the player stats at each position and fed their dataset into an ensemble model consisting of Logistic Regression, AdaBoost, Random Forest, and a TensorFlow Keras Sequential Neural Network. They were able to predict winning teams up to 70% of the time.
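For readers curious what such an ensemble might look like in code, here is a minimal sketch. It is not the team's implementation: the data is synthetic, the hyperparameters are arbitrary, soft voting (averaging predicted probabilities) is an assumed way of combining the models, and scikit-learn's MLPClassifier stands in for the Keras Sequential network so that the example needs only one library.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption): each row plays the role of one
# matchup described by numeric features; the label is 1 if "team A" won.
X, y = make_classification(n_samples=600, n_features=12, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Four model families named in the article; MLPClassifier is a stand-in
# for the TensorFlow Keras network. Scaling helps the linear and neural models.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("adaboost", AdaBoostClassifier(n_estimators=100, random_state=0)),
        ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("mlp", make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0),
        )),
    ],
    voting="soft",  # average the predicted probabilities of the four models
)
ensemble.fit(X_train, y_train)

# Probability that "team A" wins each held-out matchup, plus overall accuracy.
win_probability = ensemble.predict_proba(X_test)[:, 1]
accuracy = ensemble.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Because the data here is synthetic, the printed accuracy says nothing about hockey; the sketch only shows how several different classifiers can be combined into a single win probability.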

 

When determining what project to develop over the course of the Data Science Boot Camp, they were hoping for a project that would result in viable predictions through direct application of the content learned in the boot camp on a truncated timeline. Of the ideas they considered, predicting the Stanley Cup winner seemed the most relevant and doable in the time frame. Further motivating the choice, Dylan added, “I’m from Canada, and so hockey is sort of running through my veins.” Every year his dad would ask him who would win, suggesting that, with his background, he could figure it out. Dylan had thus tried to do this previously on his own, and with all the knowledge he had gained from the boot camp and his “brilliant team,” he was eager to try again. His teammates were sports fans, though less familiar with hockey. This may have worked to their advantage, leading them to the novel approach of looking at the players instead of the game. It also made the team more interested in watching hockey next season.

 

NHL.com has season-level statistics going back decades, but these were not very predictive, so the team instead found another dataset composed of game-level statistics. They focused their attention on the 2005-06 season through the (then current) 2021-22 season, recognizing that significant rule changes following the 2004-05 lockout could impact the validity of their results. After scraping the available data, their exploratory data analysis focused on finding predictive features, and they wound up focusing on player statistics. They applied weights to players and were able to update the information in the model to account for instances where players were not in the game, e.g., if someone was injured. To the best of their knowledge, this was a novel approach, as most modeling only considered team-level statistics, not player data.
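The player-weighting idea can be illustrated with a small sketch. The function name, the numbers, and the choice to renormalize the weights over the remaining players are all assumptions made for illustration; the article does not describe the team's exact formula, and their per-position grouping is left out here for brevity.

```python
import numpy as np

def weighted_team_features(stats, ice_time, available=None):
    """Ice-time-weighted average of per-player stats (hypothetical helper).

    stats:     (n_players, n_features) season averages per player
    ice_time:  (n_players,) average ice time per game, used as the weight
    available: optional boolean mask; False marks a player who is out
    """
    stats = np.asarray(stats, dtype=float)
    weights = np.asarray(ice_time, dtype=float).copy()
    if available is not None:
        # Zero the weight of anyone not playing (e.g. injured) ...
        weights[~np.asarray(available, dtype=bool)] = 0.0
    total = weights.sum()
    if total == 0:
        raise ValueError("no available players with ice time")
    # ... and renormalize so the remaining weights still sum to 1.
    return (weights / total) @ stats

# Made-up numbers: three players, two stats each (say, goals and shots per game).
stats = [[0.5, 2.0], [0.2, 1.0], [0.1, 0.5]]
ice_time = [20.0, 15.0, 5.0]

full_roster = weighted_team_features(stats, ice_time)
without_top_player = weighted_team_features(stats, ice_time, available=[False, True, True])
print(full_roster, without_top_player)
```

With the full roster, the player with the most ice time dominates the team-level features; marking that player unavailable shifts the features toward the remaining players, which is the kind of update the paragraph above describes.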

 

Briana noted that they had strong test scores on their model (better than they had seen elsewhere), but they agreed that the really rewarding moment came at the end. Their model predicted that the Colorado Avalanche was going to win, and they were indeed the favorites. Moreover, two weeks later (after the boot camp ended) the Colorado Avalanche actually won the Stanley Cup! “We did it! It worked!” (Dylan). Both Briana and Dylan were extremely happy with the results of their project and glad to have participated in the boot camp.

Looking forward, Briana is curious whether ice time is predictable, as it changes from the regular season to the playoffs. The assumption is that the time spent on ice by “star players” would increase in the playoffs, and modeling this could give an idea of how much impact a player has on the overall outcome. Dylan is interested in the temporal aspect of the game. In particular, he imagines another direction would be to consider player trends over the season: for instance, a player starting rough but improving from game to game and potentially carrying that momentum into the playoffs, versus a player getting worse throughout the season. Their model looked at averages over the season and did not incorporate temporal data.

 

However, “[i]t’s hard to imagine going back and improving something and still actually finishing it on time,” Briana said. Though they agreed that “there was an element of luck that we even did it as well as we did within the amount of time that we had” (Briana), the team credits their success to a lot of hard work and the diversity of their team’s knowledge base. Everyone had a different background, allowing them to delegate aspects of the project based on knowledge and interest. This delegation of tasks was crucial when working on such a truncated timeline, and is reflected in the advice they would give to future boot camp participants:

Dylan emphasized, “lean on your team members and use your team members because they have different experiences than you do and they can bring so much to the table. If I had been trying to do this on my own it would have taken four times as long and probably been far worse…the fact that everyone was able to bring so much made it so much easier.”

 

Briana further encourages all future boot camp participants to “pick something quickly and commit to it,” meet regularly with their team, and not be afraid to try different models; it’s okay if the first one doesn’t work. “We did test out a lot of different kinds of models,” and dedicated a lot of “time to play around with all the hyperparameters,” but “we didn’t immediately have success on our first try,” so “don’t give up.”

 

Congratulations again to Team Skylab for being a Top 5 Project of The Erdős Institute’s Spring 2022 Data Science Boot Camp!

TEAM

Counting Crossings - Team 2

Jared Able


Image analysis: The initial goal is to count, as accurately as possible, the number of times an object crosses over itself (think knots or road systems). This count is a basic measure of the complexity of the object. Objects that don't cross themselves are simple, while objects that cross themselves many times are complicated.

Once we have this count in place, we'll try to reduce the complexity of our objects without changing their underlying structure. In the case of roads, this would result in a road system that accomplishes similar goals while being more easily navigated and less resource-intensive to build.
