Team Mahogany: Wildfire Spread Prediction

written by

Elizabeth Campolongo

Thursday, February 16, 2023

Congratulations to Team Mahogany for being a Top 5 Project of The Erdős Institute’s Fall 2022 Data Science Boot Camp with their project: Wildfire Spread Prediction!


Composed of Coker University Assistant Professor of Mathematics Dylan Bates and Bryn Mawr Physics Ph.D. Candidate Olivia McAuley, Team Mahogany created a state-of-the-art wildfire spread predictor. Utilizing a data-gathering technique from F. Huot et al., they compile weekly information on terrain, weather, and fire into 64 km x 64 km grids at 1 km resolution from Google Earth Engine. After training a Convolutional Neural Network (CNN) with a U-Net architecture on a random 32 km x 32 km crop from each region, they are able to predict the spread of the wildfire the next day with an optimized 42.41% recall and 29.72% AUC (PR). Their trained model runs quickly on new data, providing the speed necessary to assist first responders in decisions about resource allocation and evacuations.
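The cropping step is easy to picture. Here is a minimal sketch, assuming the weekly layers for one region are stacked into a (channels, 64, 64) NumPy array; the channel count and function name are illustrative, not taken from the team's code.

```python
import numpy as np

def random_crop(region, crop_size=32, rng=None):
    """Take a random crop_size x crop_size spatial crop from a
    (channels, height, width) stack of terrain/weather/fire layers."""
    if rng is None:
        rng = np.random.default_rng()
    _, height, width = region.shape
    top = rng.integers(0, height - crop_size + 1)
    left = rng.integers(0, width - crop_size + 1)
    return region[:, top:top + crop_size, left:left + crop_size]

# A hypothetical 12-channel 64 km x 64 km region at 1 km resolution.
region = np.zeros((12, 64, 64))
crop = random_crop(region, rng=np.random.default_rng(0))
```

A side benefit of random cropping is data augmentation: each training pass can show the network a different 32 km window of the same region.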


In choosing a project to develop over the course of the Data Science Boot Camp, Dylan recalled an interesting discussion with a student about the challenges of wildfire prediction. Since the vast majority of wildfires are human-caused, it is nearly impossible to predict their occurrence, but once ignited, their spread can be monitored. Additionally, wildfires are incredibly costly (billions of dollars in damage and threats to life), so this tool is particularly relevant as we continue to see more of them each year. Olivia was immediately interested, adding that “[t]his is a major issue for a lot of people and I think we were able to bring a little bit of awareness about it to people who are not affected by wildfires.”


Team Mahogany started with a logistic regression model, expecting the temporal classification to give a reasonable baseline. Instead, the model simply predicted fire wherever there was fire yesterday: it lacked spatial awareness. Though their Random Forest model produced the best precision (39.72%), it struggled with the same limitation as the logistic regression. They therefore scrapped any plans to try boosting and shifted their focus to Convolutional Neural Networks, which incorporate spatial information. Initially they tried to reproduce the CNN used in the paper from which they took their data collection scheme; when those results fell short, they decided to try something different. Drawing on Dylan's previous experience with semantic segmentation from graduate school, they created a CNN with a U-Net architecture from scratch. Recognizing this was a problem of prediction by pixel, not identification (true semantic segmentation), they removed one of the skip connections and improved precision and recall by 4%. In fact, when comparing their results to those from the paper and others posted on Kaggle, they saw their results were state-of-the-art!
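The failure mode of those first models is what is sometimes called a persistence baseline: tomorrow's fire mask is just today's. A toy sketch (the masks and helper names here are invented for illustration) shows why per-pixel recall suffers the moment the fire actually moves:

```python
import numpy as np

def persistence_baseline(fire_yesterday):
    """Predict fire exactly where it burned the previous day:
    the degenerate behavior the logistic regression fell into."""
    return fire_yesterday.copy()

def precision_recall(pred, truth):
    """Per-pixel precision and recall for binary fire masks."""
    tp = np.sum((pred == 1) & (truth == 1))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy 4x4 masks: the fire advances one pixel eastward overnight,
# so a persistence prediction misses every newly burned pixel.
yesterday = np.array([[0, 0, 0, 0],
                      [0, 1, 1, 0],
                      [0, 1, 1, 0],
                      [0, 0, 0, 0]])
today = np.array([[0, 0, 0, 0],
                  [0, 1, 1, 1],
                  [0, 1, 1, 1],
                  [0, 0, 0, 0]])
p, r = precision_recall(persistence_baseline(yesterday), today)
```

Precision looks perfect here while recall drops to 2/3 — exactly the kind of gap that a spatially aware model like the U-Net is meant to close.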


One of the biggest challenges they faced is that their training data is itself the output of another machine learning algorithm. The data are collected by satellite, with fire flagged at low to high confidence (all labeled simply as fire), and a larger share of locations marked “unknown.” Team Mahogany believes the limitations of that upstream model may have contributed to their lower-than-expected precision and recall despite hyperparameter tuning and the depth of their model. Dylan is eager to try to improve on their results with a recently released dataset based on an improved fire recognition model.


Moreover, if given additional time to work on wildfire spread prediction, the most pressing improvement they would like to make is to account for fire mitigation efforts. Dylan noted that, outside of the data collection, this is a realistic goal, as their model already accounts for natural fire breaks. Dousing flames with water or fire retardant would be more complicated to add, however. They are also curious how their model would perform at predicting wildfire spread in Europe; the comparison would be another way to check for bias in their model, for instance, whether average wind direction vs. prevailing winds made an impact.


Having completed the boot camp twice (both times with a top project), Dylan would advise future participants to be aware that a larger and more involved team gives you people to lean on: you can distribute tasks and pool resources (time and knowledge). With this project he was involved in nearly every step, though Olivia took the lead in coordinating their final submissions. They likely could have done more with a larger team, but there are benefits to so much direct involvement as well; it all comes back to thinking about what you hope to get out of the experience. Overall they were happy with the end product and to make the Top 5. Olivia added, “it's ok to not know what you are doing because you have teammates who are willing to help you learn. This is a new way to think about solving problems and it might be challenging, but you'll get there.”


Olivia really enjoyed working on this project, especially the final presentation: “[M]y favorite is at the end of the presentation when Dylan created that Smokey the Bear masterpiece, I added the disclaimer "easter egg" that said "you can also prevent wildfires" which was the original line in the Smokey meme. I do not perceive myself as a funny person, but adding that disclaimer made me chuckle and I hope it made others laugh as well.”


Congratulations again to Team Mahogany for being a Top 5 Project of The Erdős Institute’s Fall 2022 Data Science Boot Camp!


Smart Search on arXiv

Xiaoyu Wang, Xin Su, Zeinab Elmi


Standard search methods on arXiv are outdated and based on keyword matching. Modern chatbots such as ChatGPT appear to do better. In this project proposal, we'd like to do something similar to ChatGPT, if not better: namely, develop a chatbot specializing in arXiv search that can handle questions like:
1. What are the recent research results on XYZ?
2. Are there common topics that both domain A and domain B are working on, but researchers are too lazy to spot (since that requires hopping over to a different domain and suffering through a different set of jargon)?
3. I want to research XYZ, could you provide a summary of the research results in the past month?
4. Could you provide a summary of the research results in arXiv:0123.45678v2?
Project required skills: web scraping, NLP, retrieval or fine-tuning methods such as RAG (or whatever we can invent), and deployment.
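As one possible starting point for the retrieval half of such a chatbot, here is a minimal TF-IDF retrieval sketch in pure Python; the corpus, query, and function names are invented for illustration and are not part of the proposal. A real RAG pipeline would swap this for learned embeddings and feed the retrieved abstracts into the language model's prompt.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Bag-of-words TF-IDF vectors for a small corpus of abstracts."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Indices of the k abstracts most similar to the query; in a RAG
    pipeline these would be stuffed into the chatbot's context."""
    vecs = tfidf_vectors(docs + [query])
    qvec = vecs[-1]
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(qvec, vecs[i]), reverse=True)
    return ranked[:k]

# Invented toy corpus of abstract snippets.
abstracts = [
    "wildfire spread prediction with convolutional neural networks",
    "graph neural networks for molecular property prediction",
    "bayesian inference for cosmological parameter estimation",
]
best = retrieve("predicting wildfire spread", abstracts)
```

Note that "predicting" and "prediction" do not match here; that brittleness of keyword-level matching is part of what motivates embedding-based retrieval in the first place.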
