Team Juniper: Predicting Forest Cover Types

written by

Elizabeth Campolongo

Thursday, February 16, 2023

Congratulations to Team Juniper for being a Top 5 Project of The Erdős Institute’s Fall 2022 Data Science Boot Camp with their project Predicting Forest Cover Types and Visualizing Data!


Mathematics Ph.D. Candidate Chris Chia (Binghamton University) and Ecology Ph.D. Student Moeka Ono (Texas A&M University) partnered to form Team Juniper and create a forest cover type predictor. Using data provided by Colorado State University (compiled from the US Forest Service Information System and uploaded to Kaggle), they developed an XGBoost model that predicts the dominant forest cover type of a 900 m² patch of forest with 88% accuracy. Chris had a secondary goal of applying Topological Data Analysis (TDA) to the dataset, which was particularly well-suited to this task since its target classes were balanced. He applied UMAP to their data and found that elevation was strongly correlated with the x-axis separation of the clusters, suggesting the importance of elevation in distinguishing cover types. This is further supported by the fact that the cover types grouped together are those whose elevation ranges overlap.


They were originally a team of four: two ecologists and two mathematicians, so they decided to pursue a project in ecology. However, two group members quickly had to leave due to time conflicts. Moeka went on Kaggle and found a dataset composed of ecological and geological features from 900 m² swaths of Roosevelt National Forest (Colorado), each of which was also labeled with its dominant forest cover type. The data was very clean and balanced, meaning they had more time to explore the data and test many different models before settling on XGBoost. A convenient feature of their model is that the data on which it relies is either routinely collected or unlikely to shift significantly over time. For instance, climate change is leading to shifts in forest ecosystems, and the collected data fed into their model would allow for a more accurate assessment of predominant forest cover types without requiring park services to go out and determine it manually. One key application would be understanding the density of fire-resistant tree species in key areas.


Looking forward, Moeka is curious if they could improve on their results through different algorithms, such as neural networks. However, such models require much more time to train. Chris would like to dedicate more time to tuning the UMAP parameters to see if the clusters can further distinguish between different tree cover types. Their final model struggled the most with differentiating between Spruce/Fir and Lodgepole Pines, an important distinction since the latter are relatively fire-resistant, but the former are not. Finally, they would like to explore the temporal shift in forest composition, which would require historical data for comparison.
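
As a rough illustration of how a pair of hard-to-separate classes can be read off a classifier's predictions, here is a minimal sketch in plain Python. It is not the team's code: the helper names (`confusion_counts`, `most_confused_pair`) and the toy labels are made up for this example, and the numbers are not their results.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def most_confused_pair(y_true, y_pred):
    """Return the pair of distinct classes with the most mutual errors.

    Errors in both directions (A predicted as B, and B predicted as A)
    are added together. Returns (None, 0) if every prediction is correct.
    """
    pair_errors = Counter()
    for (true_label, pred_label), n in confusion_counts(y_true, y_pred).items():
        if true_label != pred_label:
            pair_errors[frozenset((true_label, pred_label))] += n
    if not pair_errors:
        return None, 0
    pair, n_errors = pair_errors.most_common(1)[0]
    return tuple(sorted(pair)), n_errors


# Toy, made-up labels purely for illustration (NOT the team's data or results).
y_true = ["Spruce/Fir", "Spruce/Fir", "Lodgepole Pine",
          "Lodgepole Pine", "Aspen", "Aspen"]
y_pred = ["Lodgepole Pine", "Spruce/Fir", "Spruce/Fir",
          "Lodgepole Pine", "Aspen", "Lodgepole Pine"]

pair, n_errors = most_confused_pair(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(pair, n_errors, accuracy)
```

On real predictions, the same tally would be run over a held-out test set, and the pair with the largest count is the one the model has the most trouble telling apart.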


Chris and Moeka both agreed that their biggest challenge was being a team of two. The first time Moeka did the boot camp, she was on a four-person team, so they could delegate tasks and play to everyone’s strengths. This time, Moeka and Chris were both involved in every step, which can be more difficult, but ultimately they are very happy with what they were able to achieve together. It was particularly rewarding for them to place in the Top 5; Chris was excited that they got to present their project and results.


Reflecting on their process, they both agree that future boot camp participants’ primary goal should be to finish a project. As Moeka said, “it can be intimidating [at first], everyone seems to be very experienced, but do your best and target to complete a project…you can learn from the experience and apply the skills for your next project or research”. It also helps to find a topic of interest (e.g., a hobby or research area) instead of selecting something at random; then you can take ownership, and it’s more meaningful on your resume. Chris added that “it’s easier to make a project that stands out” when you’re passionate about the subject. They further emphasize the importance of asking for advice: don’t sit there struggling. “Our mentor [Elizabeth Campolongo] was really helpful,” providing a guide on how to run the TDA and helping them stay on schedule by reminding them of the weekly goals. Ultimately, Chris summarized their advice as “keep on schedule. Getting to the finish line is an accomplishment!” At the end, you’ll have something to be proud of.


Congratulations again to Team Juniper for being a Top 5 Project of The Erdős Institute’s Fall 2022 Data Science Boot Camp!


smart search on arXiv

Xiaoyu Wang, Xin Su, Zeinab Elmi, Monalisa Dutta


Standard search methods on arXiv are outdated and based on keyword matching. Modern chatbots such as ChatGPT appear to do better. In this project proposal, we'd like to do something similar to ChatGPT, if not better. Namely, we would develop a chatbot specializing in arXiv, handling questions like:
1. What are the recent research results on XYZ?
2. Are there common topics that both domain A and domain B are working on, but that researchers are too lazy to spot (since spotting them means hopping over to a different domain and suffering through a different set of jargon)?
3. I want to research XYZ, could you provide a summary of the research results in the past month?
4. Could you provide a summary of the research results in arXiv:0123.45678v2?
Project required skills: web scraping, NLP, NLP fine-tuning methods such as RAG or whatever we could invent, and deployment
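
To make the RAG idea a little more concrete, here is a minimal sketch of just the retrieval step, written in plain Python under several assumptions. The document IDs and abstracts are placeholders, not real arXiv entries. The function names (`build_index`, `retrieve`) are invented for this example. The scoring is ordinary TF-IDF with cosine similarity, which is itself keyword-based, so a real system would likely swap in learned text embeddings and pass the retrieved abstracts to a language model as context.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build TF-IDF vectors for a dict mapping doc id -> text."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        doc_id: {t: count * idf[t] for t, count in Counter(tokens).items()}
        for doc_id, tokens in tokenized.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, vectors, idf, k=2):
    """Return up to k doc ids with nonzero similarity to the query, best first."""
    counts = Counter(tokenize(query))
    q_vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    scored = sorted(
        ((cosine(q_vec, vec), doc_id) for doc_id, vec in vectors.items()),
        reverse=True,
    )
    return [doc_id for score, doc_id in scored[:k] if score > 0.0]


# Placeholder abstracts (hypothetical, not real arXiv papers).
docs = {
    "paper-A": "Persistent homology for topological data analysis of point clouds",
    "paper-B": "Gradient boosted trees for forest cover classification",
    "paper-C": "Retrieval augmented generation for question answering over documents",
}
vectors, idf = build_index(docs)
print(retrieve("topological data analysis", vectors, idf))
```

In a full pipeline, the abstracts returned by `retrieve` would be inserted into the chatbot's prompt so that its answer is grounded in the retrieved papers.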
