
Certificate of Completion

THIS ACKNOWLEDGES THAT
Adekunle Ajiboye
HAS COMPLETED THE SPRING 2025 DATA SCIENCE BOOT CAMP

Roman Holowinsky, PhD
DIRECTOR

APRIL 25, 2025
DATE

TEAM
Machine learning techniques in lung cancer prevalence studies
Zhuoran Wang, Fekadu Bayisa, Adekunle Ajiboye

Lung cancer is a major public health concern. This work investigates lung cancer prevalence across Virginia counties (2014-2018) using county-level aggregated data on adults aged 18 and over. The predictors span four domains: demographic (% male, female, Black, White, Hispanic, and aged 65+), behavioral (smoking, binge drinking, obesity), socioeconomic (poverty rate, Social Deprivation Index, median income), and environmental (PM2.5 air quality). After preprocessing, we fit a Poisson GLM with an elastic net penalty and an XGBoost model with Poisson loss. XGBoost outperforms the GLM in mean absolute error (MAE: 5.963 vs. 6.313) and identifies smoking, PM2.5, obesity, and income as key predictors. The GLM shows positive associations between prevalence and smoking, age 65+, and racial composition, and negative associations with poverty and Hispanic proportion. These results support targeting high-risk groups and integrating behavioral and environmental data into prevention strategies.
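
To make the first modeling step concrete, the following is a minimal sketch of a Poisson regression with an elastic net penalty, fit by proximal gradient descent and scored by MAE. It is not the team's code: the data are synthetic, the three features and their true coefficients are invented for illustration, and the names (fit_poisson_elastic_net, lam, alpha, step) are hypothetical. It uses only the Python standard library so it can run anywhere; a real analysis would more likely rely on an established GLM package, and on XGBoost's count:poisson objective for the boosted model.

```python
import math
import random


def sample_poisson(rng, mu):
    """Draw one Poisson(mu) variate with Knuth's multiplication method."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def predict(intercept, coefs, X):
    """Expected counts under the log link: mu = exp(b0 + x . b)."""
    return [math.exp(intercept + sum(c * v for c, v in zip(coefs, row))) for row in X]


def fit_poisson_elastic_net(X, y, lam=0.02, alpha=0.5, step=0.05, n_iter=500):
    """Minimize mean Poisson deviance loss + lam * (alpha * L1 + (1 - alpha) / 2 * L2^2).

    The intercept is left unpenalized. The step size is a fixed, hand-picked
    value that suits this small synthetic problem (an assumption, not a rule).
    """
    n, p = len(X), len(X[0])
    intercept = math.log(sum(y) / n)  # start at the log of the mean count
    coefs = [0.0] * p
    for _ in range(n_iter):
        mu = predict(intercept, coefs, X)
        resid = [m - t for m, t in zip(mu, y)]
        intercept -= step * sum(resid) / n
        for j in range(p):
            # Gradient of the smooth part: Poisson loss plus the ridge term.
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n
            grad += lam * (1.0 - alpha) * coefs[j]
            # Gradient step followed by the L1 proximal (soft-threshold) step.
            coefs[j] = soft_threshold(coefs[j] - step * grad, step * lam * alpha)
    return intercept, coefs


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - q) for t, q in zip(y_true, y_pred)) / len(y_true)


# Synthetic stand-in data: two informative features and one pure-noise feature.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(300)]
true_rate = [math.exp(1.0 + 0.8 * row[0] - 0.6 * row[1]) for row in X]
y = [sample_poisson(rng, mu) for mu in true_rate]

intercept, coefs = fit_poisson_elastic_net(X, y)
model_mae = mean_absolute_error(y, predict(intercept, coefs, X))
baseline_mae = mean_absolute_error(y, [sum(y) / len(y)] * len(y))

print("coefficients:", [round(c, 3) for c in coefs])
print("MAE (model):", round(model_mae, 3), "MAE (mean baseline):", round(baseline_mae, 3))
```

The sign of each fitted coefficient is read the same way as in the abstract: a positive value means the expected count rises with that predictor, a negative value means it falls. The MAE here is computed in-sample for brevity; comparing models fairly would call for held-out data.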


