TEAM

Machine learning techniques in lung cancer prevalence studies

Zhuoran Wang, Fekadu Bayisa, Adekunle Ajiboye

Lung cancer is a major public health concern. This work investigates lung cancer prevalence across Virginia counties from 2014 to 2018 using county-level aggregated data on the population aged 18 and older. The data span four domains: demographic (percentage male, female, Black, White, Hispanic, and aged 65+), behavioral (smoking, binge drinking, obesity), socioeconomic (poverty rate, Social Deprivation Index, median income), and environmental (PM2.5 air quality). After preprocessing, we fit two models: a Poisson GLM with elastic net regularization and XGBoost with a Poisson loss. XGBoost outperforms the GLM on mean absolute error (MAE: 5.963 vs. 6.313) and identifies smoking, PM2.5, obesity, and income as key predictors. The GLM shows positive associations between prevalence and smoking, age 65+, and racial composition, and negative associations with poverty and Hispanic proportion. These results support targeting high-risk groups and integrating behavioral and environmental data into prevention strategies.
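
Both models optimize a Poisson likelihood with a log link, and the GLM adds an elastic net penalty on the coefficients. The sketch below writes out that penalized objective and the MAE metric in plain Python to make the comparison concrete. It is an illustration under assumptions, not the team's code: the function names, the parameters `alpha` and `l1_ratio` (a common parameterization of the elastic net), and the omission of any population offset are all hypothetical choices made here.

```python
import math


def predict_mean(X, beta, intercept):
    """Poisson mean under a log link: mu_i = exp(intercept + x_i . beta)."""
    return [
        math.exp(intercept + sum(b * x for b, x in zip(beta, row)))
        for row in X
    ]


def poisson_nll(y, mu):
    """Mean Poisson negative log-likelihood, dropping the log(y!) constant."""
    return sum(m - yi * math.log(m) for yi, m in zip(y, mu)) / len(y)


def elastic_net_penalty(beta, alpha, l1_ratio):
    """alpha * (l1_ratio * ||beta||_1 + 0.5 * (1 - l1_ratio) * ||beta||_2^2).

    alpha and l1_ratio are assumed names; the intercept is not penalized.
    """
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)


def penalized_objective(X, y, beta, intercept, alpha, l1_ratio):
    """Objective a Poisson GLM with elastic net would minimize (sketch)."""
    mu = predict_mean(X, beta, intercept)
    return poisson_nll(y, mu) + elastic_net_penalty(beta, alpha, l1_ratio)


def mean_absolute_error(y, y_pred):
    """MAE, the metric used to compare the two models."""
    return sum(abs(a - b) for a, b in zip(y, y_pred)) / len(y)
```

With `l1_ratio = 1` the penalty reduces to the lasso, and with `l1_ratio = 0` to ridge. XGBoost provides a built-in `count:poisson` objective that targets the same likelihood with boosted trees; whether the team used that exact setting is not stated in the abstract.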

github URL

©2017-2025 by The Erdős Institute.