Certificate of Completion

THIS ACKNOWLEDGES THAT

Adekunle Ajiboye

HAS COMPLETED THE SPRING 2025 DATA SCIENCE BOOT CAMP

Roman Holowinsky, PhD
DIRECTOR

APRIL 25, 2025
DATE

TEAM

Machine learning techniques in lung cancer prevalence studies

Zhuoran Wang, Fekadu Bayisa, Adekunle Ajiboye

Lung cancer is a major public health concern. This work investigates lung cancer prevalence across Virginia counties (2014-2018) using county-level aggregate data for adults aged 18 and older. The data span four domains: demographic (percentages of the population that are male, female, Black, White, Hispanic, and aged 65+), behavioral (smoking, binge drinking, obesity), socioeconomic (poverty rate, Social Deprivation Index, median income), and environmental (PM2.5 air quality). After preprocessing, we fit two models: a Poisson GLM with an elastic net penalty and XGBoost with a Poisson loss. XGBoost achieves a lower mean absolute error (MAE) than the GLM (5.963 vs. 6.313) and identifies smoking, PM2.5, obesity, and income as key predictors. The GLM shows positive associations between prevalence and smoking, the share of residents aged 65+, and racial composition, and negative associations with the poverty rate and the Hispanic proportion. These results support targeting high-risk groups and integrating behavioral and environmental data into prevention strategies.
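The abstract does not say which software the team used, so the following is only a minimal, dependency-free sketch of one of the two modeling ideas it names: a Poisson GLM with a log link and an elastic net penalty, scored by MAE. It runs on synthetic data, not the Virginia county data. The function names (`fit_poisson_elastic_net`, `sample_poisson`, and so on), the proximal-gradient fitting method, and the hyperparameters (`alpha`, `l1_ratio`, `lr`, `n_iter`) are all illustrative assumptions. XGBoost is not covered here.

```python
import math
import random


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted counts."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def predict(X, intercept, coefs):
    """Poisson mean under a log link: mu = exp(intercept + x . coefs)."""
    return [math.exp(intercept + sum(c * x for c, x in zip(coefs, row)))
            for row in X]


def fit_poisson_elastic_net(X, y, alpha=0.01, l1_ratio=0.5, lr=0.03, n_iter=800):
    """Proximal gradient descent on mean Poisson loss plus elastic net.

    Penalty (assumed form): alpha * (l1_ratio * |b|_1
                                     + 0.5 * (1 - l1_ratio) * |b|_2^2).
    """
    n, p = len(X), len(X[0])
    intercept = math.log(sum(y) / n)  # start from the intercept-only fit
    coefs = [0.0] * p
    for _ in range(n_iter):
        mu = predict(X, intercept, coefs)
        resid = [m - t for m, t in zip(mu, y)]  # gradient w.r.t. linear predictor
        grads = [sum(r * row[j] for r, row in zip(resid, X)) / n
                 for j in range(p)]
        intercept -= lr * sum(resid) / n  # the intercept is not penalized
        for j in range(p):
            # Gradient step on the smooth part (loss + ridge term) ...
            step = coefs[j] - lr * (grads[j] + alpha * (1 - l1_ratio) * coefs[j])
            # ... followed by soft-thresholding for the lasso term.
            coefs[j] = soft_threshold(step, lr * alpha * l1_ratio)
    return intercept, coefs


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


# Synthetic, standardized predictors; the third one has no true effect.
rng = random.Random(0)
true_intercept, true_coefs = 1.0, [0.5, -0.4, 0.0]
X = [[rng.gauss(0, 1) for _ in true_coefs] for _ in range(300)]
y = [sample_poisson(rng, mu) for mu in predict(X, true_intercept, true_coefs)]

intercept, coefs = fit_poisson_elastic_net(X, y)
model_mae = mae(y, predict(X, intercept, coefs))
baseline_mae = mae(y, [sum(y) / len(y)] * len(y))  # predict the mean count
print("coefficients:", [round(c, 3) for c in coefs])
print("MAE (model):", round(model_mae, 3), "MAE (mean-only):", round(baseline_mae, 3))
```

The sketch scores the model on the same data it was fit to, which keeps it short. A comparison like the one reported above would normally use held-out data or cross-validation, and the signs of the fitted coefficients play the role of the positive and negative associations described for the GLM.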


©2017-2025 by The Erdős Institute.