Certificate of Completion
THIS ACKNOWLEDGES THAT
Haoran Li
HAS COMPLETED THE FALL 2022 DATA SCIENCE BOOT CAMP
DIRECTOR: Roman Holowinsky, PhD
DATE: DECEMBER 14, 2022
TEAM: Sycamore
Thanos Kritikos, Haoran Li
This project investigates Instacart data with two goals: to classify app users by their shopping habits and to forecast their purchases using machine learning. The dataset comes from a Kaggle competition (kaggle.com). After cleaning and merging the data, an exploratory analysis of consumer patterns shows that bananas are the most frequently purchased item, produce is the top-selling category, and Mondays, Tuesdays, and the hours from 10 a.m. to 4 p.m. are the most lucrative periods. Next, we group customers by shopping pattern. We apply PCA to the customer data and visualize the top six principal components to find the most informative pair, then use K-means to split customers into three clusters: a generic user profile, a profile that buys a lot of baby formula, and a profile that buys a lot of fresh produce. Finally, we train XGBoost models on a small set of key features, tune their hyperparameters with grid-search cross-validation, and fit a separate XGBoost model to each cluster to obtain more focused and accurate predictions.
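The exploratory step (merging order lines with product names, then counting purchases by product, weekday, and hour) can be sketched in plain Python. This is a minimal illustration, not the team's code: the table layouts and column names (`order_id`, `product_id`, `product_name`, `order_dow`, `order_hour_of_day`) are assumed to follow the Kaggle Instacart files, the helper `merge_lines` is hypothetical, and the rows below are made-up toy data.

```python
from collections import Counter

# Toy rows standing in for the Kaggle Instacart tables (made-up values).
# Assumed layouts: products(product_id -> product_name),
# orders(order_id -> order_dow, order_hour_of_day),
# order_products(order_id, product_id).
products = {1: "Banana", 2: "Organic Strawberries", 3: "Baby Formula"}
orders = {
    101: {"order_dow": 1, "order_hour_of_day": 10},
    102: {"order_dow": 1, "order_hour_of_day": 14},
    103: {"order_dow": 5, "order_hour_of_day": 20},
}
order_products = [(101, 1), (101, 2), (102, 1), (102, 3), (103, 1)]


def merge_lines(order_products, orders, products):
    """Inner-join order lines with order metadata and product names."""
    merged = []
    for order_id, product_id in order_products:
        if order_id not in orders or product_id not in products:
            continue  # drop lines that fail to join (a simple cleaning rule)
        row = {"order_id": order_id, "product_name": products[product_id]}
        row.update(orders[order_id])
        merged.append(row)
    return merged


merged = merge_lines(order_products, orders, products)

# Most frequently purchased products, counted per order line.
top_products = Counter(r["product_name"] for r in merged).most_common(3)

# Order volume by weekday and by hour, counted per order (not per line).
orders_by_dow = Counter(o["order_dow"] for o in orders.values())
orders_by_hour = Counter(o["order_hour_of_day"] for o in orders.values())

print(top_products[0])               # ('Banana', 3)
print(orders_by_dow.most_common(1))  # [(1, 2)]
```

On the full dataset the same joins would normally be done with a dataframe library; the dictionaries here only make the join keys and counting logic explicit.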
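The clustering step (PCA followed by K-means with three clusters) can be illustrated with a minimal NumPy sketch under stated assumptions. It assumes each user is already summarized as one row of purchase counts over a few hypothetical features, computes PCA through an SVD of the centered matrix, and keeps only the first two components for simplicity. K-means is a plain Lloyd's iteration seeded with a deterministic farthest-first rule, a choice made for this sketch because the project's K-means settings are not described. The data are randomly generated toy values, not project results.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto their top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]


def farthest_first_init(Z, k):
    """Deterministic seeding: start at Z[0], then repeatedly add the point
    farthest from all centroids chosen so far."""
    idx = [0]
    for _ in range(1, k):
        d = np.min(np.linalg.norm(Z[:, None, :] - Z[idx][None, :, :], axis=2), axis=1)
        idx.append(int(d.argmax()))
    return Z[idx].copy()


def kmeans(Z, k, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = farthest_first_init(Z, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep an old centroid if its cluster becomes empty.
        new = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Toy user-by-feature matrix: 3 synthetic groups of 20 users over 4
# hypothetical features (e.g. produce, baby, snack, dairy purchase counts).
rng = np.random.default_rng(0)
centers = 10.0 * np.eye(4)[:3]
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 4)) for c in centers])
truth = np.repeat([0, 1, 2], 20)

Z, explained = pca(X, n_components=2)
labels, _ = kmeans(Z, k=3)
print(explained.round(3))
print(np.bincount(labels))
```

In practice a library implementation such as scikit-learn's `PCA` and `KMeans` would normally be used; the hand-written version only makes the projection and the assign-then-update loop explicit.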