Certificate of Completion

THIS ACKNOWLEDGES THAT

Haoran Li

HAS COMPLETED THE FALL 2022 DATA SCIENCE BOOT CAMP

DIRECTOR

Roman Holowinsky, PhD

DATE

DECEMBER 14, 2022

TEAM

Sycamore

Thanos Kritikos, Haoran Li

This project investigates Instacart data, classifies app users by their consumer habits, and forecasts user purchases with machine learning algorithms. The dataset came from a kaggle.com competition and was cleaned and merged before analysis. Exploratory analysis examined consumer patterns: bananas were the most popular purchase, and produce had the highest sales. Mondays, Tuesdays, and the hours from 10 a.m. to 4 p.m. were the most lucrative periods. We first grouped customers by shopping pattern. We applied PCA to the consumer data and visualized the top six components to find the most informative pair. Using K-means, we then split the customers into three groups: one generic user profile, one that buys a lot of baby formula, and one that buys a lot of fresh produce. We trained XGBoost models that can be assessed using a few key features, and used grid search cross-validation to find suitable model parameters. Finally, we ran a separate XGBoost model on each cluster to obtain a more focused, accurate model.
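The pipeline described above (PCA, K-means with three clusters, then a grid-searched boosted model per cluster) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the team's actual code: the data is synthetic, the feature matrix and target are hypothetical, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the sketch depends only on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Hypothetical stand-in for a user-by-aisle purchase matrix:
# 300 users drawn from 3 synthetic shopping profiles over 12 aisles.
n_users, n_aisles = 300, 12
profiles = rng.random((3, n_aisles))
segment = rng.integers(0, 3, size=n_users)
X = profiles[segment] + 0.05 * rng.standard_normal((n_users, n_aisles))

# Step 1: reduce to the top six principal components.
pca = PCA(n_components=6, random_state=0)
X_pca = pca.fit_transform(X)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))

# Step 2: split users into three groups with K-means.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_pca)

# Step 3: tune and fit one boosted model per cluster.
# The grid is deliberately tiny; a real search would cover more values.
param_grid = {"n_estimators": [20, 40], "max_depth": [2, 3]}
models = {}
for c in range(3):
    Xc = X[labels == c]
    # Hypothetical binary target: above-median purchases in aisle 0
    # within the cluster, so both classes are present in every cluster.
    yc = (Xc[:, 0] > np.median(Xc[:, 0])).astype(int)
    search = GridSearchCV(
        GradientBoostingClassifier(random_state=0), param_grid, cv=3
    )
    search.fit(Xc, yc)
    models[c] = search
    print(f"cluster {c}: n={len(Xc)}, best params={search.best_params_}")
```

If XGBoost is installed, `xgboost.XGBClassifier` follows the scikit-learn estimator interface and could be passed to `GridSearchCV` in place of the stand-in, with a parameter grid adapted to its own hyperparameters.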

github URL