Thanos Kritikos, Haoran Li


This project investigates Instacart data, classifies app users by their shopping habits, and forecasts their purchases with machine learning algorithms. The dataset was provided by a competition. We cleaned and merged its tables, then ran an exploratory analysis of consumer patterns. Bananas were the most popular purchase, and produce had the highest sales. Mondays and Tuesdays, and the hours from 10 a.m. to 4 p.m., were the most lucrative periods. Next, we grouped customers by shopping pattern. We reduced their purchase data with PCA and visualized the top six principal components to find the most important pair. K-means then split customers into three clusters: one generic user profile, one that buys a lot of baby formula, and one that buys lots of fresh produce. We then used XGBoost to train prediction models, which can be assessed through a few key features, and used grid-search cross-validation to find the right model hyperparameters. Finally, we trained a separate XGBoost model on each cluster to produce more focused, more accurate models.
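The clustering and per-cluster modeling steps described above can be sketched in Python with scikit-learn. This is a minimal illustration on synthetic data, not the authors' code. The user-by-aisle count matrix, the binary "will reorder" target, the helper names `cluster_users` and `fit_per_cluster`, the component pair, the scoring metric, and the parameter grid are all hypothetical. Plotting the six components is omitted; the chosen pair is simply passed in. To run without extra dependencies, the sketch also substitutes scikit-learn's `GradientBoostingClassifier` for XGBoost. `xgboost.XGBClassifier` follows the scikit-learn estimator interface and could be dropped into the same `GridSearchCV` call.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV


def cluster_users(user_aisle, n_components=6, pair=(0, 1), n_clusters=3, seed=0):
    """Hypothetical helper: project a user x aisle count matrix onto its top
    principal components, then run K-means on one chosen pair of components."""
    pca = PCA(n_components=n_components, random_state=seed)
    components = pca.fit_transform(user_aisle.to_numpy(dtype=float))
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(components[:, list(pair)])
    return components, labels


def fit_per_cluster(X, y, labels, param_grid, seed=0):
    """Hypothetical helper: grid-search a gradient-boosted classifier
    (stand-in for XGBoost) separately on each cluster's rows and return
    the refit best estimator for every cluster."""
    models = {}
    for cluster in np.unique(labels):
        mask = labels == cluster
        search = GridSearchCV(
            GradientBoostingClassifier(random_state=seed),
            param_grid,
            cv=3,          # 3-fold CV, stratified because this is a classifier
            scoring="f1",  # assumed metric; pick whatever suits the task
        )
        search.fit(X[mask], y[mask])
        models[int(cluster)] = search.best_estimator_
    return models


# Synthetic stand-in data: 300 users x 20 aisles drawn from three hidden profiles.
rng = np.random.default_rng(0)
profiles = rng.uniform(0.5, 5.0, size=(3, 20))
group = rng.integers(0, 3, size=300)
user_aisle = pd.DataFrame(rng.poisson(profiles[group]))
target = (rng.random(300) < 0.5).astype(int)  # hypothetical "will reorder" label

components, labels = cluster_users(user_aisle)
models = fit_per_cluster(
    user_aisle.to_numpy(dtype=float),
    target,
    labels,
    {"max_depth": [2, 3], "n_estimators": [25, 50]},
)
for cluster, model in models.items():
    params = model.get_params()
    print(cluster, params["max_depth"], params["n_estimators"])
```

Running a separate search per cluster, rather than one global search, lets each customer group settle on its own hyperparameters. That is one way to realize the "more focused" per-cluster models the project describes.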

github URL