

Data Science - Economists

Muhammad Usman Taj, Jiuqin Wei, Di Kang, Fang Li, Estefania Padilla Gonzalez


In this study, we examine a dataset of movie information to identify groups of similar movies based on their profitability. We use K-Means clustering, a widely used unsupervised machine learning method. The dataset includes each movie's title, release year, revenue, budget, and genres. After preprocessing the data to address missing values and ensure data compatibility, we applied the clustering technique. Our results indicate that the number of votes is the most influential factor in a movie's profitability, with popularity and runtime also contributing noticeably.
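
The preprocessing and clustering steps described above can be illustrated with a minimal sketch. This is not the project's actual code: the field names (`budget`, `revenue`, `vote_count`), the toy movie records, and the choice of two clusters are all assumptions made for illustration. The sketch implements the standard K-Means loop (Lloyd's algorithm) in plain Python so that it runs without extra dependencies; in practice a library implementation such as scikit-learn's `KMeans` would likely be used instead.

```python
import math
import random


def preprocess(movies):
    """Drop rows with missing values; return (title, [profit, votes]) pairs."""
    rows = []
    for m in movies:
        budget, revenue, votes = m.get("budget"), m.get("revenue"), m.get("vote_count")
        if budget is None or revenue is None or votes is None:
            continue  # skip incomplete records
        rows.append((m["title"], [float(revenue - budget), float(votes)]))
    return rows


def standardize(points):
    """Scale each column to zero mean and unit variance so no feature dominates."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - mu) ** 2 for x in c) / len(c)) or 1.0
        for c, mu in zip(cols, means)
    ]
    return [[(x - mu) / sd for x, mu, sd in zip(p, means, stds)] for p in points]


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Hypothetical toy records (amounts in millions); not taken from the real dataset.
movies = [
    {"title": "A", "budget": 10, "revenue": 12, "vote_count": 100},
    {"title": "B", "budget": 20, "revenue": 25, "vote_count": 150},
    {"title": "C", "budget": 15, "revenue": 16, "vote_count": 120},
    {"title": "D", "budget": 100, "revenue": 500, "vote_count": 9000},
    {"title": "E", "budget": 150, "revenue": 600, "vote_count": 10000},
    {"title": "F", "budget": 120, "revenue": 550, "vote_count": 9500},
    {"title": "G", "budget": None, "revenue": 40, "vote_count": 300},  # dropped
]

rows = preprocess(movies)
features = standardize([f for _, f in rows])
labels, centroids = kmeans(features, k=2)

for (title, _), label in zip(rows, labels):
    print(title, "-> cluster", label)
```

Standardizing the features before clustering matters here because K-Means relies on Euclidean distance, so a column with large raw values (such as vote counts) would otherwise dominate the assignment step.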

github URL