Certificate of Completion
THIS ACKNOWLEDGES THAT
Muhammad Usman Taj
HAS COMPLETED THE SPRING 2024 DATA SCIENCE BOOT CAMP
DIRECTOR: Roman Holowinsky, PhD
DATE: MAY 01, 2024
TEAM: Data Science - Economists
Muhammad Usman Taj, Jiuqin Wei, Di Kang, Fang Li, Estefania Padilla Gonzalez
In this study, we examine a dataset of movie information to identify groups of similar movies based on their profitability. We used K-Means clustering, a well-known unsupervised machine learning method. The dataset includes attributes such as each movie's title, release year, revenue, budget, and associated genres. After preprocessing the data to handle missing values and put it in a form suitable for clustering, we applied the clustering technique. Our results indicate that the number of votes is the most influential factor in determining a movie's profitability, with popularity and runtime also contributing notably.
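The preprocessing and clustering steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the team's code. The toy records, the field names (`budget`, `revenue`, `vote_count`, `popularity`, `runtime`), the revenue-to-budget `profit_ratio`, and the drop-incomplete-rows policy are all hypothetical. The deterministic farthest-point seeding is swapped in for the usual random or k-means++ initialization so the result is reproducible. The `centroid_spread` helper is one rough heuristic for asking which standardized feature separates the clusters most. The abstract does not say how the team measured feature influence.

```python
import math

# Hypothetical toy records invented for illustration only; NOT the boot camp dataset.
# None marks a missing value.
MOVIES = [
    {"title": "Blockbuster 1", "budget": 100.0, "revenue": 500.0, "vote_count": 9000, "popularity": 80.0, "runtime": 140},
    {"title": "Blockbuster 2", "budget": 120.0, "revenue": 540.0, "vote_count": 8500, "popularity": 75.0, "runtime": 135},
    {"title": "Blockbuster 3", "budget": 90.0, "revenue": 450.0, "vote_count": 9500, "popularity": 85.0, "runtime": 145},
    {"title": "Indie 1", "budget": 5.0, "revenue": 4.0, "vote_count": 120, "popularity": 5.0, "runtime": 95},
    {"title": "Indie 2", "budget": 4.0, "revenue": 3.0, "vote_count": 90, "popularity": 4.0, "runtime": 100},
    {"title": "Indie 3", "budget": 6.0, "revenue": 5.0, "vote_count": 150, "popularity": 6.0, "runtime": 90},
    {"title": "Unknown budget", "budget": None, "revenue": 20.0, "vote_count": 300, "popularity": 10.0, "runtime": 105},
]
# Column order produced by preprocess(); profit_ratio is a hypothetical derived feature.
FEATURES = ["profit_ratio", "vote_count", "popularity", "runtime"]
REQUIRED = ("budget", "revenue", "vote_count", "popularity", "runtime")


def preprocess(movies):
    """Drop incomplete rows, derive revenue/budget, and z-score every column."""
    rows = []
    for m in movies:
        if any(m.get(key) is None for key in REQUIRED):
            continue  # simplest missing-value policy: drop the row
        if m["budget"] <= 0:
            continue  # ratio is undefined without a positive budget
        rows.append([m["revenue"] / m["budget"], m["vote_count"], m["popularity"], m["runtime"]])
    # Standardize so no feature dominates Euclidean distance purely by scale.
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - mu) ** 2 for x in c) / len(c)) or 1.0 for c, mu in zip(cols, means)]
    return [[(x - mu) / sd for x, mu, sd in zip(r, means, stds)] for r in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Farthest-point seeding: start at the first point, then add the point farthest from its nearest centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def centroid_spread(centroids):
    """Range of each standardized feature across centroids (a heuristic, not a formal importance score)."""
    return {f: max(c[i] for c in centroids) - min(c[i] for c in centroids) for i, f in enumerate(FEATURES)}


if __name__ == "__main__":
    X = preprocess(MOVIES)  # the "Unknown budget" row is dropped
    labels, centroids = kmeans(X, k=2)
    print(labels)  # → [0, 0, 0, 1, 1, 1]
    spread = centroid_spread(centroids)
    print(sorted(spread, key=spread.get, reverse=True))
```

On real data, one would typically choose `k` with an elbow or silhouette analysis and rely on a library implementation such as scikit-learn's `KMeans` rather than a hand-rolled loop.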