Robert Gacki, Jinwoong Nam
Our objective for this project is to develop an algorithm that takes a user-supplied list of recipe ingredients and classifies the recipe into one of twenty cuisine types. The dataset consists of 39,744 unique recipes spanning all 20 cuisine types and contains 6,714 unique ingredients.
After data cleaning, we applied Term Frequency – Inverse Document Frequency (TF-IDF) vectorization, which assigns each ingredient a weight in each recipe: higher when the ingredient is rare across the dataset, lower when it appears in most recipes. We then tried three different methods for predicting cuisine type, with the goal of maximizing classification accuracy. The Support Vector Machine (SVM) method achieved the highest accuracy, classifying recipes into their correct cuisine type 80.87% of the time.
This solution allows easy and efficient classification of every recipe submitted to the website, so that users can filter and search for the cuisine types they are interested in.
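
To make the TF-IDF step described above concrete, here is a minimal sketch in plain Python. It is not the project's code: the function name `tfidf`, the toy recipes, the smoothed IDF formula, and the L2 normalization are all assumptions chosen for illustration, and the project may have used a different variant or a library implementation.

```python
import math
from collections import Counter


def tfidf(recipes):
    """Return one {ingredient: weight} dict per recipe, L2-normalized.

    Hypothetical helper for illustration; not taken from the project.
    """
    n = len(recipes)
    # Document frequency: number of recipes that contain each ingredient.
    df = Counter(ing for recipe in recipes for ing in set(recipe))
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {ing: math.log((1 + n) / (1 + count)) + 1 for ing, count in df.items()}

    vectors = []
    for recipe in recipes:
        tf = Counter(recipe)  # term frequency within this recipe
        weights = {ing: tf[ing] * idf[ing] for ing in tf}
        # Scale each recipe vector to unit length so recipe size does not dominate.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({ing: w / norm for ing, w in weights.items()} if norm else {})
    return vectors


# Toy recipes invented for this example, not drawn from the dataset.
toy_recipes = [
    ["soy sauce", "ginger", "garlic", "rice"],
    ["tomato", "basil", "garlic", "olive oil"],
    ["tortilla", "tomato", "cilantro", "garlic"],
]
toy_vectors = tfidf(toy_recipes)
```

In this toy data, garlic appears in every recipe and so receives the lowest weight in each vector, while an ingredient such as ginger, which appears in only one recipe, is weighted more heavily. Vectors of this kind are what a classifier such as an SVM would take as input.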