Certificate of Completion
THIS ACKNOWLEDGES THAT
Kalven Bonin
HAS COMPLETED THE SPRING 2022 DATA SCIENCE BOOT CAMP
Roman Holowinsky, PhD
DIRECTOR
JUNE 08, 2022
DATE
TEAM
Gordon Ramsey
Ronak Desai, Kalven Bonin, Shidhesh Supekar
Is it possible to classify a recipe’s cuisine type just from a list of
ingredients? Our project seeks to answer this question using
some basic tools of Natural Language Processing (NLP).
We take a dataset from Kaggle.com containing ~40,000
recipes, each labeled with a cuisine type. One method we
employ is a Bag of Words (BoW) model, which takes all
words found in the ingredient lists and builds a classifier based
on the occurrences of those words in the training set. The other
method is Term Frequency-Inverse Document Frequency
(TF-IDF), which weights each word's frequency within a recipe by
how rare that word is across the training set. Both methods
produced a testing accuracy greater than 60%, which is
encouraging given that we implemented only the most naïve NLP models.
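
As a rough illustration of the two feature schemes described above, the following minimal sketch computes Bag of Words counts and TF-IDF weights in plain Python. The toy recipes, the function names, and the smoothed IDF formula (one common variant) are assumptions made for this example; they are not taken from the project's code or from the Kaggle dataset. The abstract does not say which classifier was trained on these features, so none is shown.

```python
import math
from collections import Counter

# Toy recipes invented for illustration; not drawn from the Kaggle dataset.
recipes = [
    ["soy sauce", "ginger", "rice"],
    ["tomato", "basil", "olive oil"],
    ["rice", "black beans", "tomato"],
]


def tokenize(ingredients):
    """Split every ingredient string into lowercase words."""
    return [word for item in ingredients for word in item.lower().split()]


def bag_of_words(ingredients):
    """Bag of Words: count how often each word occurs in one recipe."""
    return Counter(tokenize(ingredients))


def inverse_document_frequency(corpus):
    """Smoothed IDF per word: log((1 + N) / (1 + df)) + 1 (assumed variant)."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for ingredients in corpus:
        # Count each word at most once per recipe.
        doc_freq.update(set(tokenize(ingredients)))
    return {
        word: math.log((1 + n_docs) / (1 + df)) + 1
        for word, df in doc_freq.items()
    }


def tf_idf(ingredients, idf):
    """Weight each word count by its IDF; unseen words get weight 0."""
    counts = bag_of_words(ingredients)
    return {word: count * idf.get(word, 0.0) for word, count in counts.items()}


if __name__ == "__main__":
    idf = inverse_document_frequency(recipes)
    print(bag_of_words(recipes[0]))
    print(tf_idf(recipes[0], idf))
```

In this toy corpus, "rice" appears in two of the three recipes and "ginger" in only one, so "ginger" receives the larger TF-IDF weight even though both words occur once in the first recipe. A plain Bag of Words representation would treat the two words identically.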