Vinicius Ambrosi

DIRECTOR: Roman Holowinsky, PhD

DATE: September 06, 2024

TEAM

Stanford Sentiment Treebank with 5 labels (SST-5)

Gilyoung Cheong, Dohoon Kim, Vinicius Ambrosi

SST-5, the Stanford Sentiment Treebank with 5 labels, is a dataset used for sentiment analysis. It contains 11,855 individual sentences sourced from movie reviews, along with 215,154 unique phrases from the parse trees of those sentences. Each phrase is annotated by three human judges and assigned one of five labels: negative, somewhat negative, neutral, somewhat positive, or positive. This fine-grained, five-way labeling is what gives the dataset its name. According to the leaderboard, the highest accuracy on the test set is 59.8%. More interestingly, the fifth-ranked model, with an accuracy of 55.5%, used only a BERT Large model with dropout. The goal of our project is to see whether we can reach the top five of the leaderboard through fine-tuning and hyperparameter tuning, specifically of the learning rate and the hyperparameters of the Adam optimizer.
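To make concrete what tuning "the learning rate and the hyperparameters of the Adam optimizer" can involve, here is a minimal sketch of a grid search over the learning rate and Adam's two decay rates. It is illustrative only and is not the team's actual procedure: the toy quadratic loss stands in for validation loss on SST-5, and the function names (`adam_minimize`, `grid_search`), the grid values, and the step count are all assumptions made for the example.

```python
import itertools
import math


def adam_minimize(grad, x0, lr, beta1, beta2, eps=1e-8, steps=200):
    """Run the standard Adam update on a single scalar parameter."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g       # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g   # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective (an assumption): a stand-in for the validation loss of a model.
def toy_loss(x):
    return (x - 3.0) ** 2


def toy_grad(x):
    return 2.0 * (x - 3.0)


def grid_search(learning_rates, beta1s, beta2s):
    """Try every combination and return (best_loss, best_config)."""
    results = []
    for lr, b1, b2 in itertools.product(learning_rates, beta1s, beta2s):
        x_final = adam_minimize(toy_grad, x0=0.0, lr=lr, beta1=b1, beta2=b2)
        config = {"lr": lr, "beta1": b1, "beta2": b2}
        results.append((toy_loss(x_final), config))
    # Compare on the loss only, so ties never fall through to comparing dicts.
    return min(results, key=lambda r: r[0])


best_loss, best_config = grid_search(
    learning_rates=[1e-3, 1e-2, 1e-1],  # hypothetical grid values
    beta1s=[0.8, 0.9],
    beta2s=[0.99, 0.999],
)
print(best_loss, best_config)
```

In an actual fine-tuning run, the call to `adam_minimize` would be replaced by training the model with the given configuration and scoring it on a held-out validation split, so that the test set is touched only once for the final leaderboard comparison.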

THE ERDŐS INSTITUTE

Helping PhDs get and create jobs they love at every stage of their career.
