


Shirlyn Wang, Christopher Stith, Katja Vassilev


The goal of this project is to identify Personally Identifiable Information (PII) in student essays so that the essays can be used to build distributable educational tools. The data came from a Kaggle competition posed by a group at Vanderbilt University and the Learning Agency Lab. We fine-tuned existing language models (RoBERTa and DeBERTa) to detect the categories of PII targeted by the competition, with the aim of maximizing the F5 score, the evaluation metric set by the competition. F5 is the F-beta score with β = 5, which weights recall more heavily than precision, so a missed piece of PII is penalized far more than a false alarm.
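To make the effect of β = 5 concrete, here is a minimal sketch of the F-beta formula computed from raw true-positive, false-positive, and false-negative counts. It assumes the counts have already been tallied (for example, per predicted PII token); the competition's exact matching and averaging rules are not reproduced here, and the function name `f_beta` is hypothetical.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 5.0) -> float:
    """Hypothetical helper: F-beta score from raw counts.

    With beta > 1, recall counts more than precision; at beta = 5 a false
    negative is weighted beta**2 = 25 times as heavily as a false positive.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Two illustrative (made-up) models with the same number of errors:
# one over-flags PII (more false positives), the other misses PII (more false negatives).
print(f"{f_beta(tp=80, fp=40, fn=20):.3f}")  # over-flagging model: 0.794
print(f"{f_beta(tp=80, fp=20, fn=40):.3f}")  # under-flagging model: 0.671
```

Although both models make 60 errors, the one that over-flags scores noticeably higher, which is why a PII detector tuned for F5 should lean toward predicting PII when uncertain.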

github URL