

Handwritten Text Recognition (HTR)


When humanities researchers examine handwritten script from a historical archive, simply reading and transcribing the text is a challenge. Archived script can be hundreds of years old and highly dissimilar to modern languages. Transcribing such script with neural networks (from scanned files such as JPEG images or PDFs) is called Handwritten Text Recognition (HTR); it is similar to, but more challenging than, optical character recognition (OCR).

We intend to train a model from scratch to perform HTR and compare it with available models in the "foundation model plus fine-tuning" paradigm (e.g. the Transkribus models). We will use data from a digital humanities archive, or obtain data through a connection to the University of Texas at Austin via the Digitizing-the-Humanities initiative.
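
Comparing a from-scratch model with a fine-tuned one needs a shared score. The project description does not name a metric, so the sketch below assumes character error rate (CER), a common choice for HTR: the edit distance between a model's transcription and the reference transcription, divided by the reference length. The function names and the sample strings are hypothetical and for illustration only.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning one string into the other."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # delete ref_char
                curr[j - 1] + 1,                  # insert hyp_char
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical ground truth and two hypothetical model outputs.
    truth = "the quick brown fox"
    from_scratch_output = "the quick brovvn fox"
    fine_tuned_output = "the quick brown fax"
    print("from scratch CER:", character_error_rate(truth, from_scratch_output))
    print("fine-tuned CER:", character_error_rate(truth, fine_tuned_output))
```

A lower CER means a transcription closer to the reference. In practice the score would be averaged over a held-out set of transcribed lines or pages rather than computed on a single string.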

github URL