TEAM
Handwritten Text Recognition (HTR)
When humanities researchers work with handwritten documents from a historical archive, even reading and transcribing the text is a challenge. Archived manuscripts can be hundreds of years old, and their scripts and language can differ greatly from modern forms. Transcribing such handwriting from image files (such as JPEG or PDF) with neural networks is called Handwritten Text Recognition (HTR); it is similar to, but more challenging than, optical character recognition (OCR).
We intend to train a model from scratch to perform HTR and compare it with available models in the "foundation model plus fine-tuning" paradigm (e.g. the Transkribus models). We will use data from a digital humanities archive, or obtain data through a connection to the University of Texas at Austin via the Digitizing-the-Humanities initiative.
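
Comparing a from-scratch model with a fine-tuned one needs a shared score. The proposal does not name a metric, so the sketch below assumes character error rate (CER), a common choice for HTR: the total edit distance between model transcriptions and reference transcriptions, divided by the total number of reference characters. The function names `edit_distance` and `character_error_rate` are illustrative, not part of any existing tool.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(references: Sequence[str],
                         hypotheses: Sequence[str]) -> float:
    """Total edit distance divided by total reference length (assumed metric)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h)
                      for r, h in zip(references, hypotheses))
    return total_edits / total_chars
```

Scoring both models on the same held-out set of transcribed lines with the same function would give directly comparable numbers; lower is better, and a value of 0 means every line was transcribed exactly.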