fang: Fast Annotation of Glyphs in Historical Printed Documents

Kordon F, Weichselbaumer N, Herz R, van der Loop J, Mossman S, Potten E, Seuret M, Mayr M, Wu F, Christlein V (2024)


Publication Type: Conference contribution

Publication year: 2024

Publisher: Springer Science and Business Media Deutschland GmbH

Book Volume: 14994 LNCS

Page Range: 377-392

Conference Proceedings Title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Event location: Athens GR

ISBN: 9783031704413

URI: https://link.springer.com/chapter/10.1007/978-3-031-70442-0_23

DOI: 10.1007/978-3-031-70442-0_23

Abstract

The extraction and analysis of large numbers of glyphs, and the associated opportunities for constructing a corpus of glyphs from types of the fifteenth century, offer significant research potential for scholars in book science. Such a corpus could be used in many ways, not least in assisting in the identification of fragments, charting the movements of type, and examining the impact of wear on type. Recognising this potential, we have developed fang (code is available at https://github.com/Werck-der-buecher/FAnG), software that efficiently extracts and categorises glyphs from historical printed documents. Our approach involves several stages: (1) using Optical Character Recognition to extract glyphs in bulk, (2) employing a joint energy-based model for character classification and out-of-distribution pruning, and (3) providing a comprehensive toolset for manual review and editing, including deletions/reassignments and sorting by similarity. A significant strength of this design is the utilisation of existing text transcriptions and the context-awareness of trained language models, eliminating the need for explicit glyph location ground truth or glyph templates. By parallelising the extraction, we can quickly process entire digitised books with hundreds of pages, setting our system apart from existing glyph annotation tools. In experiments on digital reproductions of the Catholicon and 36-line Bible, the method achieves good spatial coverage of the detected glyphs, high character classification accuracy, and a low number of outliers. Our system represents a significant advancement in historical document analysis, providing researchers with an efficient tool for glyph extraction and categorisation.
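
Stage (2) of the abstract combines character classification with out-of-distribution pruning. The sketch below illustrates one common way such a step can work, assuming a classifier that outputs one logit per character class and using the free-energy score E(x) = -logsumexp(logits) as the outlier criterion. It is not taken from the fang code base; the function names, the label set, and the threshold value are hypothetical and chosen only for illustration.

```python
import math

def energy_score(logits, temperature=1.0):
    """Free energy E(x) = -T * logsumexp(logits / T).

    Lower energy suggests the input looks more like the training data.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))

def classify_or_prune(logits, labels, threshold):
    """Return (label, energy); label is None if the glyph is pruned as an outlier.

    The threshold is a hypothetical tuning parameter, e.g. picked on held-out data.
    """
    energy = energy_score(logits)
    if energy > threshold:
        return None, energy
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best], energy

# Hypothetical logits for two glyph crops over three character classes.
labels = ["a", "b", "c"]
confident = classify_or_prune([5.0, 0.0, 0.0], labels, threshold=-2.0)
ambiguous = classify_or_prune([0.1, 0.0, 0.05], labels, threshold=-2.0)
```

With these example values, the first crop is assigned to class "a", while the second has nearly flat logits, a higher energy, and is pruned. In practice the threshold would trade off how many genuine glyphs are discarded against how many outliers remain for manual review in stage (3).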

How to cite

APA:

Kordon, F., Weichselbaumer, N., Herz, R., van der Loop, J., Mossman, S., Potten, E., ... Christlein, V. (2024). fang: Fast Annotation of Glyphs in Historical Printed Documents. In Giorgos Sfikas, George Retsinas (Eds.), Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 377-392). Athens, GR: Springer Science and Business Media Deutschland GmbH.

MLA:

Kordon, Florian, et al. "fang: Fast Annotation of Glyphs in Historical Printed Documents." Proceedings of the 16th IAPR International Workshop on Document Analysis Systems, DAS 2024, Athens. Ed. Giorgos Sfikas, George Retsinas. Springer Science and Business Media Deutschland GmbH, 2024. 377-392.