An OCR Pipeline and Semantic Text Analysis for Comics

Hartel R, Dunst A (2021)


Publication Type: Conference contribution

Publication year: 2021

Publisher: Springer Science and Business Media Deutschland GmbH

Book Volume: 12666 LNCS

Pages Range: 213-222

Conference Proceedings Title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Event location: Virtual, Online

ISBN: 9783030687793

DOI: 10.1007/978-3-030-68780-9_19

Abstract

Optical character recognition has remained a challenge for comics, given the high variability of placement of text on the page, the wide variety of frequently handwritten fonts, and the limited availability and small size of datasets. This paper reports on ongoing work on an OCR pipeline that includes text spotting with the help of a U-Net-based fully convolutional neural network and OCR training with the open-source software Calamari, which was performed on the “Graphic Narrative Corpus” of book-length graphic novels written in English. Based on the results of the OCR training, we then present an analysis of the textual properties of 129 graphic novels correlated with page length, historical development, and genre affiliation.

How to cite

APA:

Hartel, R., & Dunst, A. (2021). An OCR Pipeline and Semantic Text Analysis for Comics. In Alberto Del Bimbo, Rita Cucchiara, Stan Sclaroff, Giovanni Maria Farinella, Tao Mei, Marco Bertini, Hugo Jair Escalante, Roberto Vezzani (Eds.), Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 213-222). Virtual, Online: Springer Science and Business Media Deutschland GmbH.

MLA:

Hartel, Rita, and Alexander Dunst. "An OCR Pipeline and Semantic Text Analysis for Comics." Proceedings of the 25th International Conference on Pattern Recognition Workshops, ICPR 2020, Virtual, Online. Ed. Alberto Del Bimbo, Rita Cucchiara, Stan Sclaroff, Giovanni Maria Farinella, Tao Mei, Marco Bertini, Hugo Jair Escalante, Roberto Vezzani, Springer Science and Business Media Deutschland GmbH, 2021. 213-222.
