GAN-based text line segmentation method for challenging handwritten documents


Özşeker İ., DEMİR A. A., ÖZKAYA U.

International Journal on Document Analysis and Recognition, 2024 (SCI-Expanded)

  • Publication Type: Article
  • Publication Date: 2024
  • DOI Number: 10.1007/s10032-024-00488-5
  • Journal Name: International Journal on Document Analysis and Recognition
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC
  • Keywords: Document analysis, Generative adversarial networks, Handwritten document, Text line segmentation
  • Yıldız Technical University Affiliated: Yes

Abstract

Text line segmentation (TLS) is an essential step in end-to-end document analysis systems. Its main purpose is to extract the individual text lines of a handwritten document with high accuracy. Handwritten and historical documents often contain touching and overlapping characters, heavy diacritics, and footnotes and side notes added over the years, all of which make this step difficult. In this work, we present a new TLS method based on generative adversarial networks (GANs). The TLS problem is treated as an image-to-image translation problem, and the GAN model is trained to learn the spatial relationship between the individual text lines and the corresponding masks that contain those lines. To evaluate the segmentation performance of the proposed GAN model, two challenging datasets, VML-AHTE and VML-MOC, were used. According to the qualitative and quantitative results, the proposed GAN model achieved the best segmentation accuracy on the VML-MOC dataset and showed competitive performance on the VML-AHTE dataset.