
Handwriting Enhancement: Recognition-Based and Recognition-Independent Approaches for On-device Online Handwritten Text Alignment

By Karina Korovai, Dmytro Zhelezniakov, Oleg Yakovchuk, Olga Radyvonenko, and Nataliya Sakhnenko, Samsung R&D Institute Ukraine

Introduction


Handwriting has served as the primary method of recording information over an extensive historical span [1]. Its development begins in early education, accompanied by a strong emphasis on writing accuracy.
The emergence of personal mobile devices equipped with a touchscreen and stylus has significantly contributed to the active exploration, advancement, and popularization of handwriting-based interfaces, commonly known as ‘pencil and paper’ interfaces [2]. Such interfaces offer a natural and versatile user experience, enabling the creation of handwritten documents encompassing textual content, mathematical expressions, charts, diagrams, tables, images, and other elements.
Still, improving the visual quality of handwritten content remains a relatively unexplored and challenging task.

Figure 1a. Input document with text lines

Figure 1b. Document after text lines alignment

The primary objective of this study is to enhance the user experience by refining the visual representation of digitally handwritten text.
We propose two approaches to enhance handwriting legibility by straightening the written content: recognition-based and recognition-independent (hierarchical RNN). We aim to create a lightweight, resource-efficient solution for low-end mobile devices that accommodates various writing styles without deforming strokes, preserving the original style and readability. Together, the proposed approaches form a framework for enhancing handwriting legibility in digital ink applications in an on-device environment.

Our Approaches

A. Recognition-Based Method


The first proposed method enhances handwriting legibility by recognizing written characters through character segmentation and text recognition. It involves three stages:

• analyzing input strokes to identify text blocks and lines using a hierarchical neural network [3];
• recognizing symbols along with their corresponding points with a Bidirectional Long Short-Term Memory (BLSTM) neural network [4], [5];
• conducting structural analysis for precise character positioning.

Figure 2. The alignment workflow in the recognition-based method.

The alignment system focuses on placing each symbol on a straight baseline, considering text metrics like baseline position, ascent, x-height, and descent values. Symbols are classified into levels (‘Tall’, ‘Basic’, ‘Lengthy’, ‘Top’, ‘Bottom’, ‘Middle’) based on expected vertical locations. Complex geometric feature analysis helps determine baselines, especially for symbols with various writing styles.
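As an illustration of how such a level assignment could work, the sketch below classifies a symbol by where its bounding box sits relative to the text line metrics. The function name, the rules, and the tolerance are assumptions for illustration, not the paper's exact geometric analysis.

```python
def classify_symbol_level(top, bottom, baseline, midline, tol=0.15):
    """Assign a symbol level from its bounding box (illustrative rules only).

    Coordinates grow downward (screen convention), so top < bottom.
    `midline` is the x-height line; `tol` is a tolerance expressed as a
    fraction of the x-height band (baseline - midline).
    """
    band = baseline - midline              # x-height in pixels
    eps = tol * band
    above_mid = top < midline - eps        # rises above the x-height line
    below_base = bottom > baseline + eps   # descends below the baseline

    if above_mid and below_base:
        return "Lengthy"                   # e.g. 'f': ascender and descender
    if bottom < midline + eps:
        return "Top"                       # e.g. apostrophe: above the band
    if top > midline + eps and bottom < baseline - eps:
        return "Middle"                    # e.g. '-': floats inside the band
    if above_mid:
        return "Tall"                      # e.g. 'b', 'h', capitals
    if below_base:
        return "Bottom"                    # e.g. 'g', 'y', 'p'
    return "Basic"                         # e.g. 'a', 'c', 'e'
```

For example, with a baseline at y = 100 and an x-height line at y = 70, a box spanning y = 40..100 would be classified as ‘Tall’ and one spanning y = 70..120 as ‘Bottom’.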

Figure 3. An example of symbol level representation and main line definition. Text line metrics produced by the algorithm are marked in blue.

Symbols written continuously without lifting the pen are grouped and aligned together to preserve the original handwriting sequence. The final step calculates text line metrics for proper alignment, adjusting strokes to fit within designated baselines and maintaining consistent line spacing. This process accounts for personal writing features and ensures natural-looking, aligned text.
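The key property of this step is that each symbol group is moved as a rigid unit, so strokes are repositioned but never deformed. A minimal sketch of that translation (function and parameter names are illustrative, not the production API):

```python
def align_group(strokes, group_baseline, target_baseline):
    """Translate every stroke of a symbol group vertically so that the
    group's estimated baseline lands on the straight target baseline.

    Strokes are lists of (x, y) points; shifting whole strokes by a single
    offset preserves their shape, keeping the personal writing style intact.
    """
    dy = target_baseline - group_baseline
    return [[(x, y + dy) for (x, y) in stroke] for stroke in strokes]
```

For example, a group whose baseline was estimated at y = 12 and whose target line sits at y = 20 is shifted down by 8 units, point by point.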

B. Recognition-Independent Method


The second proposed method is a recognition-independent approach inspired by a simple encoder-decoder model [6] and the hierarchical RNN architecture [3]. It processes data at the point and stroke levels separately, using stacked bidirectional Gated Recurrent Units (GRUs) [7] to balance efficiency and performance.
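As a reference point for the recurrent building block, one GRU step in the convention of Cho et al. [7] can be sketched in NumPy. This is a textbook illustration, not the production implementation; a bidirectional layer runs such a cell over the sequence in both directions and concatenates the two hidden states per time step.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, params):
    """One GRU step (Cho et al. [7] convention).

    x: input vector; h: previous hidden state;
    params: dict with (W, U, b) triples for the "z", "r", "h" gates.
    """
    Wz, Uz, bz = params["z"]   # update gate weights
    Wr, Ur, br = params["r"]   # reset gate weights
    Wh, Uh, bh = params["h"]   # candidate state weights
    z = sigmoid(Wz @ x + Uz @ h + bz)              # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)  # candidate state
    return (1.0 - z) * h + z * h_tilde             # interpolated new state
```

Because the new state is a convex combination of the previous state and a tanh output, the hidden activations stay bounded, which contributes to the stability of stacked recurrent layers.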

The hierarchical transition from points to strokes is performed by a subsampling level that employs the fragment edge pooling technique [3], allowing the model to learn both local and global patterns. A final linear layer then maps the stroke-level features to offset values for each stroke.
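The exact pooling operation is defined in [3]; as a rough illustrative approximation (an assumption, not the paper's definition), one can think of representing each stroke by the point-level features at its edges:

```python
import numpy as np

def edge_pooling(point_features, stroke_lengths):
    """Simplified sketch of pooling point features to stroke features.

    Each stroke (fragment) is represented by the features of its two edge
    points (first and last), concatenated into one stroke-level vector.
    point_features: (total_points, d) array; stroke_lengths: points per stroke.
    Returns an array of shape (num_strokes, 2 * d).
    """
    out, start = [], 0
    for n in stroke_lengths:
        frag = point_features[start:start + n]
        out.append(np.concatenate([frag[0], frag[-1]]))  # edge features only
        start += n
    return np.stack(out)
```

This is what turns a variable-length point sequence into one fixed-size vector per stroke, so the stroke-level GRU layers see a much shorter sequence.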

The model operates on single lines, with dataset preparation involving size normalization, resampling, and position normalization. Input-target pairs are generated using the recognition-based approach, with quality control to eliminate problematic samples. Although this method requires a diverse range of writing styles in the training data, it performs straightening without any knowledge of the text label, making it versatile and robust against recognition errors.
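The preparation steps can be sketched as follows; the parameter values, target height, and normalization choices here are assumptions for illustration, not the paper's exact settings.

```python
import numpy as np

def preprocess_line(strokes, num_points=32, target_height=1.0):
    """Illustrative preprocessing of one handwritten line.

    1. Size normalization: scale the line to a fixed height.
    2. Position normalization: shift its top-left corner to the origin.
    3. Resampling: redistribute each stroke to `num_points` points spaced
       evenly along its arc length.
    strokes: list of (n_i, 2) float arrays of (x, y) points.
    """
    pts = np.concatenate(strokes)
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    scale = target_height / max(maxs[1] - mins[1], 1e-8)
    out = []
    for s in strokes:
        s = (np.asarray(s, dtype=float) - mins) * scale   # normalize pose/size
        seg = np.linalg.norm(np.diff(s, axis=0), axis=1)  # segment lengths
        d = np.concatenate([[0.0], np.cumsum(seg)])       # arc length per point
        t = np.linspace(0.0, d[-1], num_points)           # equidistant targets
        out.append(np.stack([np.interp(t, d, s[:, 0]),
                             np.interp(t, d, s[:, 1])], axis=1))
    return out
```

Arc-length resampling removes speed-of-writing artifacts (densely and sparsely sampled regions), giving the network inputs of uniform density regardless of how fast the user wrote.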

Figure 4. Hierarchical recurrent neural network architecture

Our Results


We conducted a comparative analysis of the presented methods against our previous work [8].

Experiment 1: Efficiency evaluation


Dataset: 10k preprocessed handwritten single-line texts from the in-house test set.
On-device: Samsung Tab S9 Ultra 5G with Qualcomm Snapdragon 8 Gen 2 3.36GHz CPU.
Table 1 presents the time taken to align handwritten text. All the approaches were executed in a single-thread mode on the CPU without optimization for GPU usage.

Table 1. Time Used to Align a Single Text Line in ms

Experiment 2: User study


Dataset: 120 examples with English single-line texts randomly selected from the in-house test set.
Number of participants: 19.
Setup: users were presented with the original image, marked as input, and two unmarked modified versions in shuffled order, all rendered on a background of straight cells; users were asked to compare the alignment results and rate each transformation relative to the input on a scale from 1 to 5.

The summarized results are shown in Fig. 5.

Figure 5. User study: choice distribution.

Conclusions


Figure 6. Examples of input handwritten text strokes along with the resultant well-aligned output

We proposed two approaches for straightening handwritten text in pen-centric applications: recognition-based and recognition-independent.

We conducted various experiments, including a user study, to assess and compare the accuracy and efficiency of the proposed methods. Both methods demonstrate impressive performance, with a line error rate below 4.3%. For the second method, the average error is less than two pixels per stroke, which is almost invisible to the naked eye.

The most effective combination of these two methods is achieved by employing the first method to generate datasets for training and validating the second method.

Our full paper is available at: https://ieeexplore.ieee.org/document/10552742

References

1. R. Allen, The Notebook: A History of Thinking on Paper, London, U.K.:Profile Books Limited, 2023.

2. D. Zhelezniakov, V. Zaytsev, O. Radyvonenko and Y. Yakishyn, "InteractivePaper: Minimalism in document editing UI through the handwriting prism", Proc. Adjunct Publication 32nd Annu. ACM Symp. User Interface Softw. Technol., pp. 13-15, Oct. 2019.

3. A. Grygoriev, I. Degtyarenko, I. Deriuga, S. Polotskyi, V. Melnyk, D. Zakharchuk, et al., "HCRNN: A novel architecture for fast online handwritten stroke classification" in Document Analysis and Recognition—ICDAR, Cham, Switzerland:Springer, pp. 193-208, 2021.

4. A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures", Neural Netw., vol. 18, no. 5, pp. 602-610, Jul. 2005.

5. V. Frinken and S. Uchida, "Deep BLSTM neural networks for unconstrained continuous handwritten text recognition", Proc. 13th Int. Conf. Document Anal. Recognit. (ICDAR), pp. 911-915, Aug. 2015.

6. D. Kim et al., "Method and electronic device for correcting handwriting input", U.S. Patent No. 11,450,041, Sep. 20, 2022.

7. K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, et al., "Learning phrase representations using RNN encoder-decoder for statistical machine translation", arXiv:1406.1078, 2014.

8. K. Korovai, D. Zhelezniakov, O. Radyvonenko, O. Yakovchuk, I. Deriuga and N. Sakhnenko, "Recognition-independent handwritten text alignment using lightweight recurrent neural network", Proc. SIGGRAPH Asia Posters, pp. 1-20, Nov. 2023.