Handwriting has served as the primary method of recording information over a long historical span [1]; its development begins in early education, where writing accuracy is strongly emphasized.
The emergence of personal mobile devices equipped with touchscreens and styluses has significantly contributed to the exploration, advancement, and popularization of handwriting-based interfaces, commonly known as ‘pencil and paper’ interfaces [2]. Such interfaces offer a natural and versatile user experience, enabling the creation of handwritten documents that combine textual content, mathematical expressions, charts, diagrams, tables, images, and other elements.
Still, improving the visual quality of handwritten content remains a relatively unexplored and challenging task.
Figure 1a. Input document with text lines
Figure 1b. Document after text line alignment
The primary objective of this study is to enhance the user experience by refining the visual representation of digitally handwritten text.
We propose two approaches to enhance handwriting legibility by straightening the written content: recognition-based and recognition-independent (Hierarchical RNN).
We aim to create a lightweight, resource-efficient solution for low-end mobile devices that accommodates various writing styles without deforming strokes while preserving the original style and readability.
Together, the proposed approaches form a framework for enhancing handwriting legibility in digital ink applications, designed for on-device deployment.
The first proposed method enhances handwriting legibility by recognizing the written characters through character segmentation and text recognition. It involves three stages, shown in Fig. 2 and described below.
Figure 2. The workflow of the alignment in the recognition-based method.
The alignment system focuses on placing each symbol on a straight baseline, considering text metrics like baseline position, ascent, x-height, and descent values. Symbols are classified into levels (‘Tall’, ‘Basic’, ‘Lengthy’, ‘Top’, ‘Bottom’, ‘Middle’) based on expected vertical locations. Complex geometric feature analysis helps determine baselines, especially for symbols with various writing styles.
Figure 3. An example of symbol level representation and main line definitions. Text line metrics for the algorithm output are marked in blue.
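To illustrate the classification step, the following Python sketch maps recognized characters to vertical levels and derives the target vertical bounds for each level from the text-line metrics. The level table, helper names, and bound formulas here are simplifying assumptions for illustration, not the exact rules used in our implementation.

# Illustrative only: assign recognized characters to vertical levels and
# compute the (top, bottom) range a symbol of each level should occupy.
from dataclasses import dataclass

# Hypothetical level table grouping characters by expected vertical extent.
LEVELS = {
    "Tall":    set("bdfhklt" "ABCDEFGHIJKLMNOPQRSTUVWXYZ" "0123456789"),
    "Lengthy": set("gjpqy"),
    "Basic":   set("acemnorsuvwxz"),
    "Top":     set("'\""),
    "Middle":  set("-~"),
    "Bottom":  set(",."),
}

@dataclass
class LineMetrics:
    baseline: float   # y coordinate of the baseline (y grows downward)
    x_height: float   # height of lowercase letters without ascenders
    ascent: float     # extent above the baseline for tall symbols
    descent: float    # extent below the baseline for lengthy symbols

def symbol_level(char: str) -> str:
    """Map a recognized character to its expected vertical level."""
    for level, chars in LEVELS.items():
        if char in chars:
            return level
    return "Basic"  # fall back to the most common level

def target_bounds(level: str, m: LineMetrics) -> tuple[float, float]:
    """Return the (top, bottom) y-range for a symbol of the given level."""
    if level == "Tall":
        return m.baseline - m.ascent, m.baseline
    if level == "Lengthy":
        return m.baseline - m.x_height, m.baseline + m.descent
    if level == "Top":
        return m.baseline - m.ascent, m.baseline - m.x_height
    if level == "Middle":
        return m.baseline - m.x_height, m.baseline - 0.5 * m.x_height
    if level == "Bottom":
        return m.baseline, m.baseline + 0.5 * m.descent
    return m.baseline - m.x_height, m.baseline  # "Basic"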
Symbols written continuously without lifting the pen are grouped and aligned together to preserve the original handwriting sequence. The final step calculates text line metrics for proper alignment, adjusting strokes to fit within designated baselines and maintaining consistent line spacing. This process accounts for personal writing features and ensures natural-looking, aligned text.
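The vertical adjustment itself can be pictured as a rigid per-group translation. The sketch below, reusing LineMetrics and target_bounds from the previous snippet, shifts a group of continuously written strokes onto its target bounds without deforming them; the anchoring rule is an assumption chosen for clarity rather than our exact procedure.

import numpy as np

def align_group(strokes: list[np.ndarray], level: str,
                m: LineMetrics) -> list[np.ndarray]:
    """Shift a group of strokes (each an N x 2 array of x, y points) vertically
    so that its lowest point lands on the bottom of the target bounds for its
    level.  Only a rigid translation is applied, so stroke shapes are kept."""
    _, bottom = target_bounds(level, m)
    lowest = max(s[:, 1].max() for s in strokes)
    dy = bottom - lowest
    return [s + np.array([0.0, dy]) for s in strokes]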
The second proposed method is a recognition-independent approach inspired by a simple encoder-decoder model [6] and the hierarchical RNN architecture [3]. It processes data at the point and stroke levels separately, using stacked bidirectional Gated Recurrent Units (GRUs) [7] to balance efficiency and performance.
The hierarchical transition from points to strokes is performed by a subsampling level that employs the fragment edge pooling technique [3], allowing the model to learn both local and global patterns. The final output is produced by a linear layer that predicts offset values for each stroke.
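The architecture can be approximated with the following PyTorch sketch. Layer sizes, input features, and the pooling rule (here simplified to gathering the point-level hidden state at each stroke's last point) are assumptions for illustration, not our exact configuration.

# Rough sketch of the point -> stroke hierarchy; sizes and pooling are assumed.
import torch
import torch.nn as nn

class HierarchicalAligner(nn.Module):
    def __init__(self, point_feats: int = 4, hidden: int = 64, out_dim: int = 1):
        super().__init__()
        # Point level: stacked bidirectional GRUs over the point sequence.
        self.point_rnn = nn.GRU(point_feats, hidden, num_layers=2,
                                bidirectional=True, batch_first=True)
        # Stroke level: bidirectional GRU over the pooled stroke representations.
        self.stroke_rnn = nn.GRU(2 * hidden, hidden, num_layers=1,
                                 bidirectional=True, batch_first=True)
        # Linear head predicting an offset value per stroke.
        self.head = nn.Linear(2 * hidden, out_dim)

    def forward(self, points: torch.Tensor, stroke_ends: torch.Tensor) -> torch.Tensor:
        # points:      (batch, num_points, point_feats)
        # stroke_ends: (batch, num_strokes) index of the last point of each stroke
        point_h, _ = self.point_rnn(points)                        # (B, P, 2H)
        idx = stroke_ends.unsqueeze(-1).expand(-1, -1, point_h.size(-1))
        stroke_in = point_h.gather(1, idx)                         # (B, S, 2H)
        stroke_h, _ = self.stroke_rnn(stroke_in)                   # (B, S, 2H)
        return self.head(stroke_h)                                 # (B, S, out_dim)

# Example: four strokes of 50 points each, one offset value per stroke.
model = HierarchicalAligner()
pts = torch.randn(1, 200, 4)                 # e.g. x, y, dx, dy per point (assumed)
ends = torch.tensor([[49, 99, 149, 199]])    # last-point index of each stroke
offsets = model(pts, ends)                   # shape (1, 4, 1)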
The model operates on single lines, with dataset preparation involving size normalization, resampling, and position normalization. Input-target pairs are generated using the recognition-based approach, with quality control to eliminate problematic samples. Although this method requires a diverse range of writing styles in the training data, it performs straightening without knowledge of the text label, making it versatile and robust against recognition errors.
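As a rough illustration of these preparation steps, the sketch below normalizes the size of a single line, resamples each stroke to approximately uniform point spacing, and normalizes the position; the constants and the linear-interpolation resampling are assumptions.

# Illustrative single-line preprocessing; constants and resampling rule are assumed.
import numpy as np

def preprocess_line(strokes: list[np.ndarray], target_height: float = 1.0,
                    step: float = 0.02) -> list[np.ndarray]:
    # Size normalization: scale the whole line to a fixed height.
    pts = np.concatenate(strokes)
    height = pts[:, 1].max() - pts[:, 1].min()
    strokes = [s * (target_height / max(height, 1e-6)) for s in strokes]

    # Resampling: enforce roughly uniform spacing between consecutive points.
    resampled = []
    for s in strokes:
        d = np.maximum(np.linalg.norm(np.diff(s, axis=0), axis=1), 1e-9)
        t = np.concatenate([[0.0], np.cumsum(d)])
        u = np.linspace(0.0, t[-1], max(int(t[-1] / step) + 1, 2))
        resampled.append(np.column_stack([np.interp(u, t, s[:, 0]),
                                          np.interp(u, t, s[:, 1])]))

    # Position normalization: shift the line so it starts at the origin.
    pts = np.concatenate(resampled)
    origin = np.array([pts[:, 0].min(), pts[:, 1].min()])
    return [s - origin for s in resampled]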
Figure 4. Hierarchical recurrent neural network architecture
We conducted a comparative analysis of the presented methods against our previous work [8].
Dataset: 10k preprocessed handwritten single-line texts from the in-house test set.
On-device: Samsung Tab S9 Ultra 5G with Qualcomm Snapdragon 8 Gen 2 3.36GHz CPU.
Table 1 presents the time taken to align handwritten text. All approaches were executed in single-threaded mode on the CPU, without GPU optimization.
Table 1. Time Used to Align a Single Text Line (ms)
Dataset: 120 examples with English single-line texts randomly selected from in-house test set.
Number of participants: 19.
Setup: users were presented with the original image, marked as the input, and two unmarked modified versions in shuffled order, all rendered on a background of straight cells; users were asked to compare the alignment results and rate each transformation relative to the input on a scale from 1 to 5.
The summarized results are shown in Fig. 5.
Figure 5. User study: choice distribution.
Figure 6. Examples of input handwritten text strokes along with the resultant well-aligned output
We proposed two approaches for straightening handwritten text in pen-centric applications: recognition-based and recognition-independent.
We conducted various experiments, including a user study, to assess and compare the accuracy and efficiency of the proposed methods. Both methods demonstrate impressive performance, with a line error rate below 4.3%. For the second method, the average error is less than two pixels per stroke, which is almost invisible to the naked eye.
The most effective combination of these two methods is achieved by employing the first method to generate datasets for training and validating the second method.
Our full paper is available at: https://ieeexplore.ieee.org/document/10552742
1. R. Allen, The Notebook: A History of Thinking on Paper, London, U.K.: Profile Books Limited, 2023.
2. D. Zhelezniakov, V. Zaytsev, O. Radyvonenko and Y. Yakishyn, "InteractivePaper: Minimalism in document editing UI through the handwriting prism", Proc. Adjunct Publication 32nd Annu. ACM Symp. User Interface Softw. Technol., pp. 13-15, Oct. 2019.
3. A. Grygoriev, I. Degtyarenko, I. Deriuga, S. Polotskyi, V. Melnyk, D. Zakharchuk, et al., "HCRNN: A novel architecture for fast online handwritten stroke classification" in Document Analysis and Recognition—ICDAR, Cham, Switzerland:Springer, pp. 193-208, 2021.
4. A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures", Neural Netw., vol. 18, no. 5, pp. 602-610, Jul. 2005.
5. V. Frinken and S. Uchida, "Deep BLSTM neural networks for unconstrained continuous handwritten text recognition", Proc. 13th Int. Conf. Document Anal. Recognit. (ICDAR), pp. 911-915, Aug. 2015.
6. D. Kim et al., "Method and electronic device for correcting handwriting input", U.S. Patent 11,450,041, Sep. 20, 2022.
7. K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, et al., "Learning phrase representations using RNN encoder-decoder for statistical machine translation", arXiv:1406.1078, 2014.
8. K. Korovai, D. Zhelezniakov, O. Radyvonenko, O. Yakovchuk, I. Deriuga and N. Sakhnenko, "Recognition-independent handwritten text alignment using lightweight recurrent neural network", Proc. SIGGRAPH Asia Posters, pp. 1-20, Nov. 2023.