Samsung R&D Institute Ukraine Contributes AI to the Audio Subtitles Accessibility TV Service

At CES 2024, Samsung presented a novel accessibility feature: Audio Subtitles (Picture 1). CES (Consumer Electronics Show) is one of the world's most influential tech events, organized by the Consumer Technology Association. It showcases the latest innovations and trends in consumer electronics, including new products from major companies and startups, and attracts a wide range of attendees: manufacturers, retailers, media representatives, investors, and technology enthusiasts. Audio Subtitles is the world's first industrial on-device TV service that reads burned-in subtitles aloud in real time. This functionality is essential for visually impaired people and for anyone who cannot read on-screen subtitles in their language while the audio track is in a foreign language.

Picture 1. Audio Subtitles presentation at CES 2024

Globally, at least 2.2 billion people have a near or distance vision impairment [1], which significantly limits their access to TV, movies, and other video content. Subtitling is a widespread approach to audiovisual translation and multimedia localization, and reading subtitles aloud (voice-over) makes it possible to deliver an audio track in another language without dubbing.

The AI behind the Audio Subtitles solution was developed by engineers from Samsung R&D Institute Ukraine (SRUKR) together with colleagues from Korea. SRUKR engineers implemented an innovative, lightweight, on-device real-time solution based on computer vision deep learning technologies. These achievements have been patented [2] and presented at the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023) [3]. This conference is one of the most prestigious academic events in signal processing, attracting researchers, engineers, and students from around the world as a platform for sharing cutting-edge research, discussing new ideas, and networking within the community.

SRUKR presented computationally efficient approaches and neural network architectures for real-time on-device subtitle detection, recognition, and voicing. In particular, the team introduced a novel subtitle tracking module that processes two consecutive frames with a convolutional neural network (CNN) classifier, and a subtitle text line localization module built on a single-shot multibox detector (SSD) architecture. In addition, a modified convolutional recurrent neural network (CRNN) architecture was adapted for on-device text recognition: the CNN input is split into chunks by a new chunking procedure, and the resulting features are merged in the RNN part, as sketched below.
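To make the chunking idea concrete, here is a minimal PyTorch sketch of a chunked CRNN for text-line recognition. All layer sizes, the chunk width, and the merge scheme are illustrative assumptions, not the patented design; the actual on-device architecture is described in [2] and [3].

```python
# Minimal sketch of a chunked CRNN for subtitle text recognition.
# Layer sizes, chunk width, and merge scheme are illustrative assumptions.
import torch
import torch.nn as nn

class ChunkedCRNN(nn.Module):
    def __init__(self, num_classes: int, chunk_width: int = 64, hidden: int = 128):
        super().__init__()
        self.chunk_width = chunk_width
        # Small CNN backbone applied to each chunk independently.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # A bidirectional LSTM merges the per-chunk features along the text line.
        self.rnn = nn.LSTM(64 * 8, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, num_classes)  # per-step logits (e.g. for CTC)

    def forward(self, line_img: torch.Tensor) -> torch.Tensor:
        # line_img: (batch, 1, 32, W) grayscale text-line crop;
        # for simplicity, assume W is a multiple of chunk_width.
        chunks = line_img.split(self.chunk_width, dim=3)
        feats = []
        for chunk in chunks:
            f = self.cnn(chunk)                   # (batch, 64, 8, w')
            f = f.permute(0, 3, 1, 2).flatten(2)  # (batch, w', 64*8), time-major
            feats.append(f)
        seq = torch.cat(feats, dim=1)             # merge chunk features for the RNN
        out, _ = self.rnn(seq)
        return self.head(out)                     # (batch, T, num_classes)
```

Running the CNN on fixed-width chunks keeps peak activation memory small, which matters on TV-class hardware, while the recurrent layer restores the full left-to-right context of the text line.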

Picture 2. Olga Radyvonenko, Lab Leader at SRUKR, gives a talk at ICASSP 2023 about the innovative approaches developed at Samsung

The system currently supports two languages, English and Korean, achieving high recognition accuracy in both (Word Recognition Rate above 95%) and low processing latency (below 200 ms). These results establish the state of the art in on-device OCR-based video subtitle recognition.

The solution supports numerous subtitle styles (text fonts, sizes, colors, and backgrounds) and screen positions. In addition, specialized context detection and subtitle style tracking algorithms prevent non-subtitle textual elements on the screen from being read aloud; a sketch of this filtering idea follows.
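The exact context detection and style tracking algorithms are not public, so the following Python sketch is purely hypothetical: it illustrates one plausible way to suppress non-subtitle text (logos, tickers, menus) by checking that a detected text line sits in the usual subtitle area and matches the style of recently voiced lines. The TextLine structure, the position band, and the thresholds are all assumptions.

```python
# Hypothetical subtitle-style consistency filter; not Samsung's algorithm.
from dataclasses import dataclass

@dataclass
class TextLine:
    x: float        # normalized box geometry (0..1)
    y: float
    height: float
    color: tuple    # dominant text color (R, G, B)

def is_subtitle(line: TextLine, history: list,
                y_band: tuple = (0.75, 1.0), tol: float = 0.15) -> bool:
    """Accept a detected text line only if it lies in the typical subtitle
    band and matches the style (height, color) of recently voiced lines."""
    if not (y_band[0] <= line.y <= y_band[1]):
        return False                  # e.g. channel logos or top-screen tickers
    if not history:
        return True                   # first candidate: accept provisionally
    ref = history[-1]
    same_height = abs(line.height - ref.height) / ref.height < tol
    same_color = all(abs(a - b) < 40 for a, b in zip(line.color, ref.color))
    return same_height and same_color
```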

The service provides extensive customization options tailored to the user's preferences: in particular, the TTS tempo and volume can be adjusted. To resolve the problem of desynchronization between the video and the synthesized speech, an adaptive algorithm for automatic regulation of the TTS speech rate was created.
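The actual regulation algorithm is Samsung's own; the minimal Python sketch below only illustrates the underlying idea: speed up synthesis when the estimated utterance would outlast the time the subtitle stays on screen. The duration model, rate bounds, and reading-speed constant are assumptions.

```python
# Illustrative adaptive TTS rate control; not the production algorithm.
def adaptive_tts_rate(text: str, display_seconds: float,
                      base_chars_per_sec: float = 15.0,
                      min_rate: float = 1.0, max_rate: float = 1.6) -> float:
    """Return a speech-rate multiplier for the TTS engine."""
    est_duration = len(text) / base_chars_per_sec  # naive duration estimate
    if est_duration <= display_seconds:
        return min_rate                            # normal speed fits the window
    # Scale the rate so speech roughly finishes before the next subtitle,
    # clamped so the voice stays intelligible.
    return min(max_rate, est_duration / display_seconds)

# Example: a 90-character subtitle shown for 4 seconds -> ~1.5x speed-up
rate = adaptive_tts_rate("x" * 90, display_seconds=4.0)
```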

These experiences gave us a glimpse of the future of user-TV interaction and underscored the importance of continuous innovation and collaboration. As we navigate the fast-paced world of technology, our future work on the Audio Subtitles service will focus on extending language coverage and streamlining user-TV interactions even further.

References:

[1] Blindness and vision impairment. WHO fact sheet, 10 August 2023. https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment

[2] Degtyarenko I., Tkach N., Seliuk K., Ivanov O., Sielikhov V. Electronic device and audio track obtaining method therefor. Patent Publication No. WO/2024/063313, published 28.03.2024. https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2024063313&_cid=P10-LUJK8M-67393-1

[3] Degtyarenko I., Tkach N., Radyvonenko O., Seliuk K., Ivanov O., Sielikhov V., Sang young Lee, Youn-ho Choi, Cheul-hee Hahm. SDRV: Real-time On-device Subtitles Detection, Recognition and Voicing. 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSPW59220.2023.10192952. https://ieeexplore.ieee.org/document/10192952