Samsung R&D Institute Poland Placed 2nd in DCASE 2023 Challenge

Samsung R&D Institute Poland (SRPOL) earned a top ranking in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 challenge, organized by the Institute of Electrical and Electronics Engineers (IEEE). The challenge evaluates artificial intelligence (AI) methods for understanding and interpreting audio signals.

Building on its earlier DCASE results (first place in two tasks in 2019, second place in 2019 and 2020, and third place in 2022), SRPOL's engineers focused on "Task 6B: Language-Based Audio Retrieval" and placed second. In this task, a retrieval system takes a free-form text description as input and ranks the audio clips in a fixed dataset by how well they match that description.
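To make the retrieval setup concrete, here is a minimal Python sketch of one common way to frame it: a text encoder and an audio encoder (hypothetical here and not shown) map the description and each clip into a shared embedding space, and the system ranks clips by cosine similarity to the query. The function names, toy vectors, and file names are illustrative assumptions; this is not a description of SRPOL's submitted system.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_audio(
    query_embedding: Sequence[float],
    audio_embeddings: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Rank every clip in a fixed collection by similarity to the text query.

    Both inputs are assumed to come from hypothetical text and audio encoders
    trained so that matching captions and clips land close together.
    """
    scored = [
        (clip_id, cosine_similarity(query_embedding, emb))
        for clip_id, emb in audio_embeddings.items()
    ]
    # Highest similarity first; ties broken by clip ID for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy 3-dimensional embeddings, made up purely for illustration.
    clips = {
        "dog_barking.wav": [0.9, 0.1, 0.0],
        "rain_on_roof.wav": [0.0, 0.2, 0.95],
        "crowd_cheering.wav": [0.3, 0.9, 0.1],
    }
    query = [0.8, 0.2, 0.05]  # stand-in for the embedding of "a dog barks loudly"
    for clip_id, score in rank_audio(query, clips):
        print(f"{clip_id}: {score:.3f}")
```

Because the dataset is fixed, the audio embeddings can be computed once in advance, so answering a new query only requires encoding the text and scoring it against the stored vectors.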

The Language-Based Audio Retrieval task aligns closely with the current strategy of SRPOL's AI Team, which is known for its research results and product deployments in both the Natural Language Processing (NLP) and Audio domains. The team's expertise is regularly applied to enhance Bixby, improve machine translation, and advance other NLP-based applications (as highlighted in our recent news about the team's latest NLP achievement).

In the Audio domain, SRPOL specializes in Sound Recognition, Sound Source Separation, and Deep Signal Processing. The AI Team is now combining its two core competencies, NLP and Sound Recognition, into an integrated product. For users, this integration enables new applications and extends existing features.

Currently, users can browse their video galleries using predefined tags that Sound Recognition technology assigns by analyzing the audio content. However, a fixed list of tags limits what users can search for. NLP fills this gap: by combining Sound Recognition with NLP, users could search their content using free-form natural-language descriptions instead of choosing from a predefined tag list.