[SR Talks] ② Interview with an Audio Expert at Samsung R&D Institute Poland

Q: Please briefly introduce yourself, the Audio Experience Group in Samsung R&D Institute Poland, and the kind of work that goes on there. What projects are you working on?

My name is Jakub Tkaczuk, and I’m the head of Audio Experience at Samsung R&D Institute Poland (SRPOL). Our projects take advantage of the unique combination of two astonishing domains: audio (including signal processing) and artificial intelligence (AI). Among many assignments, we are developing sound recognition technology and embedding features based on acoustic reasoning into our devices. In addition, we strongly believe that AI is not only about detection, classification, and recognition. Therefore, we also utilize AI methods to modify and enhance audio signals to improve the overall listening experience for our customers.

While our expertise lies in the audio field, current trends, challenges, and opportunities require us to be flexible and open to other modalities. We consider multimodal and often hybrid approaches wherever feasible. Thus, we not only build on the latest advancements in natural language processing (NLP) and computer vision but also investigate new modalities and approaches to secure future-oriented differentiating features. It is worth mentioning that we cover the full “AI development” stack, from data acquisition to AI model deployment on actual devices. From model compression and efficient implementation leveraging dedicated neural processing units (NPUs) to quality verification and solution maintenance, we are involved in every aspect of bringing added value to our products, and we are proud of this!

Q: Please tell me about the importance of your research field or technology.

Audio is fascinating. People listen to audio 24/7, and we don’t realize how often we leverage acoustic-based reasoning in our daily lives. In addition, for humans, combining modalities is a given. As such, it’s only natural to merge sound source localization and sound event detection with, for instance, speech processing techniques. It’s the same thing we do here in SRPOL. We combine multiple technologies to develop features that will redefine the experience with Samsung products and services.

For example, we believe that sound recognition is pivotal, and SRPOL ensures that our devices can “hear” and react to the environment as humans do. One of our first deployments in this field was an engine capable of detecting barking dogs within the Pet Care ecosystem developed by Digital Appliances (DA) Business. Moreover, with Mobile eXperience (MX) Business and Samsung Research (SR), we have equipped smartphones with a played content classification engine, which significantly improved the listening experience of mobile users. For True Wireless Stereo (TWS), we secured an Acoustic Scene Classification engine to make the Active Noise Cancelation feature truly smart. We also worked on a warning sound detection engine so that our customers can be warned of potential emergencies. Ultimately, Conversation Enhancement developed in SRPOL will not only boost speech and remove wind noises but also utilize multiple microphones and the latest AI techniques to ensure that speech intelligibility will be enhanced in real-life scenarios similar to the conditions in Suwon’s Samsung cafeteria during lunch!

How about TV? We have cooperated with Visual Display (VD) Business for many years. SRPOL ensures that every TV in Europe can recognize the type of set-top box connected to simplify device setup. We also extract as much valuable information from incoming audio as possible. Speaker classification combined with gender detection, voice activity detection, speech diarization, sleep detection, and audio-based sleep analysis are just some examples of the technology modules we develop in SRPOL, and these will be part of solutions that will redefine our customers’ audio journey with Samsung TVs, portable projectors, and beyond.

Q: Can you tell us about any main achievements, rewarding moments, or any significant events in your research?

Throughout my 15-year journey in AI research, there have been many exciting moments. It might sound trivial, but the most satisfying achievements are those that bring value to both Samsung and consumers. I have already listed some of our accomplishments that contributed to Samsung products, so let me go into what is even more vital for me as a leader and manager.

I genuinely believe our success is not about one achievement we made or contributed to. It is about enabling our people to achieve more, become experts, and unleash their potential, and this is what I consider my personal achievement. I have built a fantastic team of “audio freaks”—people who help each other and people who can face any internal or external challenge the world has to offer. Let me give you some examples.

The deployment of AI-based features is no easy task. We need to deal with consumer expectations, false positives, and challenging requirements from product owners. Furthermore, the development of a demo and prototype may be tough, but actual on-device implementation is an absolute nightmare. Therefore, the most rewarding moment is seeing a working AI-driven solution developed in SRPOL on Samsung earbuds, fridges, robot vacuum cleaners, and so on. Only the unique combination of software (SW) developers, AI researchers, and embedded programmers can enable this magic!

The second example is a fresh one. Every year, I challenge my engineers to participate in external competitions. Real validation and benchmarking of our algorithms, as well as actual competition with leading research institutions and companies such as Baidu and Amazon, are truly unique experiences. For many years, we have been on the podium in the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge, which is a must-attend event for anyone working on sound recognition. However, the most rewarding moment is not when your engineers take third, second, or even first place; it is witnessing your people achieve high rankings with the demo scenario and potential commercialization already considered. The biggest achievement isn’t just about one technology item you developed or contributed to. It is having a highly effective and passionate team that can realize any dream, both theirs and our customers’!

Q: What is your vision for the future, and what goal would you want to achieve?

Organization-wise, I would like to continue creating innovative and cross-domain teams that will increase the share of AI-based features within Samsung products. I truly believe that recent advancements in the AI domain must change the way we work. As such, we must work faster and experiment more, securing the appropriate playground and providing our people with the space, time, and environment for groundbreaking experimentation.

On the technology side, large language models (LLMs) are vital, and I admire what they are capable of. I see immense potential in multimodal reasoning. The new era in which we combine audio, video, and text is happening right in front of us, and I look forward to excellent new features, especially in terms of audiovisual immersion and generative AI.

Lastly, let me share my personal goal as a sound engineer and audio freak who builds his own speakers and amplifiers and considers listening to music and consuming audiovisual shows the best entertainment ever. Spatial audio is coming. In the music and broadcasting industry, a big change lies ahead of us in the form of object-based audio. What would you say about having a Samsung Spatial Audio platform combined with fixed and movable projectors that change how we consider entertainment? Imagine having audiovisual objects flying around your living room, with outstanding visuals generated by the latest generative AI! If you’re interested in such a vision, let’s meet in SRPOL and discuss how we can make this happen!