[SR Talks] ① Interview with an Intelligent Software Expert at Samsung R&D Institute China–Nanjing

Q: Please briefly introduce yourself, Samsung R&D Institute China – Nanjing, and the kind of work that goes on there. What projects are you working on?

I’m Jie Chen, leading the Intelligent Software team here at Samsung R&D Institute China–Nanjing (SRC-Nanjing). At SRC-Nanjing, our focus spans platform technology, artificial intelligence (AI) technology, and services software. In particular, over the last two years, we have actively contributed to the local TV and SmartThings businesses in China. We launched the Android on Tizen (AoT) TV in the Chinese market in August 2022 and anticipate the release of the new SmartThings this year. The initial launch will bring 33 categories of things from 5 brands to Chinese end users on our SmartThings platform, with more devices and services to follow gradually.

Personally, I’m working on the research and development of AI solutions and applications, covering topics such as visual and sensor perception for understanding humans, home environments, and video content. In addition, I contribute to content creation technologies, such as home three-dimensional (3D) map generation and avatar effects generation. I also work on Bixby for TVs in China, where we gave Bixby a cute local nickname, “三星小贝,” which means “Samsung (三星) little (小) baby/treasure (贝).”

Q: Please tell me about the importance of your research field or technology (in terms of the need for technology and how it will change our lives in the future).

Our products are mainly consumer electronics (CE) devices. To provide user-friendly intelligent products, our devices must rapidly adapt to users’ homes and usage patterns. This drives our development of visual and sensor technologies for understanding humans and their homes.

Take Samsung’s Ballie, shown at the Consumer Electronics Show (CES) this year: as an end user, I would want it to offer natural interaction similar to a real human exchange. In that scenario, there are no remote controls or buttons involved, only verbal communication and body movements. As such, we need to provide strong technologies that use cameras to observe human actions and microphones to comprehend verbal commands. Human interaction is complex, diverse, and variable, and current technology still falls well short of that ideal. Therefore, we are doing our best to close the gap and achieve our goal.

Another example is SmartThings. While our platform offers a multitude of features, the growing number of options adds complexity for users. People can easily identify a type of light, such as a bedroom light, but they can’t remember the light’s ID or serial number. This underscores the importance of map creation. With 3D maps, users can easily check and control device status in a view that mirrors their physical environment. Because we want to streamline the creation of a 3D map, we offer various methods, allowing users to select their location or draw a home floor plan by hand. In collaboration with DPC and DA, this year we are introducing a new creation method using Samsung’s robot vacuum cleaner (RVC): from the RVC’s LiDAR map, the 3D indoor map is created automatically. However, we still face challenges in automatically recognizing objects and locations, which forms a key focus of our upcoming work.

We recognize that people are inherently visual creatures, with visual information comprising 70% of a person’s perception of the world. Simply put, visuals attract us and make the greatest impact. When users purchase TVs, visual quality is the most important factor, emphasizing high-quality display effects and visually appealing content. Hence, we are developing technology to improve these aspects. In recent years, generative AI has experienced a rapid surge, showing great capability to craft innovative, high-quality content. By leveraging it, we can offer users personalized content creation on their TVs and new ways to decorate their homes. While this technology is promising, a critical consideration is the associated cost: training and inference on servers are significantly expensive. As such, SRC-Nanjing focuses more on on-device technology, which presents a tremendous challenge given the substantial size of generative AI models and the limitations of device resources. Nevertheless, we are working to overcome these challenges and deliver on-device solutions at lower cost.

Q: Can you tell us about significant events that stood out in your research, highlighting the main achievements or rewarding episodes?

Since initiating AI research at SRC-Nanjing in 2017, we have primarily focused on commercializing AI applications. I still remember the first solution we commercialized, with me as project leader, in 2018, when style transfer was a hot topic. We were developing it and looking to apply it to TV. However, style transfer models are memory-consuming, especially for 2K images, and the baseline method required several minutes of inference. We tried various methods and failed because of either high latency or poor image quality. At times, we thought the results were promising and shared them with headquarters (HQ), but the feedback remained negative because of issues such as a lack of vivid colors or the presence of abnormal textures. After several months of continued paper reading and experimentation, one morning a team member reported a breakthrough. I knew we had made it upon seeing the image, and I immediately shared the photo with our HQ counterpart. They were surprised and even asked, “What magic have you done?” That year, we commercialized the solution in Samsung TV’s Ambient Mode, our first commercialization of a deep learning–based method, which made me proud. We failed at first, but we did not give up and succeeded in the end.

Moving forward, since 2022, our goal has centered on achieving excellence and top-tier technology. Beyond commercial success, which is predominantly engineering work, we strive to conduct research and create novel solutions that can set global benchmarks. Our first attempt to submit our work to conferences led to rejection. We also participated in AI challenges but failed to achieve good rankings. Eventually, however, we accumulated experience and refined our experiments. Last year, our paper on the Unified Pyramid Recurrent Network (UPR-Net) for video frame interpolation was accepted at the Computer Vision and Pattern Recognition (CVPR) Conference, one of the best computer vision conferences in the world. We also won second place in sound event detection in domestic environments at the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge. Our contributions extended to novel solutions for TV, including a proposal for human detection from ultrasound for automatic screen muting. To integrate ultrasound, we conducted many demonstrations to highlight the value of our solution. Vital aspects of our efforts included overcoming hardware (HW) constraints, sharing resources with other sound-based services, and addressing pet health concerns, because dogs can hear higher frequencies than humans. The benefit is also apparent: with existing HW, we can detect users while respecting their privacy. Given all our achievements, I’m proud of my team. We share a common goal, with all of us thinking hard and working smart.

Q: What is your vision for the future, and what goal would you want to achieve?

In the era of AI, I want to contribute to products and businesses. Despite poor market conditions caused by challenges in the global economy, technological innovation can still play its role: it is a fundamental driver of economic growth and human progress. Hence, our goal is to research top-level AI technologies, deploy them on devices to drive product innovation, and create new value. It is crucial to view things from the user’s perspective while pursuing the best technology. I want to build an innovative team with passion and an open mind, and I hope to collaborate closely with HQ members and other overseas centers.