[Meet Samsung Research Tech Leaders] ② Interview with an AI Vision Expert at Samsung R&D Institute China-Nanjing (SRC-N)



Q1. Can you please briefly introduce yourself, SRC-N, and the kind of work that goes on there? What project are you working on? 

SRC-N was established in 2004. Its vision is to “Create Innovative Values with World-Best S/W Technology,” and it mainly contributes to software (S/W) development for smartphones, televisions (TVs), refrigerators, and other electronic products. The best software technology needs advanced and superior commercialization technologies. SRC-N’s strong technologies include on-device artificial intelligence (AI) vision, data intelligence, TV software platforms and services, and smartphone system and game solutions.

I am the Head of the Intelligent Vision Lab at SRC-N, where I contribute to on-device AI vision research and commercialization. My current projects are related to environment and human understanding, vision content creation, picture quality enhancement, etc.


Q2. Can you tell me about the importance of your research field or technology?

Nowadays, screens have become increasingly bigger, and 4K/8K high dynamic range (HDR) TVs have come into our lives. Because of this, our competitors are also putting great effort into improving the picture quality of content such as movies, photos, and games. With the improvement of AI technologies, devices are starting to understand the user and the environment and to provide high-quality content.

To run deep learning–based algorithms on a device, a neural processing unit (NPU) and digital signal processor (DSP) are deployed on several TV or smartphone systems on chip (SoCs). Our field focuses not only on algorithms with high-quality output but also on the vertical solution in which those algorithms run on the on-device NPU/DSP to provide real-time functionality to the user.
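
As a simplified illustration of the deployment side, the sketch below (assuming a PyTorch workflow; the network and file names are hypothetical, not our actual pipeline) exports a toy enhancement network to ONNX with a fixed input shape, the kind of frozen graph a vendor NPU/DSP toolchain would then quantize and compile for real-time on-device inference.

```python
# Simplified sketch, not SRC-N's actual pipeline: export a toy enhancement
# network to ONNX so a vendor NPU/DSP toolchain can quantize and compile it.
import torch
import torch.nn as nn

class TinyEnhancer(nn.Module):
    """Hypothetical image-to-image CNN standing in for a real enhancement model."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

model = TinyEnhancer().eval()
dummy = torch.randn(1, 3, 720, 1280)   # fixed input shape helps NPU compilers
torch.onnx.export(model, dummy, "tiny_enhancer.onnx", opset_version=17)
```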

My algorithm research covers picture quality improvement, human and environment understanding, and content creation, and my goal is to commercialize all of the algorithms we develop.

Because of bandwidth limitations, some content remains at low resolutions, bitrates, and frame rates. Good picture quality gives Samsung products strong competitiveness because it directly affects the user experience. To apply the correct enhancement solution, we need a method to evaluate picture quality. Currently, there are several convolutional neural network (CNN)–based or transformer-based algorithms focused on the picture quality assessment domain. Beyond our success in the academic world, we need to develop an on-device solution with acceptable performance that can be commercialized. For low-quality content, we need to develop proper on-device enhancement algorithms, such as denoising, super resolution, HDR, and low-light shot enhancement.
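
As a rough sketch of what such an assessment model can look like (a generic, untrained example, not our production network), the code below regresses a single no-reference quality score from an image patch with a small CNN.

```python
# Generic no-reference picture quality assessment sketch: a small CNN that
# regresses one quality score per image patch. Illustrative only.
import torch
import torch.nn as nn

class PatchQualityNet(nn.Module):
    """Hypothetical quality-assessment backbone."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.score = nn.Linear(64, 1)   # regress a single quality score

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.score(f)

model = PatchQualityNet().eval()
patch = torch.rand(1, 3, 224, 224)      # one RGB patch in [0, 1]
with torch.no_grad():
    print(float(model(patch)))          # untrained, so the score is arbitrary
```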

Most content comes from providers or from users’ photos. Content generation solutions can enrich that content and bring fun to our customers. Based on a user’s photos, we have solutions to generate new photos in different styles, and we can also generate animated pictures from a user’s 2D photo input. In the content creation domain, we have commercialized several solutions, which are incorporated into our TVs.

Health is one of the hot topics for many electronic devices. Because of the threat of COVID-19 and other environmental issues, it is not always convenient for our customers to go to the gym. Luckily, the “Smart Trainer” on Samsung TVs gives people an opportunity to stay fit even at home. Smart Trainer can guide them to do exercises correctly and make a workout routine for them. To achieve this goal, we deployed our human understanding methods on-device, which contributed to the commercialization of “AI Coaching” in our products, such as TVs and smartphones.
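
To give a flavor of the kind of signal involved, the sketch below computes a knee angle from 2D body keypoints, the sort of measurement an on-device pose estimation model could feed into coaching feedback. The keypoints and threshold are purely illustrative and are not Samsung’s actual AI Coaching logic.

```python
# Illustrative pose-based coaching snippet: judge squat depth from the knee
# angle formed by hip, knee, and ankle keypoints. Values are made up.
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by points a-b-c."""
    ba, bc = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# (x, y) keypoints in image coordinates; hard-coded here for illustration.
hip, knee, ankle = (320, 240), (330, 330), (325, 420)
angle = joint_angle(hip, knee, ankle)
print("good squat depth" if angle < 100 else "go lower")   # toy feedback rule
```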

Environment understanding techniques, such as light environment map estimation, light position estimation, and object shadow detection, play an important role in augmented reality (AR) scenarios. The information predicted by these techniques can be utilized to generate a virtual object in AR. We have developed a real-time, deep learning–based, on-device light estimation and shadow detection solution, which can detect the position of indoor or outdoor lights. Using the information predicted by deep learning, we also developed a realistic rendering algorithm for virtual objects.
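
As a simplified illustration of how such a prediction is used, the sketch below shades a surface with the Lambertian model under a single predicted directional light; the light vector stands in for the output of a light estimation network, and all values are illustrative.

```python
# Toy example of estimated-light-driven rendering: Lambertian shading of one
# surface normal under a single directional light. Not the actual AR renderer.
import numpy as np

def lambert_shade(normal, light_dir, albedo=0.8, ambient=0.1):
    n = np.asarray(normal, dtype=float)
    l = np.asarray(light_dir, dtype=float)
    n, l = n / np.linalg.norm(n), l / np.linalg.norm(l)
    return ambient + albedo * max(np.dot(n, l), 0.0)

light_from_network = (0.3, 0.9, 0.3)   # stand-in for a predicted light direction
print(lambert_shade((0.0, 1.0, 0.0), light_from_network))  # upward-facing surface
```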


Q3. Can you tell me about the main achievement and a rewarding moment in your research?

Our research field is on-device AI vision, and it has two domains. One is AI vision algorithm development, and the other is on-device model deployment. Since 2017, we have focused on deep learning model deployment and on-device inference acceleration. We developed a central processing unit (CPU)–based inference acceleration solution and achieved state-of-the-art inference speed on standard deep learning networks, such as AlexNet, VGGNet, and ResNet. Our CPU-based inference acceleration solution and on-device deployment framework have contributed to AI scenario commercialization on TVs and to the “TV AI Framework,” respectively.
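
One classic technique behind this kind of CPU acceleration, shown below as a generic sketch rather than our actual implementation, is im2col: the convolution is rewritten as a single large matrix multiplication so it can run on highly optimized GEMM kernels.

```python
# Generic im2col sketch: turn a convolution into one matrix multiplication,
# a common CPU inference-acceleration trick. Illustrative, unoptimized code.
import numpy as np

def conv2d_im2col(x, w):
    """x: (C, H, W) input; w: (K, C, kh, kw) filters; stride 1, no padding."""
    C, H, W = x.shape
    K, _, kh, kw = w.shape
    oh, ow = H - kh + 1, W - kw + 1
    # Gather every kh x kw patch into a column: (C*kh*kw, oh*ow).
    cols = np.empty((C * kh * kw, oh * ow), dtype=x.dtype)
    idx = 0
    for c in range(C):
        for i in range(kh):
            for j in range(kw):
                cols[idx] = x[c, i:i + oh, j:j + ow].reshape(-1)
                idx += 1
    # One GEMM replaces the whole convolution.
    out = w.reshape(K, -1) @ cols
    return out.reshape(K, oh, ow)

x = np.random.rand(3, 8, 8).astype(np.float32)
w = np.random.rand(4, 3, 3, 3).astype(np.float32)
print(conv2d_im2col(x, w).shape)   # (4, 6, 6)
```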

In layman’s terms, our algorithms focus on the user’s face, body, and actions. The algorithms can be used for video conferences and in Samsung’s Smart Trainer. The core module of Smart Trainer is understanding users’ fitness activities. Based on the successful on-device deployment and commercialization of our human understanding algorithms, we can guide customers to exercise well in their own homes. We achieved this goal through deep learning–based vision algorithms and on-device NPU deployment. Our paper, “Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction,” was accepted in the Institute of Electrical and Electronics Engineers (IEEE) Transactions on Image Processing; it presents state-of-the-art face alignment and reconstruction that can be used for face recognition, content generation, AI beauty, video conference avatar rendering, etc.

Environment understanding techniques are a cornerstone of AR. Our main achievement is on-device light estimation and realistic rendering. Using vision information, we can predict the light source in an environment and use this information when rendering virtual objects. In this research, we can predict the light source’s position, which is essential for rendering a virtual object’s shadow. Our work is at the state-of-the-art level in the realistic rendering field, and one of our papers was accepted in Science China Information Sciences.

We also have significant achievements in state-of-the-art algorithm research. In object detection and classification research, a paper titled “Automatic Detection and Classification System of Domestic Waste via Multimodal Cascaded Convolutional Neural Network” was accepted in the IEEE Transactions on Industrial Informatics in 2021. In picture quality enhancement research, another paper on motion deblurring algorithms was accepted in the IEEE Transactions on Image Processing in the same year.

Besides numerous academic achievements, we won second place in “Video Object Detection” at the Association for Computing Machinery (ACM) Multimedia 2020 Grand Challenge.


Q4. What is your vision for the future and the goal you want to achieve?

SRC-N’s vision is to “Create Innovative Values with World-Best S/W Technology.” To help SRC-N achieve this vision, our lab’s vision is to “Provide World Leading On-Device AI Vision Solutions to Samsung’s Products.” 

In the future, we will continue to maintain our already commercialized AI vision solutions, and we will also innovate with new AI-related scenarios. Based on our technical strengths, we will focus on two fields. One is state-of-the-art AI vision algorithm research, which can give high-quality output, and the other is on-device deployment, which can commercialize those high-quality algorithms in our products.