The Conference on Computer Vision and Pattern Recognition (CVPR) is a world-renowned conference in the fields of artificial intelligence (AI) and machine learning. Samsung R&D Institute China - Beijing (SRC-B) achieved excellent results in the New Trends in Image Restoration and Enhancement (NTIRE) workshop and the Action Learning From Realistic Environments and Directives (ALFRED) workshop at CVPR 2022. These results represent new breakthroughs for SRC-B in image restoration and enhancement, as well as in Embodied AI.
In recent years, the NTIRE workshop has been among the most influential global competitions in image restoration, enhancement, and manipulation. It aims to provide an overview of the new trends and advances in these areas.
SRC-B ranked third in Track 2 of the NTIRE 2022 Burst Super-Resolution challenge. The competition attracted a large number of followers and participants from industry and academia, and more than 14 teams participated in the final testing phase.
The NTIRE 2022 Burst Super-Resolution challenge aimed to promote further research in the burst super-resolution task and to establish the current state of the art. The task required performing joint denoising, demosaicing, and super-resolution, which are fundamental steps in image signal processing (ISP). The challenge contained two tracks: Track 1 focused on synthetic bursts, while Track 2 used real-world bursts. Track 1 had an accurate pixel-wise ground truth for quantitative evaluation, whereas Track 2 was ranked using only a human study, which made it more challenging.
Instead of capturing a single picture, burst mode shooting captures multiple photos of a scene in quick succession. This makes it possible to combine information from multiple images into a single image of higher quality. Burst super-resolution, the subject of the NTIRE 2022 challenge, recovers a higher-resolution image of the scene by merging information from multiple low-resolution samplings.
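The idea of merging several low-resolution samplings can be illustrated with a toy shift-and-add sketch. This is not SRC-B's solution or the challenge pipeline: it assumes noise-free grayscale frames whose sub-pixel offsets are known exactly, and it skips denoising, demosaicing, and alignment entirely. The function name `shift_and_add` and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def shift_and_add(frames, offsets, scale):
    """Merge low-resolution frames onto a grid `scale` times finer.

    frames  : list of 2-D arrays, all with the same shape (h, w)
    offsets : list of (dy, dx) integer offsets on the fine grid,
              with 0 <= dy, dx < scale (assumed known, an idealization)
    scale   : integer upsampling factor
    """
    h, w = frames[0].shape
    acc = np.zeros((h * scale, w * scale), dtype=np.float64)  # sum of samples
    cnt = np.zeros_like(acc)                                  # samples per pixel
    for frame, (dy, dx) in zip(frames, offsets):
        # Each frame fills the fine-grid pixels it actually observed.
        acc[dy::scale, dx::scale] += frame
        cnt[dy::scale, dx::scale] += 1
    # Average where at least one sample landed; unobserved pixels stay NaN.
    out = np.full_like(acc, np.nan)
    seen = cnt > 0
    out[seen] = acc[seen] / cnt[seen]
    return out


# Toy burst: four shifted low-resolution samplings of one 8x8 scene.
rng = np.random.default_rng(0)
scene = rng.random((8, 8))
scale = 2
offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]
frames = [scene[dy::scale, dx::scale] for dy, dx in offsets]

merged = shift_and_add(frames, offsets, scale)
```

In this idealized setting the four 4x4 frames together cover every pixel of the 8x8 scene, so the merge reproduces it. Real bursts have unknown, non-integer motion and sensor noise, which is why the challenge calls for learned methods rather than a fixed rule like this one.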
The winning solution made important efforts in two areas. First, it narrowed the domain gap between synthetic and real-world bursts. Second, it used specially designed loss functions to make the training process more stable and help it converge to the desired result.
Creating a robot capable of understanding human intentions has long been a goal of industry and academia. Recent developments in AI technologies, such as computer vision, natural language processing (NLP), and robotics, have encouraged researchers to explore this type of agent.
ALFRED is a new benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks. The ALFRED 2022 challenge required robots to understand human verbal instructions and complete household tasks by manipulating objects in a room. More than a dozen outstanding teams participated, and despite the fierce competition, SRC-B ranked third in the challenge, a result that represents a top-level performance and a new breakthrough in the Embodied AI field.
SRC-B is most interested in the in-home robot scenario because domestic robots must understand the user’s intentions effectively and accurately, and must perform household activities in a way that is safe, controllable, and understandable for users.
Furthermore, language understanding algorithms alone are not enough to drive a robot's interaction with users. SRC-B has also developed various interaction designs for such a robot, which can use gestures and intelligent devices to interact as well, making the most of its hardware.