A paper by Samsung R&D Institute China–Beijing (SRC-B) has recently been accepted to the Institute of Electrical and Electronics Engineers (IEEE) International Conference on Multimedia and Expo (ICME) 2022. The paper explores the vision transformer’s potential in dual-pixel image defocus deblurring and how it can achieve state-of-the-art performance in this field.
The paper’s authors
The paper is from SRC-B’s Intelligent Camera Solution (ICS) team, which takes commercial impact as its core objective and advanced technology research as its core tenet. ICS is also dedicated to improving the market competitiveness of Samsung products. Many of its achievements have been commercialized in Samsung’s flagship products and have been recognized by headquarters several times. Team members have published papers in leading computer vision journals and conferences, such as the Conference on Computer Vision and Pattern Recognition (CVPR) and ICME, and have won first place in competitions held at CVPR, the European Conference on Computer Vision (ECCV), and other prestigious international venues.
Dynamic Multi-Scale Network for Dual-Pixel Images Defocus Deblurring with Transformer
This study proposes a dynamic multi-scale network, named DMTNet, for dual-pixel image defocus deblurring. DMTNet’s feature extraction module comprises several vision transformer blocks, whose powerful feature extraction capability yields robust features. The reconstruction module consists of several dynamic multi-scale sub-reconstruction modules (DMSSRMs). Each DMSSRM restores images by adaptively assigning weights to features from different scales according to the blur distribution and the content of the input images. DMTNet thus combines the advantages of the transformer and the convolutional neural network (CNN): the vision transformer raises the CNN’s performance ceiling, while the CNN’s inductive bias helps the transformer extract robust features without relying on large amounts of data.
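To make this pipeline concrete, here is a minimal PyTorch sketch of the ideas described above. It is an illustration under simplifying assumptions, not the paper’s implementation: the block depth, channel width, the handling of the two dual-pixel views, and the DMSSRM gating scheme (global-pooling weights over three fixed scales) are all hypothetical placeholders.

```python
import torch
import torch.nn as nn


class TransformerBlock(nn.Module):
    """Simplified vision transformer block over flattened spatial tokens."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim)
        )

    def forward(self, x):  # x: (B, N, C)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # self-attention + residual
        return x + self.mlp(self.norm2(x))                 # MLP + residual


class DMSSRM(nn.Module):
    """Sub-reconstruction sketch: per-scale conv branches fused with
    weights predicted dynamically from the input feature map itself."""

    def __init__(self, dim, scales=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList()
        for s in scales:
            layers = []
            if s > 1:
                layers.append(nn.AvgPool2d(s))  # extract features at a coarser scale
            layers += [nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True)]
            if s > 1:
                layers.append(nn.Upsample(scale_factor=s, mode="bilinear",
                                          align_corners=False))
            self.branches.append(nn.Sequential(*layers))
        # One fusion weight per scale, predicted from global content statistics.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(dim, len(scales)), nn.Softmax(dim=-1),
        )

    def forward(self, x):  # x: (B, C, H, W); H, W divisible by the scales
        w = self.gate(x)   # (B, num_scales), sums to 1 per sample
        fused = sum(w[:, i, None, None, None] * b(x)
                    for i, b in enumerate(self.branches))
        return x + fused   # residual reconstruction


class DMTNet(nn.Module):
    """Transformer feature extractor followed by stacked DMSSRMs."""

    def __init__(self, in_ch=6, dim=32, depth=2, n_dmssrm=2):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, dim, 3, padding=1)  # dual-pixel pair -> features
        self.blocks = nn.ModuleList(TransformerBlock(dim) for _ in range(depth))
        self.recon = nn.Sequential(*(DMSSRM(dim) for _ in range(n_dmssrm)))
        self.out = nn.Conv2d(dim, 3, 3, padding=1)

    def forward(self, left, right):  # two dual-pixel views, each (B, 3, H, W)
        x = self.embed(torch.cat([left, right], dim=1))
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        for blk in self.blocks:                 # full attention over all tokens here;
            tokens = blk(tokens)                # practical models restrict attention
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.out(self.recon(x))


left = torch.randn(1, 3, 64, 64)
right = torch.randn(1, 3, 64, 64)
print(DMTNet()(left, right).shape)  # torch.Size([1, 3, 64, 64])
```

The sketch mirrors the division of labor in the paragraph above: transformer blocks build the features, and the convolutional DMSSRMs weight the multi-scale branches per input before reconstructing the sharp image.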
The overall architecture of the proposed DMTNet