Samsung's Novel Flow-guided Video Frame Interpolation Technique

Video Frame Interpolation (VFI) is a classic low-level vision task that aims to increase the frame rate of a video by synthesizing non-existent intermediate frames between consecutive frames. The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) is one of the top-tier international computer vision conferences. A VFI paper (“A Unified Pyramid Recurrent Network for Video Frame Interpolation”) by the Intelligent Vision Lab of Samsung R&D Institute China-Nanjing (SRC-Nanjing) was recently accepted to CVPR 2023.

Intelligent Vision Lab of SRC-Nanjing

The Intelligent Vision Lab (IVL) of SRC-Nanjing focuses on advanced computer vision technologies, including video frame interpolation, super-resolution, style transfer, and object detection. We have published several papers at top conferences and deployed computer vision algorithms in Samsung’s products. We will keep going and make more contributions to Samsung.

Main Contributions of This Paper

This paper (“A Unified Pyramid Recurrent Network for Video Frame Interpolation”) makes the following main contributions.

1. It presents a novel, compact model that simultaneously estimates the bi-directional motions between input frames. The model is extremely lightweight (15x smaller than PWC-Net), yet reliably handles large and complex motions.

2. Powered by the proposed bi-directional motion estimator, a simple synthesis network is sufficient to predict the intermediate frame from forward-warped representations under various challenging conditions.

3. The proposed video frame interpolation method achieves excellent performance on a broad range of benchmarks, with far fewer parameters than state-of-the-art methods.

Proposed Unified Pipeline for Flow-guided Video Frame Interpolation

The macro structure of the proposed pipeline is a pyramid recurrent network. At each pyramid level, it leverages the estimated bi-directional flow to generate forward-warped representations for frame synthesis; across pyramid levels, it iteratively refines both the optical flow and the intermediate frame.
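The coarse-to-fine loop described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the paper's implementation: the function names, the nearest-pixel average splatting used for forward warping, and the toy stand-ins `toy_flow` and `toy_synth` (which replace the learned motion estimator and synthesis network) are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]                # grayscale frame as rows of pixels
Flow = List[List[Tuple[float, float]]]   # per-pixel (dx, dy) displacement


def downsample2x(img: Image) -> Image:
    """Halve the resolution by averaging 2x2 blocks (assumes even sizes)."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)] for y in range(0, h, 2)]


def repeat2x(grid):
    """Nearest-neighbour 2x upsampling of any 2-D grid."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out += [wide, list(wide)]
    return out


def upsample_flow2x(flow: Flow) -> Flow:
    """Upsample a flow field; displacements double along with the resolution."""
    return [[(2.0 * dx, 2.0 * dy) for dx, dy in row] for row in repeat2x(flow)]


def forward_warp(img: Image, flow: Flow, t: float) -> Image:
    """Splat each source pixel to its position at time t, averaging collisions.
    Assumption: nearest-pixel splatting; holes are left at 0.0."""
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            tx, ty = round(x + t * dx), round(y + t * dy)
            if 0 <= tx < w and 0 <= ty < h:
                acc[ty][tx] += img[y][x]
                cnt[ty][tx] += 1
    return [[acc[y][x] / cnt[y][x] if cnt[y][x] else 0.0 for x in range(w)]
            for y in range(h)]


def interpolate(frame0: Image, frame1: Image, t: float, levels: int,
                estimate_flow, synthesize) -> Image:
    """Run the same two modules at every pyramid level, coarsest first."""
    pyr0, pyr1 = [frame0], [frame1]
    for _ in range(levels - 1):
        pyr0.append(downsample2x(pyr0[-1]))
        pyr1.append(downsample2x(pyr1[-1]))
    flow01 = flow10 = None
    frame_t: Optional[Image] = None
    for i0, i1 in zip(reversed(pyr0), reversed(pyr1)):
        h, w = len(i0), len(i0[0])
        if flow01 is None:   # coarsest level: start from zero motion
            flow01 = [[(0.0, 0.0)] * w for _ in range(h)]
            flow10 = [[(0.0, 0.0)] * w for _ in range(h)]
            prev = None
        else:                # finer level: carry the previous estimates up
            flow01, flow10 = upsample_flow2x(flow01), upsample_flow2x(flow10)
            prev = repeat2x(frame_t)
        flow01, flow10 = estimate_flow(i0, i1, flow01, flow10)
        warped0 = forward_warp(i0, flow01, t)          # frame 0 moved to time t
        warped1 = forward_warp(i1, flow10, 1.0 - t)    # frame 1 moved to time t
        frame_t = synthesize(warped0, warped1, prev)
    return frame_t


def toy_flow(i0, i1, flow01, flow10):
    """Hypothetical stand-in for the motion estimator: no refinement."""
    return flow01, flow10


def toy_synth(warped0, warped1, prev):
    """Hypothetical stand-in for the synthesis network: plain averaging."""
    return [[(a + b) / 2.0 for a, b in zip(r0, r1)]
            for r0, r1 in zip(warped0, warped1)]


dark = [[0.0] * 4 for _ in range(4)]
bright = [[1.0] * 4 for _ in range(4)]
middle = interpolate(dark, bright, 0.5, 2, toy_flow, toy_synth)
```

With these toy stand-ins the flow stays at zero, so the result is simply the average of the two input frames. The point of the sketch is the control flow: the same modules are reused at every level, and both the flows and the intermediate frame are passed from coarse to fine.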

Quantitative Results

The proposed algorithm achieves excellent quantitative results (measured by PSNR and SSIM) on various video frame interpolation benchmarks. In particular, it has far fewer parameters than state-of-the-art methods and runs very fast.
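For readers unfamiliar with the first metric, PSNR is commonly defined as 10·log10(MAX² / MSE), where MSE is the mean squared error between the interpolated frame and the ground-truth frame and MAX is the largest possible pixel value. The sketch below computes it over flattened pixel sequences. The function name `psnr` and the default `max_value` of 255 are assumptions for this example, and the exact evaluation protocol used by each benchmark (color handling, value range) may differ.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened frames."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


# Usage: frames normalized to [0, 1] that differ by 0.1 at every pixel.
score = psnr([0.0, 0.0], [0.1, 0.1], max_value=1.0)
```

Higher PSNR means the interpolated frame is closer to the ground truth. SSIM additionally compares local structure and is omitted here for brevity.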

Qualitative Results

The proposed algorithm is robust to large motion and complex non-linear motion. Compared to existing VFI methods, our UPR-Net gives better qualitative results on the “hard” subset of the SNU-FILM benchmark and on the extremely high-resolution 4K1000FPS benchmark.