Publications

Self-supervised Visual Odometry with Ego-Motion Sampling

Published

International Conference on Video, Signal and Image Processing (VSIP)

Date

2020.12.04

Abstract

In recent years, deep learning-based methods for monocular visual odometry have made considerable progress and now demonstrate state-of-the-art results on the well-known KITTI benchmark. However, collecting ground-truth camera poses for training deep visual odometry models requires special equipment and can therefore be difficult and expensive. To overcome this limitation, a number of unsupervised methods have been proposed that exploit geometric relations between depth and motion, but a large accuracy gap remains between unsupervised and supervised methods. In this work, we propose a simple method for generating self-supervision for visual odometry. During training, it requires only dense depth maps and an approximate motion distribution of the target platform (e.g., a car or a robot). For each input frame, we sample a camera motion from the given distribution and, using the depth map, compute the optical flow that corresponds to the sampled motion. This generated optical flow serves as the input to a visual odometry model, while the sampled camera motion serves as the ground-truth output.

Experiments on KITTI demonstrate that a deep visual odometry method trained in the proposed self-supervised manner outperforms unsupervised visual odometry methods, thus narrowing the gap between methods that require no supervision and fully supervised methods. The source code is available on GitHub.
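
The data-generation step described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, assuming a pinhole camera with intrinsics K, a dense depth map, and a simple Gaussian approximation of a car-like motion distribution; the parameter values and function names are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def sample_ego_motion(rng):
    # Illustrative motion distribution for a forward-moving car:
    # mostly forward translation with a small yaw (not the paper's exact parameters).
    yaw = rng.normal(0.0, 0.01)                      # radians
    R = np.array([[ np.cos(yaw), 0.0, np.sin(yaw)],
                  [ 0.0,         1.0, 0.0         ],
                  [-np.sin(yaw), 0.0, np.cos(yaw)]])
    t = np.array([rng.normal(0.0, 0.02),             # lateral shift (m)
                  rng.normal(0.0, 0.01),             # vertical shift (m)
                  rng.normal(1.0, 0.30)])            # forward motion (m)
    return R, t

def flow_from_motion(depth, K, R, t):
    """Optical flow induced by the rigid transform (R, t) given per-pixel depth."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW
    # Back-project pixels to 3D points in the first camera frame.
    pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    # Move points into the second camera frame and project them back to the image.
    pts2 = R @ pts + t[:, None]
    proj = K @ pts2
    proj = proj[:2] / proj[2:3]
    return (proj - pix[:2]).T.reshape(H, W, 2)       # (du, dv) for every pixel

rng = np.random.default_rng(0)
K = np.array([[718.9, 0.0, 607.2],                   # KITTI-like intrinsics (assumed)
              [0.0, 718.9, 185.2],
              [0.0,   0.0,   1.0]])
depth = np.full((376, 1241), 10.0)                   # placeholder for a dense depth map
R, t = sample_ego_motion(rng)
flow = flow_from_motion(depth, K, R, t)              # network input; (R, t) is the target
```

In this setup the model never needs real inter-frame correspondences during training: the flow computed from the sampled (R, t) acts as the input, and (R, t) itself acts as the regression target.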