Open Source

A Soft Introduction to Visual SLAM

By Alexey Merzlyakov, Samsung R&D Institute Russia
By Steven Macenski, Samsung Research America

Introduction

Robots are no longer a rarity in everyday life: they can be found everywhere, from children’s toys to autonomous vehicles. Robots help us by taking over routine tasks, entertaining us, and giving us new abilities. Some of the most useful service robots are human-assistant robots performing remote, repetitive, and tedious jobs. Examples include package delivery robots, robot waiters in restaurants, assistant robots in public places, housework robots equipped with manipulators, mobile helper and interactive robots, robot avatars, and much more. Even the vacuum cleaners and window-cleaning robots used in everyday life are service robots. It could be said that service robots are moving almost everywhere. But what makes these robots move?

ROS & the Nav2 Stack

The most popular operating system for robots in the world today is ROS (Robot Operating System), currently in its second revision. The part of this system responsible for robot movement is the Navigation2 stack (a.k.a. Nav2). It is a whole stack of modules responsible for defining robot behavior in any given situation, planning paths, actuating the robot’s wheels, and much more.
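To make this concrete, below is a minimal sketch of how an application can ask Nav2 to drive a robot to a goal pose using the Nav2 Simple Commander Python API. It assumes a ROS 2 system in which Nav2 is already launched and localized; the goal coordinates are arbitrary placeholders.

```python
# Minimal sketch: sending one navigation goal through Nav2's Simple Commander
# API (assumes Nav2 is already running and the robot is localized).
import rclpy
from geometry_msgs.msg import PoseStamped
from nav2_simple_commander.robot_navigator import BasicNavigator, TaskResult

rclpy.init()
navigator = BasicNavigator()
navigator.waitUntilNav2Active()  # block until the Nav2 lifecycle nodes are active

# Arbitrary example goal expressed in the 'map' frame.
goal = PoseStamped()
goal.header.frame_id = 'map'
goal.header.stamp = navigator.get_clock().now().to_msg()
goal.pose.position.x = 2.0
goal.pose.position.y = 1.0
goal.pose.orientation.w = 1.0

navigator.goToPose(goal)
while not navigator.isTaskComplete():
    pass  # feedback could be polled here via navigator.getFeedback()

if navigator.getResult() == TaskResult.SUCCEEDED:
    print('Goal reached')
rclpy.shutdown()
```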

SLAM

One of the key tasks here is obtaining the robot’s position in space, so that the robot understands where it is, and building a map of the environment in which the robot is going to move. Both tasks are solved by a single Simultaneous Localization and Mapping (SLAM) module. Classic SLAM approaches typically use laser range sensors to localize the robot, capturing only a 2D slice of the world.

The Problem

The main problem with these approaches is the need for an expensive laser scanner (lidar) on each robot unit. As robot costs are driven down to enable large-scale sales to consumers, the cost of laser scanners has become a potential bottleneck. To address this problem, expensive lidars can be replaced with cheaper cameras. Robust and mature approaches relying on cameras and comparatively low-cost RGB-D sensors will then be crucial for enabling this next wave of robot applications. Let’s look at modern and interesting approaches to robot localization and environment mapping based on computer vision, known as Visual SLAM or simply V-SLAM.

VSLAM

What happens when a person finds themselves in an unknown area? They detect static objects such as tables, chairs, carpets, TVs, etc. Relative to these objects, a person can already assess their own position in space:

Figure 1.  Example: detection of static objects in everyday life (image source)

Robots operate in a similar way. However, unlike a human, a robot (at least with conventional V-SLAM approaches) does not “see” whole objects in space, but rather point-like landmarks: points with distinctive surroundings. These points are called “point features”. To detect them, the robot uses one camera, or better, two. The latter choice clearly gives better localization accuracy. Using two cameras installed on the robot allows it to determine the position of each point feature in space, and from there, by simple geometry, it is not difficult to calculate the robot’s position:

Figure 2.  Detection of point features by a robot using V-SLAM.
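For a calibrated, rectified stereo pair, this geometry reduces to simple triangulation: the depth of a point feature follows from its disparity between the left and right images. Below is a minimal sketch using assumed (illustrative) camera parameters; it shows the geometry only, not the pipeline of any particular V-SLAM system.

```python
import numpy as np

# Assumed (illustrative) calibration of a rectified stereo pair.
f = 525.0               # focal length in pixels
baseline = 0.12         # distance between the two cameras in meters
cx, cy = 319.5, 239.5   # principal point in pixels

def triangulate(u_left, v, u_right):
    """Recover the 3D position of a point feature (in the left camera frame)
    from its pixel coordinates in the left and right images."""
    disparity = u_left - u_right     # horizontal pixel shift between the views
    Z = f * baseline / disparity     # depth from similar triangles
    X = (u_left - cx) * Z / f        # back-project to metric coordinates
    Y = (v - cy) * Z / f
    return np.array([X, Y, Z])

# A feature seen at (400, 250) in the left image and (385, 250) in the right:
print(triangulate(400.0, 250.0, 385.0))  # -> a point roughly 4.2 m in front
```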

All point features can also be assembled into one combined map, which solves the mapping part of SLAM.
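At its simplest, such a map is just the set of triangulated point features expressed in one common world frame. A minimal sketch of that bookkeeping, assuming the robot’s pose at each frame has already been estimated as a rotation R and translation t, might look like this:

```python
import numpy as np

world_map = []  # accumulated point features in the world frame

def add_to_map(points_cam, R, t):
    """Transform point features from one camera frame into the world frame
    using the estimated camera pose (R, t) and store them in the map."""
    for p in points_cam:
        world_map.append(R @ p + t)

# Example: one feature observed while the robot sits 1 m along x, unrotated.
add_to_map([np.array([0.64, 0.08, 4.2])], np.eye(3), np.array([1.0, 0.0, 0.0]))
print(world_map[0])  # -> [1.64, 0.08, 4.2]
```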

Paper Description

Moving on, we became curious: which V-SLAM approach is best suited to the role of a universal Visual SLAM solution for service robotics? To answer this, we compared the most popular open-source V-SLAM approaches available today. From this set we selected three modern, robust, and feature-rich Visual SLAM techniques: ORB-SLAM3, OpenVSLAM, and RTAB-Map. The selected techniques were compared with maximum coverage of the domains relevant to service robotics:

● Drone operating in small indoor spaces: a facility and an office (EuRoC dataset)

Figure 3.  EuRoC dataset – facility and office

● Wheeled platform operating in a large indoor warehouse (TUM RGB-D dataset)

Figure 4.  TUM RGB-D dataset – large indoor warehouse

● Car-based platform operating outdoors: city streets and suburban areas (KITTI dataset)

Figure 5.  KITTI dataset – city streets and suburban areas

The goal of this analysis was to identify robust, multi-domain Visual SLAM options that may be suitable replacements for 2D SLAM in a broad class of service robot applications. The comparison is additionally novel in that it benchmarks 3 datasets and 3 recent V-SLAM techniques that have not been formally compared before: ORB-SLAM3 and OpenVSLAM were each run against at least one dataset for which results have not previously been published in the literature (TUM RGB-D). The comparison is also made unique by its focus on general-purpose techniques that perform well across different domains and sensor configurations. These techniques are analyzed not only on their core algorithmic performance on the datasets, but also on their features and reliability, to assess their ability to take the place of lidar-based SLAM solutions.
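As in most V-SLAM benchmarks, accuracy in such a comparison is reported as trajectory error against ground truth. Below is a minimal sketch of the commonly used absolute trajectory error (ATE RMSE), assuming the estimated and ground-truth trajectories are already time-associated and aligned in the same frame; it illustrates the metric, not our exact evaluation scripts.

```python
import numpy as np

def ate_rmse(estimated, ground_truth):
    """Root-mean-square absolute trajectory error between two (N, 3) arrays
    of corresponding positions (already associated and aligned)."""
    errors = np.linalg.norm(estimated - ground_truth, axis=1)
    return np.sqrt(np.mean(errors ** 2))

# Toy example with three corresponding poses:
est = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [2.0, 0.1, 0.0]])
gt  = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
print(ate_rmse(est, gt))  # -> ~0.08 m
```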

Conclusions

Based on the results of the comparison benchmark, we concluded that:
   ● OpenVSLAM was the overall best general-purpose technique for the broadest range of service robot types, environments, and sensors.
   ● However, ORB-SLAM3 with IMU sensor fusion showed excellent performance in poor conditions; thus, incorporating IMU fusion techniques is typically preferred when possible.

Practical Results

OpenVSLAM has been integrated into the ROS 2 Nav2 system (a.k.a. ROS 2 Navigation) as an alternative SLAM and localization system, parallel to the 2D SLAM Toolbox and AMCL localization. In the integration experiments, the robot moved inside a small 3D-simulated warehouse environment. In parallel, it localized itself in the map using the V-SLAM algorithm while building the map it operated on.
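To illustrate how such an integration plugs in, here is a sketch of a ROS 2 launch file that brings up the Nav2 navigation servers without AMCL and starts a V-SLAM node in its place to provide the map and the map->odom transform. The V-SLAM package and executable names (openvslam_ros, run_slam) are placeholders for whatever wrapper is used, not an official released interface.

```python
# Sketch of a launch file: Nav2 navigation servers + a V-SLAM node acting as
# the localization/mapping source instead of AMCL and SLAM Toolbox.
import os
from ament_index_python.packages import get_package_share_directory
from launch import LaunchDescription
from launch.actions import IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource
from launch_ros.actions import Node


def generate_launch_description():
    nav2_dir = get_package_share_directory('nav2_bringup')

    # Nav2 planners, controllers, behavior servers, etc. (no AMCL here).
    navigation = IncludeLaunchDescription(
        PythonLaunchDescriptionSource(
            os.path.join(nav2_dir, 'launch', 'navigation_launch.py')),
        launch_arguments={'use_sim_time': 'true'}.items())

    # Placeholder V-SLAM node: publishes the map and the map->odom transform
    # that AMCL/SLAM Toolbox would normally provide. Names are illustrative.
    vslam = Node(
        package='openvslam_ros',   # placeholder package name
        executable='run_slam',     # placeholder executable name
        parameters=[{'use_sim_time': True}],
        output='screen')

    return LaunchDescription([navigation, vslam])
```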

The results of V-SLAM orchestrated with the Nav2 stack are summarized in the video below:
https://youtu.be/XK-EMZvYuWc

We expect the results of this evaluation and our experiments to help accelerate the adoption of V-SLAM in commercial robotics products.

Link to the paper

https://arxiv.org/abs/2107.07589


We hope you enjoyed this article.

If you have any questions, you can contact us directly: Alexey Merzlyakov (alexey.merzlyakov@samsung.com) / Steven Macenski (s.macenski@samsung.com).