Object residual constrained Visual-Inertial Odometry

Mo Shan

Vikas Dhiman

Qiaojun Feng

Jinzhao Li

Nikolay Atanasov

Department of Electrical and Computer Engineering
University of California, San Diego
IROS, 2020

Overview

Introduction to OrcVIO.

Animations

This animation shows color-coded, object-level tracks of semantic keypoints and green tracks of geometric features.

This animation shows the 2D IoU between the annotated bounding boxes and those detected by YOLO. In each label, id is our object id and gt is the id in the annotation.
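As a reference for the metric shown in the animation, here is a minimal sketch of 2D IoU for two axis-aligned boxes. The `(x_min, y_min, x_max, y_max)` box layout and the function name `iou_2d` are assumptions made for illustration, not taken from the OrcVIO code.

```python
def iou_2d(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this layout is an assumption.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical annotated and detected boxes that overlap by half their width.
annotated = (0.0, 0.0, 10.0, 10.0)
detected = (5.0, 0.0, 15.0, 10.0)
print(iou_2d(annotated, detected))
```

Here the overlap is 50 square pixels and the union is 150, so the IoU is one third.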

This animation shows the object state reprojected onto the image: the blue rectangle is the object detection, the red wireframe is the object shape, and the green ellipse is the reprojection of the ellipsoid that we use to represent each object.
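One common way to obtain the image ellipse of an ellipsoid is the dual-quadric relation C* = P Q* Pᵀ from multiple-view geometry. The sketch below illustrates that textbook relation under assumed conventions (pinhole camera, object-to-world and world-to-camera poses, ellipsoid fully in front of the camera); it is not confirmed to match the OrcVIO implementation, and all function and parameter names are hypothetical.

```python
import numpy as np


def project_ellipsoid(semi_axes, R_wo, t_wo, K, R_cw, t_cw):
    """Project an ellipsoid to an image ellipse using C* = P Q* P^T.

    semi_axes : (3,) semi-axis lengths in the object frame
    R_wo, t_wo: object-to-world rotation (3x3) and translation (3,)
    K         : 3x3 pinhole intrinsics
    R_cw, t_cw: world-to-camera rotation (3x3) and translation (3,)
    Returns the ellipse center (2,) and its semi-axis lengths (2,), in pixels.
    Assumes the whole ellipsoid lies in front of the camera.
    """
    # Dual quadric in the object frame, then moved into the world frame.
    Q_obj = np.diag([semi_axes[0] ** 2, semi_axes[1] ** 2, semi_axes[2] ** 2, -1.0])
    T_wo = np.eye(4)
    T_wo[:3, :3] = R_wo
    T_wo[:3, 3] = t_wo
    Q_world = T_wo @ Q_obj @ T_wo.T

    # Camera projection matrix and the projected dual conic.
    P = K @ np.hstack([R_cw, np.reshape(t_cw, (3, 1))])
    C = P @ Q_world @ P.T
    C = -C / C[2, 2]  # normalise so that C[2, 2] == -1

    # With this normalisation, C = [[A - m m^T, -m], [-m^T, -1]].
    center = -C[:2, 2]
    shape = C[:2, :2] + np.outer(center, center)
    semi = np.sqrt(np.linalg.eigvalsh(shape))
    return center, semi


# Hypothetical example: unit sphere 5 m in front of the camera on the optical axis.
K = np.array([[100.0, 0.0, 320.0], [0.0, 100.0, 240.0], [0.0, 0.0, 1.0]])
center, semi = project_ellipsoid(
    np.array([1.0, 1.0, 1.0]), np.eye(3), np.array([0.0, 0.0, 5.0]),
    K, np.eye(3), np.zeros(3),
)
print(center, semi)
```

For this sphere the outline is a circle centered on the principal point with radius f / sqrt(d² - r²) = 100 / sqrt(24) pixels, which gives a quick sanity check of the formula.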

Presentation

IROS 2020 short version

IROS 2020 long version

[Slides]

Chinese version

[Slides in Chinese]

More results for OrcVIO

Semantic keypoint detection

Our approach uses StarMap for semantic keypoint detection. As the upper row of the figure below shows, it can handle a certain degree of viewpoint, scale, and visibility variation, since StarMap uses a large training set to prevent overfitting. Nonetheless, the lower row shows some failure cases caused by occlusion or instance variation. Wrong detections or too few detections cause problems for our approach.

Semantic keypoint detection from StarMap.

Keypoint detection covariance

We use Monte Carlo Dropout to obtain the semantic keypoint covariances. The figure below shows how we insert the Dropout layer into the StarMap network, together with the average covariance obtained from a sample of the KITTI dataset.
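The idea behind Monte Carlo Dropout can be sketched as follows: keep dropout active at test time, run several stochastic forward passes on the same input, and take the sample covariance of the predicted keypoint coordinates. The toy linear "network" below is a hypothetical stand-in for StarMap, and the dropout rate and number of passes are assumed values chosen only for illustration.

```python
import numpy as np


def toy_keypoint_net(features, weights, rng, drop_prob):
    """Hypothetical stand-in for a keypoint network with dropout left on."""
    keep = rng.random(features.shape) >= drop_prob   # Bernoulli keep mask
    dropped = features * keep / (1.0 - drop_prob)    # inverted-dropout scaling
    return dropped @ weights                         # (2,) keypoint in pixels


def mc_dropout_covariance(features, weights, num_samples=200, drop_prob=0.2, seed=0):
    """Sample mean and 2x2 covariance over repeated stochastic forward passes."""
    rng = np.random.default_rng(seed)
    samples = np.stack(
        [toy_keypoint_net(features, weights, rng, drop_prob) for _ in range(num_samples)]
    )
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)  # columns are the u and v coordinates
    return mean, cov


# Hypothetical input features and weights for a single keypoint.
features = np.linspace(0.5, 1.5, 16)
weights = np.random.default_rng(1).normal(size=(16, 2))
mean, cov = mc_dropout_covariance(features, weights)
print(mean)
print(cov)
```

The resulting 2x2 matrix plays the role of the per-keypoint covariance visualized in the figure; with a real network, the forward pass would replace `toy_keypoint_net`.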

Semantic keypoint uncertainty obtained from approximate Bayesian inference through the stacked hourglass convolutional neural network.

Below is a closer view of the keypoint covariance on one car.

Semantic keypoint uncertainty on one car.

Detection and tracking on KITTI

The front end works with either color or grayscale images. Below is an example using color images as input.

Object state reprojection on KITTI odometry sequences

KITTI odometry 06

The bottom-left window shows object tracking with semantic keypoints and bounding boxes. The right window shows the trajectory estimation and object mapping.

KITTI raw data 09 26 0117

The top-left window shows semantic keypoint tracking, while the bottom-left window shows geometric feature tracking. The right window shows the trajectory estimation and object mapping.

Forest scene

This demo shows the performance of OrcVIO in a forest using a RealSense sensor. The red line represents the trajectory estimated by OrcVIO. The purple ellipsoid is the covariance of the pose.

Indoor scene with chairs and monitors

This demo shows the construction of an object map for the lab scene with chairs and monitors, using a RealSense sensor. The red line is the estimated trajectory, and the axes mark the current pose. The black dots are the reconstructed geometric landmarks, whereas the green dots are the estimated semantic keypoints. The blue ellipsoids are the chairs and the orange ellipsoids are the monitors mapped by OrcVIO.

Outdoor scene at UCSD campus

This demo shows the construction of an object map for the outdoor scene with chairs, bikes, and cars, using an INDEMIND sensor. The red line is the estimated trajectory, and the axes mark the current pose. The green dots are the estimated semantic keypoints. The blue ellipsoids are the chairs mapped by OrcVIO, whereas the red ellipsoids are the bikes, and the black ellipsoids are the cars.

Object map with 40 cars in Unity simulator

Upper row: Unity simulation scene. Lower row: reconstructed objects, where the orange line is the estimated trajectory, the green ellipsoids are the reconstructed cars, and the blue meshes are the ground-truth car positions.

Object map with car and door categories in Unity simulator

We propose a tightly coupled visual-inertial odometry and object state optimization algorithm. (a) A simulated scene from Unity, where a quadrotor flies over cars and doors. (b) Color-coded semantic keypoint tracklets on cars and doors. (c) Estimated trajectory (green), which coincides with the ground-truth trajectory (red), and the object map with reconstructed cars (green ellipsoids), doors (red ellipsoids), semantic keypoints (yellow spheres), and the ground-truth objects (blue meshes).

Demo with cars, doors, and barriers in Unity simulator

More results for OrcVIO Lite

KITTI odometry 06

OrcVIO Lite uses only bounding boxes, without semantic keypoints, which makes it more suitable for real-time experiments. The test on KITTI odometry 06 uses grayscale images for both the front end and the back end. The red line is the estimated trajectory, while the purple ellipsoid is the covariance of the pose. The white points are the geometric landmarks, and the colored dots are the active features. The black spheres are the reconstructed cars.

Flea3 camera

OrcVIO Lite uses only bounding boxes, without semantic keypoints, which makes it more suitable for real-time experiments. This test uses grayscale images from a Flea3 camera. The red line is the estimated trajectory, while the purple ellipsoid is the covariance of the pose. The white points are the geometric landmarks, and the colored dots are the active features. The black spheres are the reconstructed cars.

Jackal robot

OrcVIO Lite runs on a Jackal robot, which is equipped with a RealSense sensor. The yellow path is the VIO trajectory, whereas the red ellipsoids are the detected barrels. The marker size for the barrels is exaggerated for illustration purposes. The barrel detection and tracking results are shown on the top left, where the bounding boxes show detections and the lines are the tracklets of the bounding boxes. The tracklets are very long because the Jackal undergoes a large viewpoint change. In this case the SORT tracker is modified to use centroid distance instead of IoU as the affinity. Because the IMU is of low quality, the inertial data is unreliable and there is significant drift. The frequent stops also make the sequence challenging for VIO. Despite these difficulties, OrcVIO Lite is still able to localize the robot and map the barrels.
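To illustrate why a centroid-distance affinity helps when boxes move far between frames, here is a minimal sketch of associating tracked boxes with new detections by centroid distance. It uses a greedy nearest-first matching for brevity, which is a simplification: SORT itself solves the assignment with the Hungarian algorithm. The box layout, the `max_distance` gate, and all function names are assumptions for illustration.

```python
import math


def centroid(box):
    """Center of an (x_min, y_min, x_max, y_max) box; the layout is assumed."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def centroid_distance(box_a, box_b):
    """Euclidean distance between the two box centers, in pixels."""
    (ax, ay), (bx, by) = centroid(box_a), centroid(box_b)
    return math.hypot(ax - bx, ay - by)


def associate(tracks, detections, max_distance):
    """Greedy nearest-centroid matching; returns (track_idx, detection_idx) pairs."""
    pairs = sorted(
        (centroid_distance(t, d), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(detections)
    )
    used_tracks, used_dets, matches = set(), set(), []
    for dist, ti, di in pairs:
        if dist > max_distance:
            break  # all remaining pairs are farther apart than the gate
        if ti in used_tracks or di in used_dets:
            continue
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)
    return matches


# Hypothetical boxes: track 0 and detection 1 do not overlap at all (IoU = 0),
# yet their centroids are only 30 pixels apart.
tracks = [(0, 0, 10, 10), (100, 100, 120, 120)]
detections = [(102, 98, 122, 118), (30, 0, 40, 10)]
matches = associate(tracks, detections, max_distance=50.0)
print(matches)
```

In this example an IoU-based affinity would drop track 0 because its box no longer overlaps any detection, while the centroid-distance gate still links it to detection 1.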

More results for OrcVIO Stereo

EuRoC V1 01

OrcVIO Stereo uses a stereo camera instead of a monocular camera to increase robustness. This demo shows its performance on the EuRoC V1 01 sequence. The red line is the ground-truth trajectory, while the blue line is the estimated trajectory.

EuRoC MH 01

This demo shows the performance of OrcVIO Stereo on the EuRoC MH 01 sequence. The red line is the ground-truth trajectory, while the blue line is the estimated trajectory.

This demo shows the performance of the Python version of OrcVIO Stereo on the EuRoC MH 01 sequence. The green line is the ground-truth trajectory, while the black line is the estimated trajectory.

Jackal robot

OrcVIO Stereo runs on a Jackal robot, which is equipped with a RealSense sensor. The small linear acceleration (due to constant velocity) and the limited angular velocity make this scenario very challenging. Drift is noticeable when the Jackal drives over uneven terrain, e.g. a curb.

OrcVIO Stereo runs on a Jackal robot along a trajectory with loops, this time with more features. The additional features improve accuracy, as can be seen from the completed loops, which close with very little drift. Because the drift is small, the point clouds are also reconstructed well; for instance, the corridors are clearly visible.

Racecar

OrcVIO Stereo runs on a racecar, which is equipped with a RealSense D435i, in the lab.

OrcVIO Stereo runs on a racecar, which is equipped with a RealSense D435i. The mapping module maps the chairs in the lab.

Publication

        @inproceedings{shan2020orcvio,
          title={OrcVIO: Object residual constrained Visual-Inertial Odometry},
          author={Shan, Mo and Feng, Qiaojun and Atanasov, Nikolay},
          booktitle={2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
          pages={5104--5111},
          year={2020},
          organization={IEEE}
        }

[IROS version]

[Journal version]

Codebase

We have made public different flavours of OrcVIO, including mono, stereo, mapping, mapping-lite, etc., as depicted in the summary above.

[OrcVIO Demo in C++]

[OrcVIO Demo in Python]

Acknowledgements

We gratefully acknowledge support from ARL DCIST CRA W911NF-17-2-0181.

This webpage template was borrowed from https://akanazawa.github.io/cmr/.

The acronym of our method is inspired by Warcraft III.

QR code generated from https://www.qrcode-monkey.com/.