Multi-view deep learning for consistent semantic mapping with RGB-D cameras

Ma L, Stuckler J, Kerl C, Cremers D (2017)


Publication Type: Conference contribution

Publication year: 2017

Publisher: Institute of Electrical and Electronics Engineers Inc.

Book Volume: 2017-September

Page Range: 598-605

Conference Proceedings Title: IEEE International Conference on Intelligent Robots and Systems

Event location: Vancouver, BC, CAN

ISBN: 9781538626825

DOI: 10.1109/IROS.2017.8202213

Abstract

Visual scene understanding is an important capability that enables robots to purposefully act in their environment. In this paper, we propose a novel deep neural network approach to predict semantic segmentation from RGB-D sequences. The key innovation is to train our network to predict multi-view consistent semantics in a self-supervised way. At test time, its semantics predictions can be fused more consistently in semantic keyframe maps than predictions of a network trained on individual views. We base our network architecture on a recent single-view deep learning approach to RGB and depth fusion for semantic object-class segmentation and enhance it with multi-scale loss minimization. We obtain the camera trajectory using RGB-D SLAM and warp the predictions of RGB-D images into ground-truth annotated frames in order to enforce multi-view consistency during training. At test time, predictions from multiple views are fused into keyframes. We propose and analyze several methods for enforcing multi-view consistency during training and testing. We evaluate the benefit of multi-view consistency training and demonstrate that pooling of deep features and fusion over multiple views outperforms single-view baselines on the NYUDv2 benchmark for semantic segmentation. Our end-to-end trained network achieves state-of-the-art performance on the NYUDv2 dataset in single-view segmentation as well as multi-view semantic fusion.
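
As a rough illustration of the test-time step in which predictions from multiple views are fused into keyframes, the following minimal sketch combines per-view class probabilities for a single keyframe pixel. It assumes the predictions have already been warped into the keyframe, and it assumes a normalized product of probabilities as the fusion rule; the abstract does not specify the rule, so this is one plausible choice rather than the authors' method. The function name `fuse_pixel` and its parameters are hypothetical.

```python
import math


def fuse_pixel(view_probs, num_classes):
    """Fuse class probabilities for one keyframe pixel seen from several views.

    view_probs: list with one entry per view; each entry is a list of
        num_classes probabilities, or None if the warped view does not
        cover this pixel (assumed convention, not from the paper).
    Returns the normalized product of the valid views' probabilities,
    or a uniform distribution when no view covers the pixel.
    """
    valid = [p for p in view_probs if p is not None]
    # Sum log-probabilities per class (a product in probability space).
    log_sum = [
        sum(math.log(max(p[c], 1e-12)) for p in valid)
        for c in range(num_classes)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_sum)
    weights = [math.exp(s - peak) for s in log_sum]
    total = sum(weights)
    return [w / total for w in weights]


# Two views agree on class 0; a third view does not cover the pixel.
fused = fuse_pixel([[0.6, 0.4], [0.6, 0.4], None], num_classes=2)
```

Under this assumed rule, views that agree reinforce each other, so the fused probability of the favored class is higher than in either single view. A full implementation would apply the same operation to every keyframe pixel, typically with an array library.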

How to cite

APA:

Ma, L., Stuckler, J., Kerl, C., & Cremers, D. (2017). Multi-view deep learning for consistent semantic mapping with RGB-D cameras. In IEEE International Conference on Intelligent Robots and Systems (pp. 598-605). Vancouver, BC, CAN: Institute of Electrical and Electronics Engineers Inc.

MLA:

Ma, Lingni, et al. "Multi-view deep learning for consistent semantic mapping with RGB-D cameras." Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2017, Vancouver, BC, CAN: Institute of Electrical and Electronics Engineers Inc., 2017. 598-605.
