StereoVoxelNet: Real-Time Obstacle Detection Based on Occupancy Voxels From a Stereo Camera Using Deep Neural Networks

Hongyu Li1,2, Zhengang Li3*, Neset Unver Akmandor1,3*, Huaizu Jiang2, Yanzhi Wang3, Taskin Padir1,3
1Institute for Experiential Robotics, Northeastern University; 2Khoury College of Computer Sciences, Northeastern University; 3Department of Electrical and Computer Engineering, Northeastern University

StereoVoxelNet generates voxels from a stereo pair, in a coarse-to-fine manner, to represent the locations of detected obstacles within a range of 32 meters.


Obstacle detection is a safety-critical problem in robot navigation, and stereo matching is a popular vision-based approach to it. Although deep neural networks have shown impressive results in computer vision, most previous obstacle detection work relies only on traditional stereo matching techniques to meet the computational constraints of real-time feedback. This paper proposes a computationally efficient method that uses a deep neural network to detect occupancy from stereo images. Instead of learning point cloud correspondence from the stereo data, our approach extracts a compact obstacle distribution based on volumetric representations. In addition, we prune the computation of safety-irrelevant spaces in a coarse-to-fine manner based on octrees generated by the decoder. As a result, we achieve real-time performance on an onboard computer (NVIDIA Jetson TX2). Our approach detects obstacles accurately within a range of 32 meters and achieves better IoU (Intersection over Union) and CD (Chamfer Distance) scores with only 2% of the computation cost of the state-of-the-art stereo model. Furthermore, we validate our method's robustness and real-world feasibility through autonomous navigation experiments with a real robot. Hence, our work contributes toward closing the gap between stereo-based systems in robot perception and state-of-the-art stereo models in computer vision. To counter the scarcity of high-quality real-world indoor stereo datasets, we collect a 1.36-hour stereo dataset with a Jackal robot, which is used to fine-tune our model.
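The two metrics named in the abstract can be illustrated with a small NumPy sketch. This is not the evaluation code of the paper; it is a minimal example under stated assumptions. The function names (`voxelize`, `voxel_centers`, `voxel_iou`, `chamfer_distance`), the grid origin, the grid shape, and the voxel size are all hypothetical. The Chamfer distance here uses the symmetric mean of unsquared nearest-neighbor distances, which is one common convention; the paper may use a different one.

```python
import numpy as np


def voxelize(points, voxel_size, grid_min, grid_shape):
    """Return a boolean occupancy grid: a voxel is True if any point falls in it."""
    idx = np.floor((points - grid_min) / voxel_size).astype(int)
    # Discard points that fall outside the grid bounds.
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    grid = np.zeros(grid_shape, dtype=bool)
    grid[tuple(idx[inside].T)] = True
    return grid


def voxel_centers(grid, voxel_size, grid_min):
    """Return the metric coordinates of the centers of all occupied voxels."""
    return (np.argwhere(grid) + 0.5) * voxel_size + grid_min


def voxel_iou(pred, gt):
    """Intersection over Union of two boolean occupancy grids of equal shape."""
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # Assumption: two empty grids count as a perfect match.
    return np.logical_and(pred, gt).sum() / union


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Assumption: mean of unsquared nearest-neighbor distances in each
    direction, summed. Returns inf if either set is empty.
    """
    if len(a) == 0 or len(b) == 0:
        return float("inf")
    # Pairwise distance matrix of shape (N, M); fine for small point sets.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return d.min(axis=1).mean() + d.min(axis=0).mean()


# Hypothetical toy data: two predicted and two ground-truth obstacle points.
grid_min = np.zeros(3)
grid_shape = (4, 4, 4)
voxel_size = 1.0
pred_points = np.array([[0.1, 0.1, 0.1], [1.1, 0.1, 0.1]])
gt_points = np.array([[0.5, 0.5, 0.5], [2.5, 0.5, 0.5]])

pred_grid = voxelize(pred_points, voxel_size, grid_min, grid_shape)
gt_grid = voxelize(gt_points, voxel_size, grid_min, grid_shape)

iou = voxel_iou(pred_grid, gt_grid)
cd = chamfer_distance(
    voxel_centers(pred_grid, voxel_size, grid_min),
    voxel_centers(gt_grid, voxel_size, grid_min),
)
print(f"IoU = {iou:.3f}, CD = {cd:.3f}")
```

IoU rewards exact voxel overlap, while the Chamfer distance also credits predictions that land near, but not exactly on, the ground-truth voxels, which is why the two are typically reported together.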


Qualitative Result


        title = {StereoVoxelNet: Real-Time Obstacle Detection Based on Occupancy Voxels from a Stereo Camera Using Deep Neural Networks},
        author = {Li, Hongyu and Li, Zhengang and Akmandor, Neset Unver and Jiang, Huaizu and Wang, Yanzhi and Padir, Taskin},
        booktitle={2023 IEEE International Conference on Robotics and Automation (ICRA)},