Abstract
Monocular depth estimation is a critical component in understanding spatial relationships for various computer vision applications, including autonomous driving and augmented reality. However, accurate depth prediction remains challenging due to two primary factors: (1) the low pixel density of objects in distant regions and (2) the loss of essential features during the resolution reduction process in traditional encoder architectures. To address these challenges, this work introduces an innovative encoder-decoder architecture that incorporates uncertainty maps to improve feature extraction, particularly in long-distance regions. The proposed model utilizes auxiliary uncertainty networks to identify areas with high prediction difficulty, enabling the generation of more robust feature representations through hierarchical feature combinations. Additionally, the decoder architecture is designed to emphasize structural details by introducing an uncertainty edge weighting mask (UEWM) generation module, which further enhances depth prediction performance in challenging regions. Experimental results demonstrate that the proposed method significantly improves depth estimation accuracy in long-range scenarios, as evaluated on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) and Dense Depth for Autonomous Driving (DDAD) datasets. These findings highlight the potential of this uncertainty-aware monocular depth estimation approach for practical applications, including autonomous driving and robotic perception.
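The abstract describes an uncertainty edge weighting mask (UEWM) that emphasizes structural details in regions the model finds hard to predict. The paper does not specify the computation, but a minimal sketch of the general idea, combining a depth-gradient edge map with a predicted uncertainty map (function name and normalization are illustrative assumptions, not the authors' implementation), might look like:

```python
import numpy as np

def uncertainty_edge_weighting_mask(depth_pred, uncertainty, eps=1e-6):
    """Hypothetical UEWM sketch: upweight pixels where depth edges
    coincide with high prediction uncertainty."""
    # Central-difference gradients of the predicted depth map (H x W)
    gx = np.zeros_like(depth_pred)
    gy = np.zeros_like(depth_pred)
    gx[:, 1:-1] = depth_pred[:, 2:] - depth_pred[:, :-2]
    gy[1:-1, :] = depth_pred[2:, :] - depth_pred[:-2, :]
    edge = np.sqrt(gx**2 + gy**2)

    # Normalize both cues to [0, 1] before combining
    edge = edge / (edge.max() + eps)
    unc = uncertainty / (uncertainty.max() + eps)

    # Base weight 1 everywhere; boosted on uncertain edges
    return 1.0 + edge * unc
```

Such a mask could then scale a per-pixel depth loss so that training focuses on uncertain structural boundaries, e.g. distant object contours.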