
Learning Multi-Modal Scale-Aware Attentions for Efficient and Robust Road Segmentation

School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
Computer Vision and Robot Research Center, International Digital Economy Academy, Shenzhen, Guangdong, P. R. China

This paper was recommended for publication in its revised form by Special Issue Editors: Jie Chen, Ben M. Chen and Jie Huang.


Abstract

Road segmentation is essential to unmanned systems, contributing to road perception and navigation in autonomous driving. Multi-modal road segmentation methods have shown promising results by leveraging complementary RGB and depth data to provide robust 3D geometric information, but existing methods suffer from severe efficiency problems that hinder their practical deployment in autonomous driving. Their direct concatenation of multi-modal features through densely connected networks widens the semantic gaps among modalities and scales and incurs high computational and time complexity. To address these issues, we propose a Multi-modal Scale-aware Attention Network (MSAN) that fuses RGB and depth data effectively via a novel transformer-based cross-attention module, the Multi-modal Scale-aware Transformer (MST), which fuses RGB-D features from a global perspective across multiple scales. To better consolidate features at different scales, we further propose a Scale-aware Attention Module (SAM) that captures channel-wise attention efficiently for cross-scale fusion. These two attention-based modules exploit the complementarity of modalities and scales, narrowing the gaps while avoiding complex structures for road segmentation. Extensive experiments demonstrate that MSAN achieves competitive performance at low computational cost, making it suitable for real-time deployment on edge devices in autonomous driving systems.
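The two modules described in the abstract can be sketched in miniature: a cross-attention step in which RGB tokens query depth tokens (the role MST plays across modalities), followed by channel-wise gating (the role SAM plays for cross-scale fusion). This is an illustrative sketch only, not the authors' implementation: all shapes, the random stand-in weights, and the squeeze-and-excitation-style gate are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(rgb, depth, d_k=32, seed=0):
    """One cross-attention head: RGB tokens (queries) attend to depth
    tokens (keys/values), fusing geometry into appearance features.

    rgb, depth: (n_tokens, d_model) flattened feature maps.
    The projection weights are random stand-ins for learned parameters.
    """
    rng = np.random.default_rng(seed)
    d_model = rgb.shape[1]
    w_q = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
    w_k = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
    w_v = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    q, k, v = rgb @ w_q, depth @ w_k, depth @ w_v
    attn = softmax(q @ k.T / np.sqrt(d_k), axis=-1)  # (n_rgb, n_depth)
    return rgb + attn @ v                            # residual fusion

def channel_attention(feats):
    """Channel-wise gating (squeeze-and-excitation style): each channel
    is re-weighted by a sigmoid of its global average, so informative
    channels can dominate a cross-scale combination."""
    gate = 1.0 / (1.0 + np.exp(-feats.mean(axis=0)))  # (d_model,)
    return feats * gate

# 196 tokens of dimension 64, e.g. a flattened 14x14 feature map.
rgb = np.random.default_rng(1).standard_normal((196, 64))
depth = np.random.default_rng(2).standard_normal((196, 64))
fused = channel_attention(cross_attention(rgb, depth))
print(fused.shape)  # → (196, 64)
```

Because the attention matrix is computed globally over all token pairs, every RGB location can draw on depth evidence from anywhere in the image; the channel gate then costs only one scalar per channel, which is the kind of lightweight cross-scale weighting the abstract attributes to SAM.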

Unmanned Systems, Pages 201-213

Cite this article:
Zhou Y, Yang J, Cao H, et al. Learning Multi-Modal Scale-Aware Attentions for Efficient and Robust Road Segmentation. Unmanned Systems, 2024, 12(2): 201-213. https://doi.org/10.1142/S2301385024410048

Views: 1068 | Crossref: 0 | Web of Science: 2 | Scopus: 3 | CSCD: 0
Received: 13 August 2023
Revised: 24 October 2023
Accepted: 27 October 2023
Published: 07 December 2023
© World Scientific Publishing Company