Lightweight modules play a key role in 3D object detection for autonomous driving and are necessary for deploying 3D object detectors in practice. Current research still focuses on constructing complex models and computations that improve detection precision at the expense of running speed. Building a lightweight model that learns global features from point cloud data therefore remains a significant open problem in 3D object detection. In this paper, we focus on combining convolutional neural networks with self-attention-based vision transformers to realize lightweight, high-speed computation for 3D object detection. We propose lightweight detection 3D (LWD-3D), a point cloud conversion and lightweight vision transformer for autonomous driving. LWD-3D uses a one-shot regression framework in 2D space to generate 3D object bounding boxes from point cloud data, providing a new vision-transformer-based feature representation for 3D detection applications. Experimental results on the KITTI 3D dataset show that LWD-3D achieves real-time detection (less than 20 ms per image). LWD-3D obtains a mean average precision (mAP) 75% higher than that of another real-time 3D detector while using half the number of parameters. Our research extends the application of vision transformers to 3D object detection tasks.
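The abstract's "point cloud conversion" step, which enables one-shot regression in 2D space, typically means projecting the LiDAR point cloud onto a bird's-eye-view (BEV) pseudo-image that a 2D backbone can consume. The sketch below is not the paper's exact pipeline; it is a minimal, illustrative NumPy implementation of such a conversion, with hypothetical crop ranges and channel choices (max height and log-scaled density) chosen for the example.

```python
import numpy as np

def pointcloud_to_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
                      z_range=(-2.0, 1.0), resolution=0.5):
    """Project a LiDAR point cloud of shape (N, 3+) onto a 2D BEV grid.

    Returns an (H, W, 2) pseudo-image: channel 0 is the max point height
    per cell (normalized to [0, 1]), channel 1 is log-scaled point density.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Keep only points inside the metric crop volume.
    mask = ((x >= x_range[0]) & (x < x_range[1]) &
            (y >= y_range[0]) & (y < y_range[1]) &
            (z >= z_range[0]) & (z < z_range[1]))
    x, y, z = x[mask], y[mask], z[mask]

    # Discretize metric coordinates into integer grid indices.
    cols = ((x - x_range[0]) / resolution).astype(np.int32)
    rows = ((y - y_range[0]) / resolution).astype(np.int32)

    h = int(round((y_range[1] - y_range[0]) / resolution))
    w = int(round((x_range[1] - x_range[0]) / resolution))
    bev = np.zeros((h, w, 2), dtype=np.float32)

    # Channel 0: per-cell max height, normalized by the crop's vertical extent.
    np.maximum.at(bev[:, :, 0], (rows, cols),
                  (z - z_range[0]) / (z_range[1] - z_range[0]))

    # Channel 1: per-cell point count, log-scaled for dynamic range.
    counts = np.zeros((h, w), dtype=np.float32)
    np.add.at(counts, (rows, cols), 1.0)
    bev[:, :, 1] = np.log1p(counts) / np.log1p(64.0)
    return bev
```

Once the cloud is rendered as an (H, W, C) pseudo-image, standard 2D detection machinery (convolutional stages, transformer attention blocks, one-shot box regression heads) can be applied directly, which is what makes this family of detectors fast.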
This work was partially supported by the National Natural Science Foundation of China (No. 62206237), the Japan Society for the Promotion of Science (Nos. 22K12093 and 22K12094), and the Japan Science and Technology Agency (No. JPMJST2281).
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).