
Emma: An accurate, efficient, and multi-modality strategy for autonomous vehicle angle prediction

Keqi Song, Tao Ni, Linqi Song, and Weitao Xu
Shenzhen Research Institute, City University of Hong Kong, Hong Kong, China, and also with the Department of Computer Science, City University of Hong Kong, Hong Kong, China

Abstract

Autonomous driving and self-driving vehicles have become a popular choice for consumers because of their convenience, and real-time vehicle angle prediction is one of the most prevalent topics in the autonomous driving industry. However, existing methods rely on single-modal data, such as images captured by a camera, which limits the performance and efficiency of the prediction system. In this paper, we present Emma, a novel vehicle angle prediction strategy that achieves accurate and efficient multi-modal prediction. Specifically, Emma fuses images and inertial measurement unit (IMU) signals with a fusion network to predict vehicle angles. Moreover, we design and implement a few-shot learning module in Emma for fast domain adaptation to varied scenarios (e.g., different vehicle models). Evaluation results demonstrate that Emma achieves overall 97.5% accuracy in predicting three vehicle angle parameters (yaw, pitch, and roll), outperforming traditional single-modality baselines by approximately 16.7%–36.8%. Additionally, the few-shot learning module exhibits promising adaptability, achieving overall accuracy of 79.8% and 88.3% in 5-shot and 10-shot settings, respectively. Finally, empirical results show that Emma reduces energy consumption by 39.7% when running on the Arduino UNO board.

Keywords: multi-modality, autonomous driving, vehicle angle prediction, few-shot learning
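
To make the multi-modal design concrete, below is a minimal sketch (in PyTorch) of an image + IMU fusion network of the kind the abstract describes: each modality is encoded separately and the concatenated features are regressed to the three angle parameters. All layer choices, dimensions, and names (ImageEncoder, ImuEncoder, AngleFusionNet) are illustrative assumptions, not the architecture reported in the paper.

    # Illustrative sketch only: layer sizes, input shapes, and module names are
    # assumptions, not the authors' actual Emma architecture.
    import torch
    import torch.nn as nn

    class ImageEncoder(nn.Module):
        """Extracts a feature vector from a camera frame (assumed 3x64x64 input)."""
        def __init__(self, out_dim: int = 128):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.fc = nn.Linear(32, out_dim)

        def forward(self, x):
            return self.fc(self.conv(x).flatten(1))

    class ImuEncoder(nn.Module):
        """Encodes a window of 6-axis IMU samples (accelerometer + gyroscope)."""
        def __init__(self, out_dim: int = 128):
            super().__init__()
            self.gru = nn.GRU(input_size=6, hidden_size=out_dim, batch_first=True)

        def forward(self, x):
            _, h = self.gru(x)      # h: (1, batch, out_dim)
            return h.squeeze(0)

    class AngleFusionNet(nn.Module):
        """Concatenates both modalities and regresses yaw, pitch, and roll."""
        def __init__(self):
            super().__init__()
            self.image_enc = ImageEncoder()
            self.imu_enc = ImuEncoder()
            self.head = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 3))

        def forward(self, image, imu):
            fused = torch.cat([self.image_enc(image), self.imu_enc(imu)], dim=1)
            return self.head(fused)  # (batch, 3): yaw, pitch, roll

    if __name__ == "__main__":
        model = AngleFusionNet()
        frame = torch.randn(2, 3, 64, 64)  # two camera frames
        imu = torch.randn(2, 50, 6)        # two 50-sample IMU windows
        print(model(frame, imu).shape)     # torch.Size([2, 3])

This late fusion by concatenation is only one plausible reading of the abstract's "fusion network"; the paper's actual module and the few-shot adaptation mechanism may combine the modalities differently.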


Publication history

Received: 13 February 2023
Revised: 11 March 2023
Accepted: 28 March 2023
Published: 20 March 2023
Issue date: March 2023

Copyright

© All articles included in the journal are copyrighted by the ITU and TUP.

Acknowledgements


This work was supported by the National Natural Science Foundation of China (No. 62101471) and partially supported by the Shenzhen Research Institute of City University of Hong Kong, the Research Grants Council of the Hong Kong Special Administrative Region, China (No. CityU 21201420), Shenzhen Science and Technology Funding Fundamental Research Program (No. 2021Szvup126), National Natural Science Foundation of Shandong Province (No. ZR2021LZH010), Changsha International and Regional Science and Technology Cooperation Program (No. kh2201023), Chow Sang Sang Group Research Fund sponsored by Chow Sang Sang Holdings International Limited (No. 9229062), CityU MFPRC (No. 9680333), CityU SIRG (No. 7020057), CityU APRC (No. 9610485), CityU ARG (No. 9667225), and CityU SRG-Fd (No. 7005666).

Rights and permissions

This work is available under the CC BY-NC-ND 3.0 IGO license: https://creativecommons.org/licenses/by-nc-nd/3.0/igo/
