
Emma: An accurate, efficient, and multi-modality strategy for autonomous vehicle angle prediction

Keqi Song, Tao Ni, Linqi Song, and Weitao Xu
Shenzhen Research Institute, City University of Hong Kong, Hong Kong, China, and also with the Department of Computer Science, City University of Hong Kong, Hong Kong, China

Abstract

Autonomous driving and self-driving vehicles have become a popular choice for consumers because of their convenience, and real-time vehicle angle prediction is one of the most prevalent topics in the autonomous driving industry. However, existing methods rely on single-modal data, such as images captured by a camera, which limits the performance and efficiency of the prediction system. In this paper, we present Emma, a novel vehicle angle prediction strategy that achieves accurate and efficient multi-modal prediction. Specifically, Emma fuses images and inertial measurement unit (IMU) signals with a fusion network to predict vehicle angles. Moreover, we design and implement a few-shot learning module in Emma for fast domain adaptation to varied scenarios (e.g., different vehicle models). Evaluation results demonstrate that Emma achieves overall 97.5% accuracy in predicting three vehicle angle parameters (yaw, pitch, and roll), outperforming traditional single-modality baselines by approximately 16.7%–36.8%. Additionally, the few-shot learning module exhibits promising adaptability, achieving overall accuracy of 79.8% and 88.3% in 5-shot and 10-shot settings, respectively. Finally, empirical results show that Emma reduces energy consumption by 39.7% when running on the Arduino UNO board.

Keywords: multi-modality, autonomous driving, vehicle angle prediction, few-shot learning
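
To make the multi-modal design concrete, below is a minimal sketch (in PyTorch) of an image + IMU fusion network of the kind the abstract describes: each modality is encoded separately and the concatenated features are regressed to the three angle parameters. All layer choices, dimensions, and names (ImageEncoder, ImuEncoder, AngleFusionNet) are illustrative assumptions, not the architecture reported in the paper.

    # Illustrative sketch only: layer sizes, input shapes, and module names are
    # assumptions, not the authors' actual Emma architecture.
    import torch
    import torch.nn as nn

    class ImageEncoder(nn.Module):
        """Extracts a feature vector from a camera frame (assumed 3x64x64 input)."""
        def __init__(self, out_dim: int = 128):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.fc = nn.Linear(32, out_dim)

        def forward(self, x):
            return self.fc(self.conv(x).flatten(1))

    class ImuEncoder(nn.Module):
        """Encodes a window of 6-axis IMU samples (accelerometer + gyroscope)."""
        def __init__(self, out_dim: int = 128):
            super().__init__()
            self.gru = nn.GRU(input_size=6, hidden_size=out_dim, batch_first=True)

        def forward(self, x):
            _, h = self.gru(x)      # h: (1, batch, out_dim)
            return h.squeeze(0)

    class AngleFusionNet(nn.Module):
        """Concatenates both modalities and regresses yaw, pitch, and roll."""
        def __init__(self):
            super().__init__()
            self.image_enc = ImageEncoder()
            self.imu_enc = ImuEncoder()
            self.head = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 3))

        def forward(self, image, imu):
            fused = torch.cat([self.image_enc(image), self.imu_enc(imu)], dim=1)
            return self.head(fused)  # (batch, 3): yaw, pitch, roll

    if __name__ == "__main__":
        model = AngleFusionNet()
        frame = torch.randn(2, 3, 64, 64)  # two camera frames
        imu = torch.randn(2, 50, 6)        # two 50-sample IMU windows
        print(model(frame, imu).shape)     # torch.Size([2, 3])

This late fusion by concatenation is only one plausible reading of the abstract's "fusion network"; the paper's actual module and the few-shot adaptation mechanism may combine the modalities differently.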


Publication history

Received: 13 February 2023
Revised: 11 March 2023
Accepted: 28 March 2023
Published: 20 March 2023
Issue date: March 2023

Copyright

© All articles included in the journal are copyrighted by the ITU and TUP.

Acknowledgements


This work was supported by the National Natural Science Foundation of China (No. 62101471) and partially supported by the Shenzhen Research Institute of City University of Hong Kong, the Research Grants Council of the Hong Kong Special Administrative Region, China (No. CityU 21201420), Shenzhen Science and Technology Funding Fundamental Research Program (No. 2021Szvup126), National Natural Science Foundation of Shandong Province (No. ZR2021LZH010), Changsha International and Regional Science and Technology Cooperation Program (No. kh2201023), Chow Sang Sang Group Research Fund sponsored by Chow Sang Sang Holdings International Limited (No. 9229062), CityU MFPRC (No. 9680333), CityU SIRG (No. 7020057), CityU APRC (No. 9610485), CityU ARG (No. 9667225), and CityU SRG-Fd (No. 7005666).

Rights and permissions

This work is available under the CC BY-NC-ND 3.0 IGO license: https://creativecommons.org/licenses/by-nc-nd/3.0/igo/
