Open Access

Vision-Language Model-Driven Human−Vehicle Interaction for Autonomous Driving: Status, Challenge, and Innovation

Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Shanghai 200092, China
School of Automotive Studies, Tongji University, Shanghai 200092, China
Trinity College, University of Oxford, Oxford OX1 3BH, UK
College of Mechanical Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China, and also with College of Electronic and Information Engineering, Tongji University, Shanghai 200092, China

Abstract

This paper investigates the potential of Vision-Language Models (VLMs) to enhance Human–Vehicle Interaction (HVI) in Autonomous Driving (AD) scenarios, particularly in interactions between vehicles and other traffic participants, with a focus on rationality and safety in external HVI. Leveraging recent advances in large language models, VLMs demonstrate remarkable capabilities in understanding real-world contexts and have attracted significant interest in HVI applications. This paper provides an overview of AD, HVI, and VLMs, along with the historical context of large language model applications in HVI. The HVI discussed herein involves dynamic game processes encompassing perception and decision-making between vehicles and traffic participants, such as pedestrians. Furthermore, we examine the perceptual challenges associated with applying VLMs to HVI and compile relevant datasets. This research fills a gap in the existing literature by systematically analyzing the current status, challenges, and future opportunities of VLM applications in HVI. To advance VLM integration in AD, various implementation strategies are discussed. The findings highlight the potential of VLMs to transform HVI in AD, improving both passenger experience and driving safety. Overall, this study contributes to a comprehensive understanding of VLM applications in HVI and provides insights to guide future research and development.

Big Data Mining and Analytics, Pages 425–447

Cite this article:
Zhao R, Du A, Cai M, et al. Vision-Language Model-Driven Human−Vehicle Interaction for Autonomous Driving: Status, Challenge, and Innovation. Big Data Mining and Analytics, 2026, 9(2): 425-447. https://doi.org/10.26599/BDMA.2025.9020090


Received: 23 May 2024
Revised: 09 July 2025
Accepted: 24 July 2025
Published: 09 February 2026
© The author(s) 2026.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).