Open Access

Vision-Language Model-Driven Human−Vehicle Interaction for Autonomous Driving: Status, Challenge, and Innovation

Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Shanghai 200092, China
School of Automotive Studies, Tongji University, Shanghai 200092, China
Trinity College, University of Oxford, Oxford OX1 3BH, UK
College of Mechanical Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China, and also with College of Electronic and Information Engineering, Tongji University, Shanghai 200092, China

Abstract

This paper investigates the potential of Vision-Language Models (VLMs) to enhance Human–Vehicle Interaction (HVI) in Autonomous Driving (AD) scenarios, particularly in interactions between vehicles and other traffic participants, with a focus on rationality and safety in external HVI. Leveraging recent advances in large language models, VLMs demonstrate remarkable capabilities in understanding real-world contexts and have attracted significant interest in HVI applications. This paper provides an overview of AD, HVI, and VLMs, along with the historical context of large language model applications in HVI. The HVI discussed herein involves dynamic game processes encompassing perception and decision-making between vehicles and traffic participants, such as pedestrians. Furthermore, we examine the perceptual challenges associated with applying VLMs to HVI and compile relevant datasets. This research fills a gap in the existing literature by systematically analyzing the current status, challenges, and future opportunities of VLM applications in HVI. To advance VLM integration in AD, various implementation strategies are discussed. The findings highlight the potential of VLMs to transform HVI in AD, improving both passenger experience and driving safety. Overall, this study contributes to a comprehensive understanding of VLM applications in HVI and provides insights to guide future research and development.

Big Data Mining and Analytics, Pages 425–447

Cite this article:
Zhao R, Du A, Cai M, et al. Vision-Language Model-Driven Human−Vehicle Interaction for Autonomous Driving: Status, Challenge, and Innovation. Big Data Mining and Analytics, 2026, 9(2): 425-447. https://doi.org/10.26599/BDMA.2025.9020090


Received: 23 May 2024
Revised: 09 July 2025
Accepted: 24 July 2025
Published: 09 February 2026
© The author(s) 2026.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).