Survey

Multimodal Agent AI: A Survey of Recent Advances and Future Directions

School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710061, China
School of Electronic and Information, Northwestern Polytechnical University, Xi’an 710072, China

Abstract

In recent years, multimodal agent AI (MAA) has emerged as a pivotal area of research that holds promise for transforming human-machine interaction. Agent AI systems, which perceive and respond to inputs from multiple modalities (e.g., language, vision, audio), have made remarkable progress in understanding complex environments and executing intricate tasks. This survey reviews state-of-the-art developments in MAA and examines its fundamental concepts, key techniques, and applications across diverse domains. We first introduce the basics of agent AI and its multimodal interaction capabilities. We then examine the core technologies that enable agents to perform task planning, decision-making, and multi-sensory fusion. Next, we explore applications of MAA in robotics, healthcare, gaming, and beyond. Finally, we analyze the challenges and limitations of current systems and propose promising research directions, including human-AI collaboration and improved online learning methods. By reviewing existing work and highlighting open questions, this survey aims to provide a comprehensive roadmap for researchers and practitioners in the field of MAA.

Electronic Supplementary Material

JCST-2409-14802-Highlights.pdf (382.8 KB)

Journal of Computer Science and Technology
Pages 1046-1063

Cite this article:
Sun Y-Z, Sun H-L, Ma J-C, et al. Multimodal Agent AI: A Survey of Recent Advances and Future Directions. Journal of Computer Science and Technology, 2025, 40(4): 1046-1063. https://doi.org/10.1007/s11390-025-4802-8

Views: 1577
Crossref: 15
Web of Science: 11
Scopus: 11
CSCD: 0

Received: 06 September 2024
Accepted: 27 April 2025
Published: 30 August 2025
© Institute of Computing Technology, Chinese Academy of Sciences 2025