Open Access

Zero-Shot Knowledge-Based Visual Question Answering with Frozen Language Models

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Miaozhendida (Beijing) Network Technology Co. Ltd., Beijing 100085, China

Abstract

Knowledge-based Visual Question Answering (VQA) is a challenging task that requires models to access external knowledge for reasoning. Large Language Models (LLMs) have recently been employed for zero-shot knowledge-based VQA due to their inherent knowledge storage and in-context learning capabilities. However, LLMs are commonly perceived as implicit knowledge bases, and their generative and in-context learning potential remains underutilized. Existing work demonstrates that the performance of in-context learning strongly depends on the quality and order of demonstrations in prompts. In light of this, we propose Knowledge Generation with Frozen Language Models (KGFLM), a novel method for generating explicit knowledge statements to improve zero-shot knowledge-based VQA. Our knowledge generation strategy aims to identify effective demonstrations and determine their optimal order, thereby activating the frozen LLM to produce more useful knowledge statements for better predictions. The generated knowledge statements can also serve as interpretable rationales. In our method, the selection and arrangement of demonstrations are based on the semantic similarity and quality of the demonstrations for each question, without requiring additional annotations. We conduct extensive experiments on the A-OKVQA and OKVQA datasets, and the results show that our method outperforms several strong zero-shot knowledge-based VQA methods.
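To illustrate the kind of demonstration selection and ordering the abstract describes, the following is a minimal sketch, not the authors' implementation: it ranks a pool of demonstrations by semantic similarity to the input question and places the most similar demonstration last in the prompt (closest to the question). All names are illustrative, and the bag-of-words cosine similarity is a stand-in for a real sentence encoder.

```python
# Hypothetical sketch of similarity-based demonstration selection for
# in-context knowledge generation. Function names, the prompt template,
# and the bag-of-words similarity are illustrative assumptions, not the
# KGFLM implementation.
from collections import Counter
import math


def cosine_sim(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two strings
    (a toy stand-in for a learned sentence encoder)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_demonstrations(question: str, pool: list[dict], k: int = 2) -> list[dict]:
    """Pick the k demonstrations most similar to the question, ordered so
    the most similar one comes last (nearest to the question in the prompt)."""
    ranked = sorted(pool, key=lambda d: cosine_sim(question, d["question"]))
    return ranked[-k:]  # ascending similarity: best demonstration ends up last


def build_prompt(question: str, demos: list[dict]) -> str:
    """Assemble a knowledge-generation prompt from ordered demonstrations."""
    parts = ["Generate background knowledge for the question."]
    for d in demos:
        parts.append(f"Q: {d['question']}\nKnowledge: {d['knowledge']}")
    parts.append(f"Q: {question}\nKnowledge:")
    return "\n\n".join(parts)
```

The frozen LLM would then complete the final `Knowledge:` line, and the generated statement is appended to the VQA input; no parameters are updated at any point.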

Big Data Mining and Analytics
Pages 1418-1431


Cite this article:
Liu J, Zhang L, Cao C, et al. Zero-Shot Knowledge-Based Visual Question Answering with Frozen Language Models. Big Data Mining and Analytics, 2025, 8(6): 1418-1431. https://doi.org/10.26599/BDMA.2025.9020032


Received: 17 December 2024
Revised: 14 February 2025
Accepted: 24 March 2025
Published: 19 September 2025
© The author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).