Open Access

Zero-Shot Knowledge-Based Visual Question Answering with Frozen Language Models

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Miaozhendida (Beijing) Network Technology Co. Ltd., Beijing 100085, China

Abstract

Knowledge-based Visual Question Answering (VQA) is a challenging task that requires models to access external knowledge for reasoning. Large Language Models (LLMs) have recently been employed for zero-shot knowledge-based VQA due to their inherent knowledge storage and in-context learning capabilities. However, LLMs are commonly perceived as implicit knowledge bases, and their generative and in-context learning potential remains underutilized. Existing work demonstrates that the performance of in-context learning strongly depends on the quality and order of demonstrations in prompts. In light of this, we propose Knowledge Generation with Frozen Language Models (KGFLM), a novel method for generating explicit knowledge statements to improve zero-shot knowledge-based VQA. Our knowledge generation strategy aims to identify effective demonstrations and determine their optimal order, thereby activating the frozen LLM to produce more useful knowledge statements for better predictions. The generated knowledge statements can also serve as interpretable rationales. In our method, the selection and arrangement of demonstrations are based on the semantic similarity and quality of the demonstrations for each question, without requiring additional annotations. We conduct extensive experiments on the A-OKVQA and OKVQA datasets, and the results show that our method outperforms several strong zero-shot knowledge-based VQA methods.
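To illustrate the kind of demonstration selection and ordering the abstract describes, the following is a minimal sketch, not the authors' implementation: it ranks a pool of demonstrations by semantic similarity to the input question and places the most similar demonstration last in the prompt (closest to the question). All names are illustrative, and the bag-of-words cosine similarity is a stand-in for a real sentence encoder.

```python
# Hypothetical sketch of similarity-based demonstration selection for
# in-context knowledge generation. Function names, the prompt template,
# and the bag-of-words similarity are illustrative assumptions, not the
# KGFLM implementation.
from collections import Counter
import math


def cosine_sim(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two strings
    (a toy stand-in for a learned sentence encoder)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_demonstrations(question: str, pool: list[dict], k: int = 2) -> list[dict]:
    """Pick the k demonstrations most similar to the question, ordered so
    the most similar one comes last (nearest to the question in the prompt)."""
    ranked = sorted(pool, key=lambda d: cosine_sim(question, d["question"]))
    return ranked[-k:]  # ascending similarity: best demonstration ends up last


def build_prompt(question: str, demos: list[dict]) -> str:
    """Assemble a knowledge-generation prompt from ordered demonstrations."""
    parts = ["Generate background knowledge for the question."]
    for d in demos:
        parts.append(f"Q: {d['question']}\nKnowledge: {d['knowledge']}")
    parts.append(f"Q: {question}\nKnowledge:")
    return "\n\n".join(parts)
```

The frozen LLM would then complete the final `Knowledge:` line, and the generated statement is appended to the VQA input; no parameters are updated at any point.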

Big Data Mining and Analytics
Pages 1418-1431


Cite this article:
Liu J, Zhang L, Cao C, et al. Zero-Shot Knowledge-Based Visual Question Answering with Frozen Language Models. Big Data Mining and Analytics, 2025, 8(6): 1418-1431. https://doi.org/10.26599/BDMA.2025.9020032


Received: 17 December 2024
Revised: 14 February 2025
Accepted: 24 March 2025
Published: 19 September 2025
© The author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).