Knowledge-based Visual Question Answering (VQA) is a challenging task that requires models to access external knowledge for reasoning. Large Language Models (LLMs) have recently been employed for zero-shot knowledge-based VQA due to their inherent knowledge storage and in-context learning capabilities. However, LLMs are commonly treated only as implicit knowledge bases, leaving their generative and in-context learning potential underutilized. Existing work demonstrates that the performance of in-context learning strongly depends on the quality and order of the demonstrations in the prompt. In light of this, we propose Knowledge Generation with Frozen Language Models (KGFLM), a novel method for generating explicit knowledge statements to improve zero-shot knowledge-based VQA. Our knowledge generation strategy aims to identify effective demonstrations and determine their optimal order, thereby activating the frozen LLM to produce more useful knowledge statements for better predictions. The generated knowledge statements can also serve as interpretable rationales. In our method, demonstrations are selected and arranged according to their semantic similarity to each question and their quality, without requiring additional annotations. Furthermore, we conduct a series of experiments on the A-OKVQA and OKVQA datasets. The results show that our method outperforms several strong zero-shot knowledge-based VQA methods.
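The demonstration-selection idea described above can be illustrated with a minimal sketch. The snippet below is not the paper's implementation: it assumes precomputed question and demonstration embeddings (as NumPy arrays) and a hypothetical per-demonstration quality score, and simply ranks candidates by quality-weighted cosine similarity, placing the most similar demonstration closest to the question in the prompt.

```python
import numpy as np

def select_demonstrations(question_emb, demo_embs, demo_quality, k=4):
    """Rank candidate demonstrations by cosine similarity to the question
    embedding, weighted by a (hypothetical) quality score, and return the
    top-k indices in ascending-score order so the most relevant
    demonstration appears last, i.e. nearest the question in the prompt.

    question_emb: (d,) array; demo_embs: (n, d) array; demo_quality: (n,) array.
    """
    # Cosine similarity between the question and every candidate demonstration
    sims = demo_embs @ question_emb / (
        np.linalg.norm(demo_embs, axis=1) * np.linalg.norm(question_emb) + 1e-8
    )
    # Combine similarity with the quality score (no extra annotations needed
    # if quality is itself estimated automatically)
    scores = sims * demo_quality
    # argsort is ascending, so the last k indices are the best candidates,
    # ordered least- to most-similar
    return np.argsort(scores)[-k:].tolist()

# Toy usage with 2-dimensional embeddings
q = np.array([1.0, 0.0])
demos = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
quality = np.array([1.0, 1.0, 1.0])
print(select_demonstrations(q, demos, quality, k=2))  # [2, 0]
```

In this toy example, demonstration 1 is orthogonal to the question and is dropped, while the two similar demonstrations are kept with the exact match last.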