Scholar - SciOpen

With the rapid advancement of Large Language Models (LLMs), LLM-based Open-domain Question Answering (OpenQA) methods have gained emergent understanding and answering capabilities, enabled by massive parameter counts, that traditional methods lack. However, most of these methods face two critical challenges: how to integrate knowledge into LLMs effectively, and how to adaptively generate results in specific answer formats. To address these challenges, we propose GenKI, a novel framework that aims to improve OpenQA performance by exploring knowledge integration and controllable generation on LLMs simultaneously. Specifically, we first train a dense passage retrieval model to retrieve relevant knowledge from a given knowledge base. We then introduce a novel knowledge integration model that incorporates the retrieved knowledge into instructions during fine-tuning to strengthen the model. Furthermore, to enable controllable generation in LLMs, we leverage a fine-tuned LLM and an ensemble framework based on text consistency that jointly accounts for coherence, fluency, and answer format assurance. Extensive experiments on three datasets with diverse answer formats demonstrate the effectiveness of GenKI in comparison with state-of-the-art baselines. Moreover, ablation studies reveal a linear relationship between the frequency of retrieved knowledge and the model’s ability to recall knowledge accurately with respect to the ground truth. Tests on the out-of-domain scenario and the knowledge-base-independence scenario further confirm the robustness and controllability of GenKI. Our code for GenKI is available at https://github.com/USTC-StarTeam/GenKI.
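
To make the two ideas in the abstract concrete, the following is a minimal sketch in Python and not the GenKI implementation. It assumes that knowledge integration can be illustrated by folding retrieved passages into an instruction-style prompt, and that a text-consistency ensemble can be approximated by picking the candidate answer that agrees most with the other candidates under token-level F1. The function names (`build_instruction`, `token_f1`, `select_by_consistency`), the prompt wording, and the `format_ok` check are all hypothetical and do not come from the paper or its repository.

```python
from collections import Counter


def build_instruction(question, passages, top_k=3):
    """Fold the top-k retrieved passages into an instruction-style prompt.

    The prompt template is an illustrative assumption, not GenKI's template.
    """
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages[:top_k]))
    return (
        "Answer the question using the knowledge below.\n"
        f"Knowledge:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


def token_f1(a, b):
    """Token-level F1 between two strings (case-insensitive, whitespace split)."""
    ta, tb = a.lower().split(), b.lower().split()
    if not ta or not tb:
        return 0.0
    overlap = sum((Counter(ta) & Counter(tb)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(ta)
    recall = overlap / len(tb)
    return 2 * precision * recall / (precision + recall)


def select_by_consistency(candidates, format_ok=lambda s: True):
    """Return the candidate most consistent with the other candidates.

    Candidates failing the (hypothetical) format check are dropped first;
    if none pass, all candidates are kept. Ties go to the earliest candidate.
    """
    if not candidates:
        raise ValueError("candidates must be non-empty")
    valid = [c for c in candidates if format_ok(c)] or list(candidates)
    if len(valid) == 1:
        return valid[0]

    def score(i):
        # Mean agreement of candidate i with every other valid candidate.
        others = [token_f1(valid[i], valid[j]) for j in range(len(valid)) if j != i]
        return sum(others) / len(others)

    best = max(range(len(valid)), key=score)
    return valid[best]
```

In this sketch, `format_ok` stands in for answer format assurance: for a dataset with single-word answers one might pass `lambda s: len(s.split()) == 1`, so that a verbose candidate is filtered out before the consistency vote. Coherence and fluency scoring, which the abstract also mentions, are not modeled here.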