Open Access

Prompting Is Not Enough: Exploring Knowledge Integration and Controllable Generation on Large Language Models

School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
BOSS Zhipin Career Science Lab, Beijing 100028, China

Abstract

Recently, with the rapid advancement of Large Language Models (LLMs), LLM-based Open-domain Question Answering (OpenQA) methods have benefited from the emergent understanding and answering capabilities enabled by massive parameters, compared to traditional methods. However, most of these methods face two critical challenges: how to integrate knowledge into LLMs effectively, and how to adaptively generate results in specific answer formats. To address these challenges, we propose GenKI, a novel framework that improves OpenQA performance by jointly exploring knowledge integration and controllable generation on LLMs. Specifically, we first train a dense passage retrieval model to retrieve associated knowledge from a given knowledge base. Subsequently, we introduce a novel knowledge integration model that incorporates the retrieved knowledge into instructions during fine-tuning to strengthen the model. Furthermore, to enable controllable generation in LLMs, we leverage a fine-tuned LLM together with an ensemble framework based on text consistency that accounts for coherence, fluency, and answer-format assurance. Finally, extensive experiments on three datasets with diverse answer formats demonstrate the effectiveness of GenKI in comparison with state-of-the-art baselines. Moreover, ablation studies reveal a linear relationship between the frequency of retrieved knowledge and the model's ability to accurately recall knowledge matching the ground truth. Tests on out-of-domain and knowledge-base-independence scenarios further confirm the robustness and controllability of GenKI. Our code is available at https://github.com/USTC-StarTeam/GenKI.
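The retrieve-then-integrate pipeline and the consistency-based ensemble described in the abstract can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the prompt template, the token-overlap consistency score, and all function names here are assumptions; GenKI itself uses a trained dense passage retriever, LLM fine-tuning, and a richer text-consistency measure.

```python
def build_instruction(question, passages):
    """Fold retrieved passages into an instruction for fine-tuning or
    inference (hypothetical template; the paper's format may differ)."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using the passages below.\n"
        f"Passages:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )

def consistency_ensemble(candidates):
    """Select the candidate answer most consistent with the others.
    Consistency is approximated by Jaccard token overlap here, standing
    in for the paper's text-consistency scoring."""
    def overlap(a, b):
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / max(1, len(ta | tb))
    # Score each candidate by its total overlap with every other candidate.
    scores = [
        sum(overlap(c, candidates[j]) for j in range(len(candidates)) if j != i)
        for i, c in enumerate(candidates)
    ]
    return candidates[scores.index(max(scores))]
```

For example, given candidate answers from several generations, the ensemble keeps the one that agrees most with the rest: `consistency_ensemble(["the capital is paris", "paris is the capital of france", "paris", "london"])` returns `"the capital is paris"`, since the outlier `"london"` shares no tokens with the others.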

Big Data Mining and Analytics
Pages 563-579


Cite this article:
Shen T, Wang H, Qin C, et al. Prompting Is Not Enough: Exploring Knowledge Integration and Controllable Generation on Large Language Models. Big Data Mining and Analytics, 2026, 9(2): 563-579. https://doi.org/10.26599/BDMA.2025.9020052


Received: 30 June 2024
Revised: 23 October 2024
Accepted: 30 April 2025
Published: 09 February 2026
© The author(s) 2026.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).