Leveraging Large Language Models to Enhance Medical Text Representation for Lung Diagnosis Prediction via Knowledge Infusion

Binyu Gao; Qiongye Dong; Tianqi Tao; Congmin Zhu; Jun Huang; Hui Chen; Qiuying Yang; Honglei Liu

doi:10.26599/TST.2024.9010153


Open Access


Binyu Gao¹, Qiongye Dong², Tianqi Tao³, Congmin Zhu¹, Jun Huang¹, Hui Chen¹, Qiuying Yang¹, Honglei Liu¹

¹ School of Biomedical Engineering, Capital Medical University, Beijing 100069, China

² Institute of Precision Medicine, Peking University Shenzhen Hospital, Shenzhen 518036, China

³ Department of Geriatrics, The Second Medical Center and National Clinical Research Center for Geriatric Diseases, Chinese People’s Liberation Army General Hospital, Beijing 100853, China


Abstract

Medical text representation is crucial for medical natural language processing (NLP) applications. Bidirectional encoder representations from transformers (BERT) has achieved state-of-the-art performance in general-domain text representation. However, limited by the design of its pretraining task and the frequency with which knowledge occurs, it lacks an understanding of medical knowledge. To overcome these problems, we proposed a selective knowledge extraction and fusion framework to enhance medical text representation. In the knowledge extraction phase, we first designed a semantic importance evaluation metric to extract internal knowledge. We then used large language models (LLMs) to extract external knowledge from the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). In the knowledge fusion phase, we utilized an attention mechanism and a Siamese network to integrate internal and external knowledge. By extracting knowledge with LLMs and integrating it into five different types of BERT models, we achieved significant improvements on the task of pulmonary disease text classification.
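The abstract does not give the details of the fusion step, so the following is only a minimal illustrative sketch of one common way an attention mechanism can combine a text embedding with several knowledge embeddings. It is not the authors' implementation. The function names (`softmax`, `attention_fuse`), the scaled dot-product scoring, the concatenation at the end, and the toy vectors are all assumptions made for illustration, and the Siamese network component is not modeled here.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    text_vec: Sequence[float],
    knowledge_vecs: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Hypothetical fusion: attend from the text vector over knowledge vectors.

    Returns the text vector concatenated with the attention-weighted
    knowledge summary, plus the attention weights themselves.
    """
    d = len(text_vec)
    # Scaled dot-product score between the text (query) and each knowledge item.
    scores = [
        sum(q * k for q, k in zip(text_vec, kv)) / math.sqrt(d)
        for kv in knowledge_vecs
    ]
    weights = softmax(scores)
    # Weighted sum of the knowledge vectors, one value per dimension.
    pooled = [
        sum(w * kv[i] for w, kv in zip(weights, knowledge_vecs))
        for i in range(d)
    ]
    return list(text_vec) + pooled, weights


# Toy example (made-up vectors, not real BERT or SNOMED CT embeddings).
text_vec = [1.0, 0.0]
knowledge_vecs = [[1.0, 0.0], [0.0, 1.0]]
fused, weights = attention_fuse(text_vec, knowledge_vecs)
print(len(fused), weights)
```

In this sketch the knowledge item most similar to the text receives the largest weight, so the fused vector leans toward knowledge relevant to the input. A real system would use learned projection matrices and encoder outputs in place of the raw toy vectors.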

Keywords

large language models (LLMs); medical text representation; knowledge infusion; aided diagnosis; bidirectional encoder representations from transformers (BERT)


Tsinghua Science and Technology

Volume 31, Issue 1, February 2026

Pages 418-429

DOI: 10.26599/TST.2024.9010153


Cite this article:

Gao B, Dong Q, Tao T, et al. Leveraging Large Language Models to Enhance Medical Text Representation for Lung Diagnosis Prediction via Knowledge Infusion. Tsinghua Science and Technology, 2026, 31(1): 418-429. https://doi.org/10.26599/TST.2024.9010153

Part of a topical collection:

Special Section on Challenges and Opportunities in Biomedical Big Data Analysis: From LLM Models to Cl


Received: 12 July 2024

Accepted: 19 August 2024

Published: 25 August 2025

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).