Automated labelling of radiology reports using natural language processing enables ground‐truth labelling of the large datasets of radiological studies required for training computer vision models. This paper explains the necessary data preprocessing steps, reviews the main methods for automated labelling and compares their performance. There are four main methods of automated labelling: (1) rules‐based text‐matching algorithms, (2) conventional machine learning models, (3) neural network models and (4) Bidirectional Encoder Representations from Transformers (BERT) models. Rules‐based labellers perform a brute‐force search against manually curated keywords and can achieve high F1 scores, but they require proper handling of negation words. Machine learning models require preprocessing that tokenizes and vectorizes the text into numerical vectors. Labelling radiology reports calls for multilabel classification approaches, and conventional models can achieve good performance given large enough training sets. Deep learning models use connected neural networks, often a long short‐term memory network, and can similarly achieve good performance when trained on a large dataset. BERT is a transformer‐based model that uses an attention mechanism. Pretrained BERT models require only fine‐tuning with small datasets. In particular, domain‐specific BERT models can achieve superior performance compared with the other methods of automated labelling.
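The rules‐based approach described above — keyword matching with handling of negation words — can be sketched as follows. This is a minimal illustration, not any published labeller's implementation; the keyword lists, negation cues and report text are invented for the example.

```python
# Minimal sketch of a rules-based radiology report labeller with
# negation handling (in the spirit of NegEx-style cue matching).
# KEYWORDS and NEGATION_CUES are illustrative, not a curated lexicon.
import re

# Manually curated keywords per finding label (illustrative only).
KEYWORDS = {
    "pneumothorax": ["pneumothorax"],
    "effusion": ["pleural effusion", "effusion"],
    "cardiomegaly": ["cardiomegaly", "enlarged heart"],
}

# Negation cues that, when preceding a keyword in the same sentence,
# mark the mention as negative.
NEGATION_CUES = ["no ", "without ", "negative for ", "absence of "]

def label_report(report: str) -> dict:
    """Return {label: 1 or 0} for each label mentioned in the report;
    labels never mentioned are omitted (i.e., left uncertain)."""
    labels = {}
    for sentence in re.split(r"[.;]\s*", report.lower()):
        for label, words in KEYWORDS.items():
            for word in words:
                idx = sentence.find(word)
                if idx == -1:
                    continue
                # Negative if a negation cue occurs before the keyword.
                if any(cue in sentence[:idx] for cue in NEGATION_CUES):
                    labels.setdefault(label, 0)   # negated mention
                else:
                    labels[label] = 1             # positive mention wins
    return labels

print(label_report("No pneumothorax. Small pleural effusion is seen."))
# → {'pneumothorax': 0, 'effusion': 1}
```

A positive mention overrides an earlier negated one, which is one simple way to resolve conflicting mentions across sentences; real labellers such as CheXpert's use considerably richer rules and an explicit uncertainty class.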
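The tokenization and vectorization step that conventional machine learning models require can also be sketched briefly. This is a bag‐of‐words toy example with an invented two‐report corpus; production pipelines would typically use a library vectorizer (e.g., TF‐IDF) rather than raw counts.

```python
# Minimal sketch of tokenizing free-text reports and vectorizing them
# into numerical feature vectors for a conventional ML classifier.
import re
from collections import Counter

def tokenize(text: str) -> list:
    """Lowercase a report and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(corpus) -> dict:
    """Map each unique token in the corpus to a feature-column index."""
    vocab = sorted({tok for doc in corpus for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}

def vectorize(text: str, vocab: dict) -> list:
    """Bag-of-words count vector aligned to the vocabulary columns;
    out-of-vocabulary tokens are dropped."""
    counts = Counter(tokenize(text))
    return [counts.get(tok, 0) for tok in vocab]

# Toy corpus standing in for a set of training reports.
corpus = ["Mild cardiomegaly.", "No acute cardiopulmonary disease."]
vocab = build_vocabulary(corpus)
vec = vectorize("Cardiomegaly, no effusion.", vocab)
```

Each report becomes one fixed‐length numerical vector, which is the input format conventional models expect; for the multilabel setting described above, one such classifier is then trained per finding label.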
The authors did not receive any external assistance in the drafting of this manuscript. The authors did not receive funding for this work.
This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.