Automated labelling of radiology reports using natural language processing: Comparison of traditional and newer methods

Seo Yi Chng1, Paul J. W. Tern2, Matthew R. X. Kan3, Lionel T. E. Cheng4
1 Department of Paediatrics, National University of Singapore, Singapore, Singapore
2 Department of Cardiology, National Heart Centre, Singapore, Singapore
3 NUS High School of Mathematics and Science, Singapore, Singapore
4 Department of Diagnostic Radiology, Singapore General Hospital, Singapore, Singapore

Abstract

Automated labelling of radiology reports using natural language processing enables ground‐truth labelling of the large datasets of radiological studies required for training computer vision models. This paper explains the necessary data preprocessing steps, reviews the main methods for automated labelling and compares their performance. There are four main methods of automated labelling, namely: (1) rules‐based text‐matching algorithms, (2) conventional machine learning models, (3) neural network models and (4) Bidirectional Encoder Representations from Transformers (BERT) models. Rules‐based labellers perform a brute‐force search against manually curated keywords and can achieve high F1 scores, but they require proper handling of negation. Machine learning models require preprocessing that tokenizes the text and vectorizes it into numerical vectors. Labelling radiology reports is a multilabel classification problem, and conventional models can achieve good performance given sufficiently large training sets. Deep learning models use connected neural networks, often a long short‐term memory network, and can similarly achieve good performance when trained on a large dataset. BERT is a transformer‐based model that uses attention. Pretrained BERT models require only fine‐tuning on small datasets. In particular, domain‐specific BERT models can achieve performance superior to the other automated labelling methods.

Keywords: natural language processing, machine learning, neural network, automated labelling, radiology
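
To make the approaches summarized in the abstract concrete, the two sketches below illustrate a rules‐based labeller and a conventional machine learning pipeline in Python. They are illustrative only: the keyword lists, negation cues, example reports and labels are invented for this explanation and are not the curated vocabularies or training data used by published labellers such as CheXpert.

A minimal rules‐based labeller searches each report for curated keywords and treats a mention as negative when a negation cue precedes it in the same sentence:

```python
import re

# Hypothetical keyword dictionary: label -> phrases to match (illustrative only).
KEYWORDS = {
    "pneumothorax": ["pneumothorax"],
    "pleural_effusion": ["pleural effusion", "effusion"],
    "cardiomegaly": ["cardiomegaly", "enlarged cardiac silhouette"],
}

# Simple negation cues; real labellers use far richer negation and uncertainty rules.
NEGATION_CUES = ["no ", "without ", "negative for ", "no evidence of "]


def label_report(report: str) -> dict:
    """Return a label -> bool mapping for one free-text report."""
    text = report.lower()
    labels = {label: False for label in KEYWORDS}
    # Check each sentence separately so a negation in one sentence
    # does not spill over into the next.
    for sentence in re.split(r"[.\n]", text):
        for label, phrases in KEYWORDS.items():
            for phrase in phrases:
                idx = sentence.find(phrase)
                if idx == -1:
                    continue
                # Count the mention as positive only if no negation cue
                # appears before it within the same sentence.
                if not any(cue in sentence[:idx] for cue in NEGATION_CUES):
                    labels[label] = True
    return labels


print(label_report("No pneumothorax. Small left pleural effusion is present."))
# {'pneumothorax': False, 'pleural_effusion': True, 'cardiomegaly': False}
```

The conventional machine learning route instead tokenizes and vectorizes the text and trains one binary classifier per label. A minimal multilabel sketch, assuming scikit-learn and a handful of toy reports, might look like this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy reports and label sets, invented for illustration.
reports = [
    "Small left pleural effusion. Heart size is normal.",
    "Cardiomegaly with pulmonary vascular congestion.",
    "No acute cardiopulmonary abnormality.",
    "Large right pleural effusion and cardiomegaly.",
]
label_sets = [["effusion"], ["cardiomegaly"], [], ["effusion", "cardiomegaly"]]

# Convert the label sets into a binary indicator matrix (one column per label).
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(label_sets)

# TF-IDF vectorization followed by one-vs-rest logistic regression,
# i.e. an independent binary classifier for each label.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(reports, y)

# Predict label sets for an unseen report; with a toy corpus this size the
# output is unreliable, which is exactly the small-data limitation noted above.
pred = model.predict(["Moderate right pleural effusion without cardiomegaly."])
print(mlb.inverse_transform(pred))
```

In practice the keyword vocabularies, negation rules and training corpora are far larger, and performance depends heavily on how much labelled data is available, which is what motivates the pretrained and domain‐specific BERT models discussed in the abstract.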


Publication history

Received: 23 August 2022
Accepted: 23 February 2023
Published: 24 April 2023
Issue date: April 2023

Copyright

© 2023 The Authors.

Acknowledgements

The authors did not receive any external assistance in the drafting of this manuscript. The authors did not receive funding for this work.

Rights and permissions

This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
