Automated labelling of radiology reports using natural language processing enables ground‐truth labelling of the large datasets of radiological studies required for training computer vision models. This paper explains the necessary data preprocessing steps, reviews the main methods for automated labelling and compares their performance. There are four main methods of automated labelling: (1) rules‐based text‐matching algorithms, (2) conventional machine learning models, (3) neural network models and (4) Bidirectional Encoder Representations from Transformers (BERT) models. Rules‐based labellers perform a brute‐force search against manually curated keywords and can achieve high F1 scores, but they require proper handling of negation words. Machine learning models require preprocessing that tokenizes and vectorizes the text into numerical vectors. Labelling radiology reports calls for multilabel classification approaches, and conventional models can achieve good performance given large enough training sets. Deep learning models use connected neural networks, often a long short‐term memory network, and can similarly achieve good performance when trained on a large dataset. BERT is a transformer‐based model that uses an attention mechanism. Pretrained BERT models require only fine‐tuning with small datasets. In particular, domain‐specific BERT models can achieve superior performance compared with the other methods of automated labelling.
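The rules‐based approach described above — keyword matching with handling of negation words — can be sketched as follows. This is a minimal illustration, not any published labeller's implementation; the keyword lists, negation cues and report text are invented for the example.

```python
# Minimal sketch of a rules-based radiology report labeller with
# negation handling (in the spirit of NegEx-style cue matching).
# KEYWORDS and NEGATION_CUES are illustrative, not a curated lexicon.
import re

# Manually curated keywords per finding label (illustrative only).
KEYWORDS = {
    "pneumothorax": ["pneumothorax"],
    "effusion": ["pleural effusion", "effusion"],
    "cardiomegaly": ["cardiomegaly", "enlarged heart"],
}

# Negation cues that, when preceding a keyword in the same sentence,
# mark the mention as negative.
NEGATION_CUES = ["no ", "without ", "negative for ", "absence of "]

def label_report(report: str) -> dict:
    """Return {label: 1 or 0} for each label mentioned in the report;
    labels never mentioned are omitted (i.e., left uncertain)."""
    labels = {}
    for sentence in re.split(r"[.;]\s*", report.lower()):
        for label, words in KEYWORDS.items():
            for word in words:
                idx = sentence.find(word)
                if idx == -1:
                    continue
                # Negative if a negation cue occurs before the keyword.
                if any(cue in sentence[:idx] for cue in NEGATION_CUES):
                    labels.setdefault(label, 0)   # negated mention
                else:
                    labels[label] = 1             # positive mention wins
    return labels

print(label_report("No pneumothorax. Small pleural effusion is seen."))
# → {'pneumothorax': 0, 'effusion': 1}
```

A positive mention overrides an earlier negated one, which is one simple way to resolve conflicting mentions across sentences; real labellers such as CheXpert's use considerably richer rules and an explicit uncertainty class.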
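The tokenization and vectorization step that conventional machine learning models require can also be sketched briefly. This is a bag‐of‐words toy example with an invented two‐report corpus; production pipelines would typically use a library vectorizer (e.g., TF‐IDF) rather than raw counts.

```python
# Minimal sketch of tokenizing free-text reports and vectorizing them
# into numerical feature vectors for a conventional ML classifier.
import re
from collections import Counter

def tokenize(text: str) -> list:
    """Lowercase a report and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(corpus) -> dict:
    """Map each unique token in the corpus to a feature-column index."""
    vocab = sorted({tok for doc in corpus for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}

def vectorize(text: str, vocab: dict) -> list:
    """Bag-of-words count vector aligned to the vocabulary columns;
    out-of-vocabulary tokens are dropped."""
    counts = Counter(tokenize(text))
    return [counts.get(tok, 0) for tok in vocab]

# Toy corpus standing in for a set of training reports.
corpus = ["Mild cardiomegaly.", "No acute cardiopulmonary disease."]
vocab = build_vocabulary(corpus)
vec = vectorize("Cardiomegaly, no effusion.", vocab)
```

Each report becomes one fixed‐length numerical vector, which is the input format conventional models expect; for the multilabel setting described above, one such classifier is then trained per finding label.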
The authors did not receive any external assistance in the drafting of this manuscript. The authors did not receive funding for this work.
This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.