Large language models in health care: Development, applications, and challenges

Rui Yang; Ting Fang Tan; Wei Lu; Arun James Thirunavukarasu; Daniel Shu Wei Ting; Nan Liu

doi:10.1002/hcs2.61

Health Care Science 2023, 2(4): 255-263 https://doi.org/10.1002/hcs2.61

Review | Open Access | Published: 24 July 2023

Rui Yang¹, Ting Fang Tan², Wei Lu³, Arun James Thirunavukarasu⁴, Daniel Shu Wei Ting²˒⁵, Nan Liu⁵˒⁶

¹ Department of Biomedical Informatics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore

² Singapore National Eye Center, Singapore Eye Research Institute, Singapore Health Service, Singapore, Singapore

³ StatNLP Research Group, Singapore University of Technology and Design, Singapore

⁴ University of Cambridge School of Clinical Medicine, Cambridge, UK

⁵ Duke‐NUS Medical School, Centre for Quantitative Medicine, Singapore, Singapore

⁶ Duke‐NUS Medical School, Programme in Health Services and Systems Research, Singapore, Singapore

Rui Yang and Ting Fang Tan are joint‐first authors.

Daniel Shu Wei Ting and Nan Liu are joint‐senior authors.

Keywords: AI, Large language model, Health care

Cite this article:

Yang R, Tan TF, Lu W, et al. Large language models in health care: Development, applications, and challenges. Health Care Science, 2023, 2(4): 255-263. https://doi.org/10.1002/hcs2.61


Abstract

The recent emergence of ChatGPT, an artificial intelligence chatbot developed by OpenAI, has attracted significant attention due to its exceptional language comprehension and content generation capabilities, highlighting the immense potential of large language models (LLMs). LLMs have become a burgeoning hotspot across many fields, including health care. Within health care, LLMs may be classified, based on the corpora used for pre‐training, into LLMs for the biomedical domain and LLMs for the clinical domain. In the last 3 years, these domain‐specific LLMs have demonstrated exceptional performance on multiple natural language processing tasks, surpassing that of general LLMs. This not only emphasizes the significance of developing dedicated LLMs for specific domains, but also raises expectations for their applications in health care. We believe that, with appropriate development and supervision, LLMs may be used widely in preconsultation, diagnosis, and management. Additionally, LLMs hold tremendous promise in assisting with medical education, medical writing, and other related applications. At the same time, health care systems must recognize and address the challenges posed by LLMs.

About this article

Publication history

Received: 10 April 2023

Accepted: 12 June 2023

Published: 24 July 2023

Issue date: August 2023

Acknowledgements

Not applicable.

Rights and permissions

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.