Volume 2, Issue 4

Large language models in health care: Development, applications, and challenges

Rui Yang¹, Ting Fang Tan², Wei Lu³, Arun James Thirunavukarasu⁴, Daniel Shu Wei Ting²,⁵, Nan Liu⁵,⁶
¹ Department of Biomedical Informatics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
² Singapore National Eye Center, Singapore Eye Research Institute, Singapore Health Service, Singapore, Singapore
³ StatNLP Research Group, Singapore University of Technology and Design, Singapore
⁴ University of Cambridge School of Clinical Medicine, Cambridge, UK
⁵ Duke‐NUS Medical School, Centre for Quantitative Medicine, Singapore, Singapore
⁶ Duke‐NUS Medical School, Programme in Health Services and Systems Research, Singapore, Singapore

Rui Yang and Ting Fang Tan are joint‐first authors.

Daniel Shu Wei Ting and Nan Liu are joint‐senior authors.

Abstract

Recently, the emergence of ChatGPT, an artificial intelligence chatbot developed by OpenAI, has attracted significant attention due to its exceptional language comprehension and content generation capabilities, highlighting the immense potential of large language models (LLMs). LLMs have become a rapidly growing research hotspot across many fields, including health care. Within health care, LLMs may be classified into LLMs for the biomedical domain and LLMs for the clinical domain, based on the corpora used for pre‐training. In the last 3 years, these domain‐specific LLMs have demonstrated exceptional performance on multiple natural language processing tasks, surpassing general LLMs on those tasks. This not only emphasizes the significance of developing dedicated LLMs for specific domains but also raises expectations for their applications in health care. We believe that, with appropriate development and supervision, LLMs may be used widely in preconsultation, diagnosis, and management. Additionally, LLMs hold tremendous promise for assisting with medical education, medical writing, and other related applications. At the same time, health care systems must recognize and address the challenges posed by LLMs.

Keywords: AI, Large language model, Health care

Publication history

Received: 10 April 2023
Accepted: 12 June 2023
Published: 24 July 2023
Issue date: August 2023

Copyright

© 2023 The Authors. Tsinghua University Press.

Acknowledgements

Not applicable.

Rights and permissions

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
