Original Research | Open Access

Performance of ChatGPT and Radiology Residents on Ultrasonography Board-Style Questions

Jiale Xu a,b,1, Shujun Xia a,b,1, Qing Hua a,b, Zihan Mei a,b, Yiqing Hou a,b, Minyan Wei a,b, Limei Lai a,b, Yixuan Yang a,b, Jianqiao Zhou a,b (corresponding author)
a Department of Ultrasound, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China
b College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, Shanghai, China

1 Jiale Xu and Shujun Xia contributed equally to this study.


Abstract

Objective

This study aimed to assess the performance of the Chat Generative Pre-trained Transformer (ChatGPT), specifically the GPT-3.5 and GPT-4 versions, on ultrasonography board-style questions and to compare it with the performance of third-year radiology residents on the same set of questions.

Methods

The study, conducted from May 19 to May 30, 2023, used 134 multiple-choice questions drawn from a commercial question bank for the American Registry for Diagnostic Medical Sonography (ARDMS) examinations; each question was entered into ChatGPT (both the GPT-3.5 and GPT-4 versions). ChatGPT’s responses were evaluated overall, by topic, and by GPT version. The same question set was assigned to three third-year radiology residents, enabling a direct comparison of their performance with that of ChatGPT.
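The abstract does not describe a programmatic setup, and the questions were most likely entered through the ChatGPT interface. Purely as an illustration, the minimal sketch below shows how a multiple-choice evaluation of this kind could be run against the OpenAI Chat Completions API; the model names, prompt wording, sample question, and the ask_model helper are assumptions, not details taken from the study.

```python
# Hypothetical sketch: grading board-style multiple-choice questions with GPT-3.5 / GPT-4.
# Assumes the openai Python package (>= 1.0) and an OPENAI_API_KEY in the environment;
# the study itself does not describe a programmatic workflow, so treat this as illustrative.
from openai import OpenAI

client = OpenAI()

def ask_model(model: str, question: str, options: dict[str, str]) -> str:
    """Send one question and return the single answer letter the model picks."""
    option_text = "\n".join(f"{k}. {v}" for k, v in options.items())
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic answers make repeated grading easier
        messages=[
            {"role": "system",
             "content": "Answer the multiple-choice question with a single letter only."},
            {"role": "user", "content": f"{question}\n{option_text}"},
        ],
    )
    return response.choices[0].message.content.strip()[:1].upper()

# Made-up example question; accuracy is simply the fraction of correct letters.
question_bank = [
    {"question": "Which transducer frequency gives the best axial resolution?",
     "options": {"A": "2 MHz", "B": "5 MHz", "C": "10 MHz", "D": "1 MHz"},
     "answer": "C"},
]
for model in ("gpt-3.5-turbo", "gpt-4"):
    correct = sum(ask_model(model, q["question"], q["options"]) == q["answer"]
                  for q in question_bank)
    print(model, correct, "/", len(question_bank))
```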

Results

GPT-4 correctly answered 82.1% of the questions (110 of 134), significantly surpassing GPT-3.5 (P = 0.003), which correctly answered 66.4% (89 of 134). GPT-3.5’s performance was statistically indistinguishable from the residents’ average performance (66.7%, 89.3 of 134; P = 0.969), whereas GPT-4 answered significantly more questions correctly than the residents (P = 0.004).
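The abstract reports only the P values, not the statistical test used. One plausible way to reproduce the paired GPT-4 versus GPT-3.5 comparison on the same 134 questions is McNemar's test, sketched below; the split of the discordant cells is a placeholder chosen only to be consistent with the published marginals (110/134 and 89/134 correct), not data from the study.

```python
# Hypothetical re-analysis sketch: McNemar's test for two models answering the same
# 134 questions. The paper does not state which test it used, and the discordant
# counts below are placeholders consistent only with the published marginals
# (GPT-4: 110/134 correct; GPT-3.5: 89/134 correct).
from statsmodels.stats.contingency_tables import mcnemar

# Rows: GPT-4 correct / incorrect; columns: GPT-3.5 correct / incorrect.
table = [[85, 25],   # both correct, GPT-4 only correct      (placeholder split)
         [4, 20]]    # GPT-3.5 only correct, both incorrect  (placeholder split)

result = mcnemar(table, exact=True)  # exact binomial test on the discordant pairs
print(f"McNemar statistic = {result.statistic}, p = {result.pvalue:.4f}")
```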

Conclusions

ChatGPT demonstrated significant competency in responding to ultrasonography board-style questions, with the GPT-4 version markedly surpassing both its predecessor GPT-3.5 and the radiology residents.

Advanced Ultrasound in Diagnosis and Therapy
Pages 250-254
Cite this article:
Xu J, Xia S, Hua Q, et al. Performance of ChatGPT and Radiology Residents on Ultrasonography Board-Style Questions. Advanced Ultrasound in Diagnosis and Therapy, 2024, 8(4): 250-254. https://doi.org/10.37015/AUDT.2024.240002

Received: 19 January 2024
Revised: 13 March 2024
Accepted: 18 March 2024
Published: 30 December 2024
© AUDT 2024

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.
