Abstract: FR-PO116
Revolutionizing AKI and Critical Care Nephrology Education: Evaluating ChatGPT's Accuracy on Core Questions
Session Information
- AKI: Outcomes, RRT
November 03, 2023 | Location: Exhibit Hall, Pennsylvania Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Acute Kidney Injury
- 102 AKI: Clinical, Outcomes, and Trials
Authors
- Sheikh, M. Salman, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Kashani, Kianoush, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Thongprayoon, Charat, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Qureshi, Fawad, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Domecq Garces, Juan Pablo, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Craici, Iasmina, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Cheungpasitporn, Wisit, Mayo Clinic Minnesota, Rochester, Minnesota, United States
Background
ChatGPT is a state-of-the-art large language model with exceptional proficiency in a range of natural language processing tasks, including generating responses that closely mimic those written by humans. While there is growing speculation about ChatGPT's potential to serve as a substitute for physicians in clinical settings, its proficiency in nephrology, particularly acute kidney injury (AKI) and critical care nephrology, remains uncertain. This study aimed to evaluate ChatGPT's performance in answering core questions on AKI and critical care nephrology.
Methods
We evaluated ChatGPT's accuracy in answering questions on acute kidney injury and critical care nephrology drawn from the American Society of Nephrology's Nephrology Self-Assessment Program (NephSAP) and Kidney Self-Assessment Program (KSAP). Questions containing images were excluded because of ChatGPT's current limitations in image processing. One hundred ten questions were included in the evaluation, 45 from NephSAP and 55 from KSAP. Each question bank was administered to ChatGPT twice, with the two runs conducted two weeks apart, and the concordance between the initial and subsequent runs was examined.
Results
On NephSAP questions, ChatGPT achieved accuracies of 55% and 69% on the initial and subsequent runs, respectively; on KSAP questions, it achieved 46% and 40%. Across all 110 questions combined, accuracy was 52% on the initial run and 51% on the subsequent run. Overall concordance between the two runs was 78%: 86 questions (78%) received the same response and 24 (22%) received different responses. Among the 86 concordant questions, 57% were concordantly correct and 43% concordantly incorrect. Among the 24 questions with divergent responses, ChatGPT changed 11 responses from incorrect to correct and 5 from correct to incorrect; the remaining 8 changed from one incorrect answer to another.
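As a rough arithmetic check, the reliability figures above can be recomputed from the reported counts. This is a minimal illustrative sketch, not the study's analysis code; the 49/37 split of the concordant questions is inferred from the reported 57%/43% percentages.

```python
# Counts reported in the Results section
total = 110
same = 86                                # identical answer on both runs
different = total - same                 # 24 divergent answers

concordance = same / total               # overall test-retest concordance
both_correct = round(0.57 * same)        # concordantly correct (~49 questions, inferred)
both_incorrect = same - both_correct     # concordantly incorrect (~37 questions, inferred)

fixed = 11                               # incorrect on run 1, correct on run 2
regressed = 5                            # correct on run 1, incorrect on run 2
still_wrong = different - fixed - regressed  # switched between two wrong options

print(f"concordance = {concordance:.0%}")        # 78%
print(f"divergent   = {different / total:.0%}")  # 22%
print(f"divergent but wrong both times = {still_wrong}")  # 8
```

The divergent-but-still-wrong count follows from the question format: with single-best-answer questions, any divergent pair not counted in the 11 corrected or 5 regressed responses must have been incorrect on both runs.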
Conclusion
Our study shows that ChatGPT responded correctly to only about half of the questions on acute kidney injury and critical care nephrology, with low test-retest reliability. ChatGPT may therefore not yet be sufficiently accurate or reliable as an educational tool in this area, and further development may be necessary to improve its performance.