Abstract: TH-PO003
The Performance of ChatGPT in CKD: An Assessment Using NephSAP and KSAP
Session Information
- AI, Digital Health, Data Science - I
November 02, 2023 | Location: Exhibit Hall, Pennsylvania Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Augmented Intelligence, Digital Health, and Data Science
- 300 Augmented Intelligence, Digital Health, and Data Science
Authors
- Krisanapan, Pajaree, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Tangpanithandee, Supawit, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Garcia Valencia, Oscar Alejandro, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Miao, Jing, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Thongprayoon, Charat, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Cheungpasitporn, Wisit, Mayo Clinic Minnesota, Rochester, Minnesota, United States
Background
ChatGPT is a cutting-edge, AI-powered language model that has demonstrated outstanding capabilities across numerous natural language processing tasks, including producing responses that closely resemble those written by humans. While there has been growing discussion about ChatGPT's potential to serve as a substitute for physicians in clinical contexts, its proficiency in nephrology, and specifically in chronic kidney disease (CKD), remains unclear. The objective of this study was to evaluate ChatGPT's accuracy in answering essential questions related to CKD, including diagnosis, treatment, and management.
Methods
We evaluated ChatGPT's performance using the Nephrology Self-Assessment Program (NephSAP, 2011-2019) and the Kidney Self-Assessment Program (KSAP, 2020-2023) of the American Society of Nephrology (ASN). Questions containing images were excluded, leaving a total of 308 questions: 205 from NephSAP and 103 from KSAP. Each question bank was submitted to ChatGPT twice, and agreement between the first and second runs was assessed.
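The scoring procedure described above can be sketched as follows. This is a minimal illustration with hypothetical toy data, not the actual study responses: per-run accuracy is the fraction of answers matching the key, and between-run agreement is the fraction of questions where the two runs gave the same answer.

```python
def accuracy(responses, answer_key):
    """Fraction of responses that match the answer key."""
    correct = sum(r == k for r, k in zip(responses, answer_key))
    return correct / len(answer_key)

def agreement(run1, run2):
    """Fraction of questions answered identically across two runs."""
    same = sum(a == b for a, b in zip(run1, run2))
    return same / len(run1)

if __name__ == "__main__":
    # Hypothetical 5-question bank: answer key and two ChatGPT runs.
    key  = ["A", "C", "B", "D", "A"]
    run1 = ["A", "C", "D", "D", "B"]  # 3/5 correct
    run2 = ["A", "B", "D", "D", "B"]  # 2/5 correct
    print(f"Run 1 accuracy: {accuracy(run1, key):.1%}")
    print(f"Run 2 accuracy: {accuracy(run2, key):.1%}")
    print(f"Agreement:      {agreement(run1, run2):.1%}")
```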
Results
ChatGPT's performance on CKD questions fell below the 75% minimum passing threshold set by the ASN for nephrologists. On the NephSAP question bank, ChatGPT achieved accuracies of 53.2% and 55.6% on the first and second runs, respectively, with a between-run agreement of 78.1%. On the KSAP question bank, accuracy was 48.5% and 44.7% on the first and second runs, respectively, with an agreement of 66.0%. Across both banks, the overall agreement between the two runs was 74.0%. Agreement between runs was higher for correctly answered questions than for incorrectly answered ones.
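As a sanity check, the overall agreement should equal the question-weighted average of the two per-bank agreements. The sketch below recomputes it from the rounded percentages reported above, so it matches the reported 74.0% only approximately:

```python
# Weighted average of per-bank agreement, using the rounded figures above.
nephsap_n, nephsap_agree = 205, 0.781  # NephSAP: 205 questions, 78.1%
ksap_n, ksap_agree = 103, 0.660        # KSAP: 103 questions, 66.0%

overall = (nephsap_n * nephsap_agree + ksap_n * ksap_agree) / (nephsap_n + ksap_n)
print(f"Overall agreement: {overall:.3f}")  # close to the reported 0.740
```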
Conclusion
These results indicate that the current version of ChatGPT is not yet a reliable medical education tool for clinical physicians, medical students, or nephrologists in the domain of CKD, and that further development is required.