Kidney Week

Abstract: TH-PO003

The Performance of ChatGPT in CKD: An Assessment Using NephSAP and KSAP

Session Information

Category: Augmented Intelligence, Digital Health, and Data Science

  • 300 Augmented Intelligence, Digital Health, and Data Science

Authors

  • Krisanapan, Pajaree, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Tangpanithandee, Supawit, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Garcia Valencia, Oscar Alejandro, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Miao, Jing, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Thongprayoon, Charat, Mayo Clinic Minnesota, Rochester, Minnesota, United States
  • Cheungpasitporn, Wisit, Mayo Clinic Minnesota, Rochester, Minnesota, United States
Background

ChatGPT is a cutting-edge AI-powered language model that has demonstrated outstanding capabilities across numerous natural language processing tasks, including producing responses that closely resemble those written by humans. While there is growing discussion of ChatGPT's potential to serve as a replacement for physicians in clinical contexts, its proficiency in nephrology, and in chronic kidney disease (CKD) specifically, remains unclear. The objective of this study was to evaluate ChatGPT's accuracy in answering essential questions about CKD, including diagnosis, treatment, and management.

Methods

We evaluated ChatGPT's performance using the ASN's Nephrology Self-Assessment Program (NephSAP, 2011-2019) and Kidney Self-Assessment Program (KSAP, 2020-2023). Questions containing images were excluded, leaving a total of 308 questions: 205 from NephSAP and 103 from KSAP. Each question bank was run through ChatGPT twice, and agreement between the first and second runs was determined.
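The two metrics described above, per-run accuracy against the answer key and agreement between the two runs, can be sketched as follows. This is a minimal illustration only; the question data shown is hypothetical, not drawn from NephSAP or KSAP, and the study's actual scoring procedure may differ.

```python
def accuracy(responses, answer_key):
    # Fraction of responses that match the official answer key.
    correct = sum(r == k for r, k in zip(responses, answer_key))
    return correct / len(answer_key)

def agreement(run1, run2):
    # Fraction of questions where the two runs gave the same answer,
    # regardless of whether that answer was correct.
    same = sum(a == b for a, b in zip(run1, run2))
    return same / len(run1)

# Hypothetical multiple-choice answers for five questions.
answer_key = ["A", "C", "B", "D", "A"]
run1       = ["A", "C", "D", "D", "B"]
run2       = ["A", "B", "D", "D", "B"]

print(accuracy(run1, answer_key))  # 0.6
print(accuracy(run2, answer_key))  # 0.4
print(agreement(run1, run2))       # 0.8
```

Note that agreement can exceed accuracy: the model may give the same wrong answer twice, which is why the abstract reports both figures.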

Results

ChatGPT's performance on chronic kidney disease questions fell below the 75% minimum passing threshold the ASN sets for nephrologists. On the NephSAP questions, ChatGPT achieved accuracies of 53.2% and 55.6% on the first and second runs, respectively, with 78.1% agreement between runs. On the KSAP questions, accuracy was 48.5% and 44.7% on the first and second runs, respectively, with 66.0% agreement. Overall agreement between the two runs was 74.0%, and agreement was higher for answers ChatGPT got correct than for those it got incorrect.

Conclusion

These results indicate that the current version of ChatGPT is not yet a reliable medical education tool for clinical physicians, medical students, or nephrologists, and requires further development.