Abstract: TH-PO006
Exploring ChatGPT's Aptitude in Essential Concepts of Hypertension
Session Information
- AI, Digital Health, Data Science - I
November 02, 2023 | Location: Exhibit Hall, Pennsylvania Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Augmented Intelligence, Digital Health, and Data Science
- 300 Augmented Intelligence, Digital Health, and Data Science
Authors
- Gonzalez Suarez, Maria Lourdes, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Schwartz, Gary L., Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Gregoire, James Robert, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Erickson, Stephen B., Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Thongprayoon, Charat, Mayo Clinic Minnesota, Rochester, Minnesota, United States
- Cheungpasitporn, Wisit, Mayo Clinic Minnesota, Rochester, Minnesota, United States
Background
ChatGPT is a state-of-the-art large language model capable of generating human-like responses across a wide range of tasks. While there is ongoing debate about whether ChatGPT could replace clinicians in clinical settings, its competence in nephrology, and specifically in hypertension, remains uncertain. This study aimed to assess ChatGPT's proficiency in answering fundamental questions on the diagnosis, treatment, and management of hypertension.
Methods
Using Nephrology Self-Assessment Program (NephSAP) issues 2016-2022 (V15N1, V17N1, V19N1, V21N4) from the American Society of Nephrology (ASN), we conducted a rigorous evaluation of ChatGPT's accuracy in answering questions related to hypertension. Questions containing images were excluded because of ChatGPT's current inability to process images. The analysis included 95 NephSAP questions. Each question set was submitted to ChatGPT (Mar 14 version, OpenAI) 3 times, with attempts conducted 2 weeks apart, and we assessed agreement between the initial and subsequent attempts.
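For readers interested in the scoring step, a minimal sketch of how per-issue accuracy and inter-attempt agreement could be computed is shown below. This is an illustration only, not the study's actual code; the `graded` data structure and all names are hypothetical, assuming each response has already been marked correct or incorrect against the NephSAP answer key.

```python
# Minimal sketch of the scoring step; sample data and names are hypothetical.
# issue -> one (attempt1, attempt2, attempt3) correctness tuple per question
graded = {
    "V15N1": [(True, True, True), (False, True, True), (False, False, True)],
    # ...remaining issues would follow the same shape, 95 questions in total
}

def accuracy(results, attempt):
    """Percent of questions answered correctly on a given attempt (0-indexed)."""
    return 100 * sum(r[attempt] for r in results) / len(results)

def agreement(results, a, b):
    """Percent of questions where attempts a and b gave the same verdict."""
    return 100 * sum(r[a] == r[b] for r in results) / len(results)

for issue, results in graded.items():
    accs = [accuracy(results, k) for k in range(3)]
    print(f"{issue}: " + " / ".join(f"{a:.1f}%" for a in accs))
    print(f"  agreement, 1st vs 2nd attempt: {agreement(results, 0, 1):.1f}%")
```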
Results
Our analysis revealed that ChatGPT achieved accuracies of 65.5% on the first attempt and 76.4% and 78.1% on the second and third attempts, respectively, on the NephSAP questions. ChatGPT produced more correct answers than incorrect ones, and its accuracy improved with each attempt (Table 1).
Conclusion
Our findings indicate that ChatGPT's accuracy in addressing core concepts related to hypertension management falls below the minimum passing threshold of 75% established by the ASN for nephrologists, with an initial accuracy rate of 65.5%. This emphasizes the need for further development and training to improve ChatGPT's accuracy and consistency in the area of hypertension. Our study's outcomes have significant implications for ChatGPT's potential use as an educational tool for clinicians, highlighting the importance of ongoing research and development to broaden its proficiency in clinical subspecialties.
Table 1. Accuracy of ChatGPT on Hypertension Questions
| NephSAP Issue | First Attempt (%) | Second Attempt (%) | Third Attempt (%) |
| --- | --- | --- | --- |
| V15N1 | 62 | 75.9 | 79.3 |
| V17N1 | 83.3 | 93.3 | 93.3 |
| V19N1 | 53.3 | 63.3 | 66.6 |
| V21N4* | 63.3 | 73.3 | 73.3 |
| Total accuracy | 65.48 | 76.45 | 78.12 |
* Questions 1-25
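The Total accuracy row appears to be the unweighted mean of the four per-issue accuracies rather than a pooled per-question rate; under that reading,

$$\bar{A}^{(k)} = \frac{1}{4}\sum_{i=1}^{4} A_i^{(k)}, \qquad \bar{A}^{(1)} = \frac{62 + 83.3 + 53.3 + 63.3}{4} = 65.48$$

where $A_i^{(k)}$ is the accuracy of issue $i$ on attempt $k$; the second- and third-attempt totals (76.45 and 78.12) follow from the same arithmetic. Because the issues contain unequal numbers of questions, this convention can differ slightly from pooling all 95 questions before dividing.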