Abstract: FR-PO0029
Semantic Search Empowered by Large Language Models for Rare Kidney Disease Literature: A Double-Wisdom Method
Session Information
- Artificial Intelligence and Digital Health at the Bedside
November 07, 2025 | Location: Exhibit Hall, Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Artificial Intelligence, Digital Health, and Data Science
- 300 Artificial Intelligence, Digital Health, and Data Science
Authors
- He, Guohua, Sun Yat-Sen University, Guangzhou, Guangdong, China
- Ding, Jie, Peking University First Hospital, Beijing, China
- Jiang, Xiaoyun, Sun Yat-Sen University, Guangzhou, Guangdong, China
Background
We aimed to improve the accuracy and relevance of literature retrieval for rare kidney diseases by combining semantic search techniques with large language models (LLMs) and expert opinion.
Methods
Using Alport syndrome and atypical hemolytic uremic syndrome (aHUS) as examples, we combined systematic review search strategies with Medical Subject Headings (MeSH) terms and optimized the resulting queries through iterative prompt engineering with five LLMs (DeepSeek, ChatGPT-4, KIMI, ChatGLM, and Qwen). Clinicians then refined the LLM outputs, yielding the Double-Wisdom (D-W) method. We compared the D-W method with systematic review-based queries (Method A) and with MeSH-based queries without LLM or expert input (Method B), using precision, recall, and F1 score as metrics.
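As a rough illustration of how MeSH headings and free-text synonyms can be combined into a single boolean PubMed query, consider the minimal sketch below. The helper `build_pubmed_query` and the example terms are hypothetical and are not the queries used in this study; only the `[MeSH Terms]` and `[Title/Abstract]` field tags are standard PubMed syntax.

```python
def build_pubmed_query(mesh_terms, free_text_terms):
    """Join MeSH headings and free-text synonyms with OR into one PubMed query.

    Hypothetical helper for illustration; not the authors' D-W query builder.
    """
    # Controlled-vocabulary terms are matched against the MeSH index.
    mesh_part = [f'"{term}"[MeSH Terms]' for term in mesh_terms]
    # Free-text synonyms are matched in titles and abstracts.
    text_part = [f'"{term}"[Title/Abstract]' for term in free_text_terms]
    return "(" + " OR ".join(mesh_part + text_part) + ")"


# Example terms are assumptions chosen for illustration only.
query = build_pubmed_query(
    mesh_terms=["Nephritis, Hereditary"],
    free_text_terms=["Alport syndrome", "hereditary nephritis"],
)
print(query)
```

In a workflow like the one described, the synonym list is the part that LLM prompting and clinician review would plausibly expand or prune; the query assembly itself stays mechanical.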
Results
PubMed searches using the three methods identified 3310 articles for Alport syndrome and 2388 articles for aHUS. The D-W method outperformed both alternatives. For Alport syndrome, the D-W method achieved a recall of 100%, a precision of 95.79%, and an F1 score of 97.85%. In comparison, Method A missed 59 articles (recall 98.26%) and Method B missed 51 (recall 98.49%), with precision of 100% and 57.39%, respectively. For aHUS, the D-W method achieved a recall of 100% and a precision of 94.16% (F1 score 96.99%). Method A and Method B missed 194 and 188 articles, respectively, with F1 scores of 92.93% and 93.07%.
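The F1 score is the harmonic mean of precision and recall, so the reported D-W values can be checked directly from the precision and recall figures above. The sketch below uses the standard definitions; the function names are illustrative, and the underlying article counts are not given in the abstract, so only the reported percentages are used.

```python
def precision_recall(tp, fp, fn):
    """Standard definitions from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Reported D-W precision and recall, expressed as fractions.
alport_f1 = f1_score(0.9579, 1.0)
ahus_f1 = f1_score(0.9416, 1.0)

print(f"Alport syndrome F1: {alport_f1:.2%}")  # 97.85%
print(f"aHUS F1: {ahus_f1:.2%}")               # 96.99%
```

Both values agree with the F1 scores reported for the D-W method.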
Conclusion
The D-W method demonstrates the potential of LLMs to enhance semantic search for rare kidney disease literature, combining 100% recall with precision above 94% for both diseases studied. By integrating LLM optimization with expert refinement, this approach addresses limitations of traditional methods and supports comprehensive and accurate information retrieval. Its adaptability to other medical fields suggests broader applicability in advancing research and clinical practice.