Abstract: SA-PO0011
Interpretable Vision Transformer-Based Artificial Intelligence Models Predict Glomerulonephritis Subtypes from Multistain Histopathology Whole-Slide Images
Session Information
- Intelligent Imaging and Omics: Phenotyping and Risk Stratification
November 08, 2025 | Location: Exhibit Hall, Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Artificial Intelligence, Digital Health, and Data Science
- 300 Artificial Intelligence, Digital Health, and Data Science
Authors
- Vremenko, Dmytro, Harvard Medical School Department of Biomedical Informatics, Boston, Massachusetts, United States
- Chang, David R., Harvard Medical School Department of Biomedical Informatics, Boston, Massachusetts, United States
- Kuo, Chin-Chi, China Medical University Hospital, Taichung, Taichung City, Taiwan
- Yu, Kun-Hsing, Harvard Medical School Department of Biomedical Informatics, Boston, Massachusetts, United States
Background
Accurate diagnosis of glomerulonephritis (GN) requires analyzing large, multi-stain whole-slide images (WSIs) from kidney biopsies. The size and heterogeneity of these data pose challenges for current deep learning models, which are optimized primarily for cancer and H&E-stained images. To address this, we aimed to develop a deep learning approach that integrates multiple stains to perform patient-level GN subtype classification.
Methods
We analyzed 5,140 WSIs from 290 GN patients at China Medical University Hospital, Taiwan, covering minimal change disease (MCD), focal segmental glomerulosclerosis (FSGS), membranous nephropathy (MN), and IgA nephropathy (IgAN), across hematoxylin and eosin (H&E), periodic acid–Schiff (PAS), Masson trichrome (TRI), and Jones silver (SIL) stains. Tissue tiles (224×224 px at 10× magnification) were extracted and encoded with the Virchow2 vision transformer. An attention-based multiple instance learning (MIL) model was trained on patient-level tile sets. Data were split by patient (70% training, 10% validation, 20% testing) with 5-fold cross-validation. We trained stain-specific and multi-stain models, selected the top performers by one-vs-rest AUC, ensembled them, and visualized model attention regions for interpretability.
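As a minimal sketch of the attention-based MIL step (not the authors' exact architecture), the snippet below applies a gated-attention pooling head to pre-extracted tile embeddings for one patient; the embedding dimension (1280), hidden size, and four-class output are illustrative assumptions, not values reported in the abstract.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Gated-attention MIL head over pre-extracted tile embeddings (sketch only)."""
    def __init__(self, embed_dim=1280, hidden_dim=256, n_classes=4):
        super().__init__()
        self.attn_V = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.Tanh())
        self.attn_U = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(embed_dim, n_classes)

    def forward(self, tiles):
        # tiles: (n_tiles, embed_dim) -- all encoded tiles from one patient (the "bag")
        a = self.attn_w(self.attn_V(tiles) * self.attn_U(tiles))  # (n_tiles, 1) attention scores
        a = torch.softmax(a, dim=0)                               # weights over tiles in the bag
        bag = (a * tiles).sum(dim=0)                              # patient-level embedding
        return self.classifier(bag), a.squeeze(-1)                # class logits, tile attention

# Usage on a dummy bag of 500 hypothetical tile embeddings
model = AttentionMIL()
logits, attn = model(torch.randn(500, 1280))
```

The per-tile attention weights returned here are what can be mapped back onto the slide to visualize which regions drive the prediction.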
Results
Multi-stain models outperformed single-stain models on most pairwise AUC comparisons (MCD vs FSGS: 0.802; MCD vs MN: 0.730; average MCD vs others: 0.744) (Table 1). The SIL stain performed best for MCD vs IgAN (AUC 0.744). The multi-stain model achieved the highest average one-vs-rest AUC (0.624), which improved further with ensembling (0.660). Attention maps showed reliance on glomeruli and tubules.
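For context on the reported metric, the sketch below shows one way a macro-averaged one-vs-rest AUROC and a simple probability-averaging ensemble could be computed with scikit-learn; the labels and per-model probabilities are synthetic placeholders, not study data, and the ensembling scheme is an assumption rather than the authors' exact procedure.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Synthetic patient-level class probabilities from two hypothetical models
# (columns: MCD, FSGS, MN, IgAN); y_true holds the true class indices.
rng = np.random.default_rng(0)
y_true = np.array([0, 1, 2, 3, 0, 2, 1, 3])
p_model_a = rng.dirichlet(np.ones(4), size=8)
p_model_b = rng.dirichlet(np.ones(4), size=8)

# Macro-averaged one-vs-rest AUROC for a single model
auc_a = roc_auc_score(y_true, p_model_a, multi_class="ovr", average="macro")

# Probability-averaging ensemble of the selected models, scored the same way
p_ens = (p_model_a + p_model_b) / 2
auc_ens = roc_auc_score(y_true, p_ens, multi_class="ovr", average="macro")
```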
Conclusion
GN subtype classification improves with multi-stain input, underscoring the need for models that integrate multiple stains. Our study establishes the feasibility of deep learning for GN diagnosis from histopathology WSIs and highlights the importance of training foundation models tailored to benign kidney pathology, beyond cancer and H&E-stained images.
Table 1. AUROC of Trained Models
| Model | MCD vs FSGS | MCD vs MN | MCD vs IgAN | Average MCD vs Other | Average One-vs-Rest AUC |
|---|---|---|---|---|---|
| H&E | 0.713 | 0.666 | 0.741 | 0.706 | 0.600 |
| PAS | 0.684 | 0.634 | 0.732 | 0.683 | 0.587 |
| TRI | 0.623 | 0.516 | 0.593 | 0.577 | 0.486 |
| SIL | 0.714 | 0.714 | 0.744 | 0.724 | 0.598 |
| Multi-Stain | 0.802 | 0.730 | 0.698 | 0.744 | 0.624 |
Funding
- Other NIH Support