Abstract: SA-PO0020
Fine-Tuned Transformers Illuminate Kidney Single-Cell Heterogeneity
Session Information
- Intelligent Imaging and Omics: Phenotyping and Risk Stratification
November 08, 2025 | Location: Exhibit Hall, Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Artificial Intelligence, Digital Health, and Data Science
- 300 Artificial Intelligence, Digital Health, and Data Science
Authors
- Ziyadeh, Elias Mark, University of Pennsylvania, Philadelphia, Pennsylvania, United States
- Li, Chenyu, University of Pennsylvania, Philadelphia, Pennsylvania, United States
- Susztak, Katalin, University of Pennsylvania, Philadelphia, Pennsylvania, United States
Group or Team Name
- Susztak Lab
Background
Kidney disease remains a global health burden, in part because the kidney's complex cellular landscape and subtle transcriptional changes challenge standard analysis methods. While single-cell RNA sequencing (scRNA-seq) enables high-resolution profiling, downstream tools often rely on distance metrics that may overlook injury states or rare cell subtypes. Transformer-based models such as Geneformer and Universal Cell Embeddings (UCE) learn contextual gene expression patterns from large, diverse datasets, but kidney cells are underrepresented in these corpora. We hypothesized that fine-tuning on kidney-specific data could improve model accuracy and biological insight.
Methods
We compiled a high-quality kidney atlas of 720,924 scRNA-seq profiles from healthy and diseased human samples. After preprocessing and batch correction, we fine-tuned Geneformer for three tasks: (1) classifying healthy vs. injured tubular cells, (2) predicting transcription factor (TF) dose responses to fibrotic stimuli, and (3) simulating gene knockouts. UCE was adapted to create kidney-specific embeddings using masked-label training. Models were evaluated via F1 scores (classification), AUC (dose-response), fibrosis score shifts (perturbation), and silhouette scores (clustering).
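For readers less familiar with the classification and dose-response metrics named above, the following is a minimal sketch of how an F1 score and a ROC AUC are computed from labels and predictions. It is illustrative only: the function names and the toy labels are assumptions made for this example and are not the study's evaluation code or data.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels: 1 = injured cell, 0 = healthy cell.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    y_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
    print(f1_score(y_true, y_pred))   # 0.75
    print(roc_auc(y_true, y_score))   # 0.9375
```

The pairwise AUC above is quadratic in the number of cells, which is fine for illustration; an atlas-scale evaluation would more likely use a rank-based implementation from a standard library.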
Results
Fine-tuned Geneformer distinguished injured from healthy distal convoluted and connecting tubule cells with an F1 of 0.909, and injured from healthy thick ascending limb cells with an F1 of 0.891. It predicted TF dose-sensitivity with an AUC of 0.86. Simulated knockout of fibrosis-linked genes (e.g., TMCO1) led to anti-fibrotic signature shifts aligned with prior data. Kidney-specific UCE embeddings produced improved cell type clusters and uncovered subpopulations not resolved by the non-fine-tuned models.
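Cluster quality for embeddings is commonly summarized with the silhouette score, which compares each cell's mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b). The sketch below is a hedged illustration using Euclidean distance on hypothetical 2-D points; the function name, the distance choice, and the toy coordinates are assumptions for this example and do not reflect the study's embeddings or pipeline.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points: s = (b - a) / max(a, b).

    a = mean distance to other members of the point's own cluster
    b = smallest mean distance to the members of any other cluster
    Points in singleton clusters contribute 0, following the usual convention.
    """
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, (pi, li) in enumerate(zip(points, labels)):
        by_cluster = {}  # cluster label -> distances from point i
        for j, (pj, lj) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(lj, []).append(math.dist(pi, pj))
        own = by_cluster.get(li)
        if not own:
            continue  # singleton cluster: silhouette defined as 0
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != li)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


if __name__ == "__main__":
    # Hypothetical embedding coordinates for four cells in two cell types.
    cells = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(silhouette_score(cells, [0, 0, 1, 1]))  # well separated: close to 1
    print(silhouette_score(cells, [0, 1, 0, 1]))  # mismatched labels: negative
```

Scores near 1 indicate compact, well-separated clusters, while scores near 0 or below indicate overlapping or mislabeled groups.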
Conclusion
Kidney-specific fine-tuning improved the performance of transformer-based models in resolving injury states, predicting TF dose responses, and simulating gene perturbations. This approach combines the generalizability of foundation models with the specificity of renal data, laying the groundwork for next-generation biomarker discovery and therapeutic development in nephrology.
Funding
- NIDDK Support