Abstract: SA-PO0003
Nephrobase: Multimodal Single-Cell Foundation Model for Decoding Kidney Biology
Session Information
- Intelligent Imaging and Omics: Phenotyping and Risk Stratification
November 08, 2025 | Location: Exhibit Hall, Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Artificial Intelligence, Digital Health, and Data Science
- 300 Artificial Intelligence, Digital Health, and Data Science
Authors
- Li, Chenyu, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
- Ziyadeh, Elias Mark, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
- Rao, Vishwanatha M, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
- Szegedy, Mario, Rutgers University New Brunswick, New Brunswick, New Jersey, United States
- Susztak, Katalin, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States
Background
Despite rapid advances in single-cell and spatial profiling, researchers remain unable to integrate diverse data types—transcriptomic, epigenomic and spatial—into a unified framework that captures the full spectrum of kidney cell states. Existing models often operate on individual assays or pooled cell populations, limiting their ability to resolve subtle phenotypes, track dynamic transitions or translate findings across species. A foundation model that ingests multimodal single-cell and single-nucleus data from human and animal kidneys would empower unbiased discovery of cell identities, regulatory networks and disease mechanisms at unprecedented resolution.
Methods
The Nephrobase architecture incorporates self-attention, cross-attention and a mixture-of-experts layer to handle sparse, high-dimensional inputs spanning 30 000 features per cell. Efficiency optimizations such as flash attention, cached key–value projections and sparse tensor routines enabled scalable pretraining on approximately 26.5 million single-cell and single-nucleus profiles. Task-specific fine-tuning refined the model for cell-type annotation, reconstruction of masked gene expression and prediction of genetic perturbation effects in both transcriptomic and spatial assays.
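The abstract does not describe how the mixture-of-experts layer routes inputs, so the following is only a minimal NumPy sketch of one common scheme (top-k gating with a softmax over the selected experts), not the Nephrobase implementation. The function name `moe_layer`, the gating matrix `gate_w`, the per-expert linear maps `expert_ws` and the default `top_k=1` are all assumptions made for illustration.

```python
import numpy as np

def moe_layer(x, gate_w, expert_ws, top_k=1):
    """Route each token to its top_k experts and mix their outputs.

    x:         (n_tokens, d_model) token embeddings
    gate_w:    (d_model, n_experts) gating weights (hypothetical)
    expert_ws: (n_experts, d_model, d_model) one linear map per expert (hypothetical)
    """
    logits = x @ gate_w                                    # (n_tokens, n_experts)
    top_idx = np.argsort(-logits, axis=1)[:, :top_k]       # best experts per token
    top_logits = np.take_along_axis(logits, top_idx, axis=1)

    # Softmax restricted to the selected experts (numerically stabilised).
    top_logits = top_logits - top_logits.max(axis=1, keepdims=True)
    weights = np.exp(top_logits)
    weights /= weights.sum(axis=1, keepdims=True)

    out = np.zeros_like(x)
    for slot in range(top_k):
        for e in range(expert_ws.shape[0]):
            chosen = top_idx[:, slot] == e                 # tokens routed to expert e
            if chosen.any():
                w = weights[chosen, slot][:, None]
                out[chosen] += w * (x[chosen] @ expert_ws[e])
    return out, top_idx
```

Because each token is processed only by the experts it selects, the per-token compute stays roughly constant as experts are added, which is the usual motivation for this kind of layer on sparse, high-dimensional inputs.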
Results
Training for eight epochs on 26.5 million profiles with 60 billion tokens required only 7 days on an 8-GPU cluster. Nephrobase achieved 94 percent accuracy in human cell-type classification and maintained 92 percent accuracy in zero-shot transfer to mouse datasets. With 50 percent of gene expression values masked, the model reconstructed the masked values with an R² of 0.88, and it delivered spatial transcriptomic imputation with an F1 score of 0.81, closely matching performance on scRNA-seq despite greater data sparsity. In perturbation modeling, predictions of APOL1 risk-allele effects correlated at r = 0.76 with independent knock-in data.
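The abstract does not state how the masked-reconstruction score was computed. As a hedged illustration, the sketch below hides a fraction of a cell-by-gene matrix, asks a model to fill in the hidden entries, and reports the coefficient of determination on those entries only. The names `masked_r2` and `predict_fn`, the zero-fill convention for masked values and the use of the standard R² definition are assumptions, not details from the study.

```python
import numpy as np

def masked_r2(expr, predict_fn, mask_frac=0.5, seed=0):
    """R^2 between true and predicted values at randomly masked positions.

    expr:       (n_cells, n_genes) expression matrix
    predict_fn: hypothetical callable (corrupted, mask) -> array shaped like expr
    mask_frac:  fraction of entries hidden from the model
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(expr.shape) < mask_frac     # True where the value is hidden
    corrupted = np.where(mask, 0.0, expr)         # assumed convention: zero-fill
    pred = predict_fn(corrupted, mask)

    y_true = expr[mask]
    y_pred = pred[mask]
    ss_res = np.sum((y_true - y_pred) ** 2)             # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)      # total sum of squares
    return 1.0 - ss_res / ss_tot
```

Scoring only the masked positions matters here: including the visible entries, which the model can simply copy, would inflate the score.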
Conclusion
By unifying multiple single-cell and spatial modalities into a single, interpretable and efficient framework, Nephrobase establishes a new paradigm for precision nephrology. Its ability to resolve fine-grained cell states, impute missing molecular features and forecast the consequences of genetic alterations will accelerate early diagnostics, guide in silico therapeutic screening and streamline translation between preclinical models and human disease.