
Kidney Week

Abstract: FR-PO031

Development and Evaluation of a Vision Transformer-Based Machine Learning Model for Improved Nephron Segmentation in Kidney Disease Analysis

Session Information

Category: Augmented Intelligence, Digital Health, and Data Science

  • 300 Augmented Intelligence, Digital Health, and Data Science


  • Li, Zhongwang, University College London, London, United Kingdom
  • Siew, Keith, University College London, London, United Kingdom
  • Walsh, Stephen B., University College London, London, United Kingdom
  • Walker-Samuel, Simon, University College London, London, United Kingdom

Group or Team Name

  • London Tubular Centre.

Kidney diseases affect over 20 million people globally and mostly stem from nephron lesions. The complex tubular structure of nephrons complicates 3D pathology assessment. We've created a workflow that combines optical clearing and AI to automate and improve this process. Optical clearing renders samples transparent for accurate nephron mapping. Light-sheet microscopy visualizes these structures, but precise segmentation is required for measurements such as tubule volume. Our proposed vision transformer-based machine learning model automates these measurements, assisting diagnosis and offering gene expression insights. This adaptable method extends to other tubular tissues, promising wider benefits.


We collected 36 quarter-mouse-kidney light-sheet microscopy images, each 2048x2048 pixels, with depths ranging from 700 to 1200 pixels. Images are split into blocks, and each block is segmented individually. Each block is treated as a 2D image series. The local features and segmentation results of adjacent images are encoded using a convolutional neural network. The transformer encoder/decoder processes high-level contextual representations, while the convolutional decoder recovers spatial dimensions and generates a tubule probability map. The final segmentation mask is derived from a normalized output probability map, and the blocks are reassembled into an image of the original size. Segmentation results can be visualized and validated using VR.
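The block-wise workflow described above can be illustrated with a minimal sketch. It is not our released implementation. The tile size, the sigmoid normalization, the 0.5 threshold, and the function names (`split_into_blocks`, `logits_to_mask`, `reassemble`) are all assumptions made for illustration, and the raw input values stand in for the network's output.

```python
import math
from typing import Dict, List, Tuple

Image = List[List[float]]
Mask = List[List[int]]
Blocks = Dict[Tuple[int, int], list]


def split_into_blocks(image: Image, block: int) -> Blocks:
    """Cut a 2D image into non-overlapping tiles keyed by top-left corner.

    Tiles on the right and bottom edges may be smaller than `block`.
    """
    rows, cols = len(image), len(image[0])
    return {
        (r, c): [row[c:c + block] for row in image[r:r + block]]
        for r in range(0, rows, block)
        for c in range(0, cols, block)
    }


def logits_to_mask(logits: Image, threshold: float = 0.5) -> Mask:
    """Normalize raw scores with a sigmoid, then binarize.

    Both the sigmoid and the 0.5 threshold are assumed choices.
    """
    return [
        [1 if 1.0 / (1.0 + math.exp(-v)) >= threshold else 0 for v in row]
        for row in logits
    ]


def reassemble(blocks: Blocks, rows: int, cols: int) -> list:
    """Paste tiles back at their original positions."""
    out = [[0] * cols for _ in range(rows)]
    for (r, c), tile in blocks.items():
        for i, tile_row in enumerate(tile):
            out[r + i][c:c + len(tile_row)] = tile_row
    return out


# Hypothetical 4x6 "image" of raw scores standing in for model output.
image = [[float(r * 6 + c) - 10.0 for c in range(6)] for r in range(4)]
blocks = split_into_blocks(image, block=4)
masks = {corner: logits_to_mask(tile) for corner, tile in blocks.items()}
full_mask = reassemble(masks, rows=4, cols=6)
```

Splitting and reassembling without segmentation should return the input unchanged, which is a quick check that no pixels are lost at tile borders.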


While data annotation is ongoing, preliminary tests of our model on a synthetic dataset of 6000 images showed promising results. After ten epochs of training on 4800 images, it achieved an IoU score (Intersection over Union, a measure of the overlap between the predicted segmentation and the ground truth) of 93%, a Binary Accuracy (percentage of correctly predicted data points out of all predictions) of 98%, and an F1 Score (the harmonic mean of precision and recall, where values closer to 100% indicate a more robust prediction) of 96% on a validation set of 1200 images.
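For reference, the three metrics can be computed from pixel-wise confusion counts as in the following minimal sketch. The function name `segmentation_metrics` and the toy masks are hypothetical, and this is not the evaluation code used for the figures above.

```python
from typing import Dict, Sequence


def segmentation_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Compute IoU, binary accuracy, and F1 for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1          # tubule predicted, tubule present
        elif p and not t:
            fp += 1          # tubule predicted, background present
        elif t:
            fn += 1          # background predicted, tubule present
        else:
            tn += 1          # background predicted, background present
    union = tp + fp + fn
    return {
        # Conventions for empty masks (no positives anywhere) are assumed.
        "iou": tp / union if union else 1.0,
        "accuracy": (tp + tn) / len(pred) if len(pred) else 1.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if union else 1.0,
    }


# Toy flattened masks: 3 true positives, 1 false positive,
# 1 false negative, 3 true negatives.
pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 0, 1, 1]
metrics = segmentation_metrics(pred, truth)
```

When both scores are computed from the same confusion counts, F1 = 2·IoU / (1 + IoU). An IoU of 93% therefore corresponds to an F1 of about 96%, which agrees with the reported values.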


Although synthetic data is simpler than real images, our model's promising performance suggests its potential with real-world data. We'll keep refining the model and, once enough real-world data is annotated, we'll test and compare it with baseline models such as a classical 3D convolutional neural network. At that point, we'll release the model's structure, code, annotated data, and performance results.