Kidney Week

Abstract: PO0762

Using Machine Learning to Predict CKD upon Type 2 Diabetes Mellitus Diagnosis

Session Information

Category: Diabetic Kidney Disease

  • 602 Diabetic Kidney Disease: Clinical


Authors

  • Allen, Angier O., Dascena Inc., Houston, Texas, United States
  • Iqbal, Zohora, Dascena Inc., Houston, Texas, United States
  • Green-Saxena, Abigail, Dascena Inc., Houston, Texas, United States
  • Das, Ritankar, Dascena Inc., Houston, Texas, United States

Background

Chronic kidney disease (CKD) accounts for the majority of the increased mortality risk in diabetic patients and manifests in approximately half of patients diagnosed with type 2 diabetes mellitus (T2DM). Although more frequent screening can reduce missed diagnoses, it is not implemented uniformly. We developed and retrospectively validated a machine learning algorithm (MLA) that predicts, at the time of T2DM diagnosis, whether CKD will develop within 5 years.


Methods

Electronic health record (EHR) data for 171,201 recently diagnosed T2DM patients (age ≥ 18) were extracted from a proprietary database of >700 healthcare sites across the US, covering 2007 to 2020. A random forest MLA was developed to assess the risk of Stage 3+ CKD (CKD 3+) in T2DM patients using EHR data collected in the year before T2DM diagnosis. International Classification of Diseases codes (ICD-9 and ICD-10) were used to identify T2DM and CKD 3+ patients. The MLA was tested on a hold-out test set of 42,801 patients and on a separate external validation dataset. The Centers for Disease Control and Prevention (CDC) CKD risk score was used as a comparator. Performance of the MLA and the CDC CKD risk score was assessed on both datasets via the area under the receiver operating characteristic curve (AUROC).
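The AUROC comparison described above can be illustrated with a minimal sketch. It is not the study's code: the `auroc` helper, the toy labels, and both score lists are hypothetical, and the comparator values are placeholders rather than actual CDC CKD risk scores. The sketch computes AUROC as the probability that a randomly chosen patient who developed CKD 3+ receives a higher score than one who did not, with ties counted as half.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic, with ties counted as half."""
    # Rank all scores in ascending order; tied scores share their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")

    # U counts (positive, negative) pairs in which the positive is ranked higher.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Hypothetical toy data: 1 = developed CKD 3+ within 5 years, 0 = did not.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
mla_scores = [0.10, 0.30, 0.80, 0.20, 0.70, 0.90, 0.60, 0.40]  # model probabilities
comparator_scores = [2, 5, 4, 3, 6, 5, 4, 3]  # placeholder point-based risk score

mla_auroc = auroc(labels, mla_scores)
comparator_auroc = auroc(labels, comparator_scores)
print(f"MLA AUROC: {mla_auroc:.3f}")
print(f"Comparator AUROC: {comparator_auroc:.3f}")
```

Because AUROC depends only on how patients are ranked, a probability output and a point-based score can be compared on the same labels without rescaling either one.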


Results

On both the hold-out test set and the external validation dataset, the MLA outperformed the CDC CKD risk score in predicting CKD 3+ among recently diagnosed T2DM patients (Fig 1).


Conclusion

This retrospective study shows that an MLA can provide timely predictions of CKD among recently diagnosed T2DM patients. Early detection of CKD in diabetic patients may enable therapeutic interventions, lifestyle changes, and prevention of progression, and may reduce dialysis dependency and healthcare costs.

Figure 1. Areas under the receiver operating characteristic curves for the machine learning algorithm (MLA) and the CDC CKD risk model (CDC) for Stage 3+ diabetic CKD predictions on the hold-out test set and the external validation dataset.