Kidney Week

Abstract: PO0759

Feature Selection and Machine Learning Model for Predicting Diabetic Kidney Disease Risk in Asians

Session Information

Category: Diabetic Kidney Disease

  • 602 Diabetic Kidney Disease: Clinical


  • Sabanayagam, Charumathi, Singapore Eye Research Institute, Singapore, Singapore
  • He, Feng, Singapore Eye Research Institute, Singapore, Singapore
  • Nusinovici, Simon, Singapore Eye Research Institute, Singapore, Singapore
  • Lim, Cynthia Ciwei, Singapore General Hospital, Singapore, Singapore
  • Li, Jialiang, National University of Singapore, Singapore, Singapore
  • Wong, Tien Yin, Singapore Eye Research Institute, Singapore, Singapore
  • Cheng, Ching-Yu, Singapore Eye Research Institute, Singapore, Singapore

Machine learning (ML) techniques may improve disease prediction and the interpretability of regression models by identifying the most relevant features in multi-dimensional data. We evaluated the ability of various ML classifiers to identify features and improve the prediction accuracy of diabetic kidney disease (DKD).


We used longitudinal data from 1364 Chinese, Malay and Indian participants aged 40-80 years with diabetes but free of DKD who attended the baseline visit of the Singapore Epidemiology of Eye Diseases Study in 2004-2011 and were followed up for 6 years (2011-2017). Incident DKD (n=162) was defined as an estimated glomerular filtration rate (eGFR) <60 mL/min/1.73 m² plus a 25% decrease in eGFR at follow-up. We evaluated 339 features spanning demographic/clinical, retinal imaging, genetic and serum metabolomics profiles, and tested nine ML algorithms combined with feature selection (including gradient boosting decision tree, elastic net, random forest, support vector machine, neural network and LASSO). The performance of the best ML model, built on the optimum feature set, was compared with that of logistic regression (LR) using traditional risk factors, based on the area under the receiver operating characteristic curve (AUC), sensitivity and specificity.
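The three comparison metrics named above can be illustrated with a minimal sketch. It is not the study's code: the labels and predicted risks are made-up toy values, the 0.5 classification threshold is an assumption (the abstract does not state the cut-off used), and the function names are hypothetical.

```python
def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Return (sensitivity, specificity) after dichotomising scores at `threshold`."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_score):
    """AUC as the probability that a random case outscores a random non-case.

    Ties between a case and a non-case count as half a win.
    """
    cases = [s for label, s in zip(y_true, y_score) if label == 1]
    controls = [s for label, s in zip(y_true, y_score) if label == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Toy labels (1 = incident DKD) and predicted risks; NOT study data.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

print("AUC:", auc(y_true, y_score))
print("Sensitivity, specificity:", sensitivity_specificity(y_true, y_score))
```

AUC is threshold-free, whereas sensitivity and specificity depend on the chosen cut-off, which is why two models can share one sensitivity and still differ in specificity and AUC.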


The best-performing model combined recursive feature elimination (RFE) for variable selection with Elastic Net (EN), using 15 predictors from the demographic/clinical + metabolite set. It achieved an AUC, sensitivity and specificity of 0.852, 83.0% and 73.5%, compared with 0.796, 83.0% and 61.8% for LR. The top 15 predictors of DKD risk included seven risk factors and eight metabolites: age, antidiabetic medication use, presence of hypertension, diabetic retinopathy, higher systolic blood pressure and HbA1c, and lower eGFR; higher levels of triglycerides in IDL, phospholipids in chylomicrons and medium VLDL, total cholesterol in chylomicrons and very small VLDL, medium LDL, cholesterol esters in very large HDL; and lower levels of DHA, lactate and acetate.
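The general idea of pairing RFE with an elastic-net model can be sketched as follows. This is a self-contained illustration under stated assumptions, not the authors' implementation: the data are synthetic, the penalty strength, mixing ratio, learning rate and iteration count are arbitrary, features are assumed to be on a comparable scale, and ranking features by absolute coefficient is one common RFE criterion that the abstract does not confirm. All function names are hypothetical.

```python
import math
import random


def fit_elastic_net_logistic(X, y, alpha=0.05, l1_ratio=0.5, lr=0.1, n_iter=500):
    """Fit logistic regression with an elastic-net penalty by proximal gradient descent.

    Returns (weights, intercept). `alpha` and `l1_ratio` are illustrative values.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    l1 = alpha * l1_ratio          # lasso (L1) share of the penalty
    l2 = alpha * (1.0 - l1_ratio)  # ridge (L2) share of the penalty
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        for j in range(p):
            # Gradient step on the log-loss plus the ridge term ...
            stepped = w[j] - lr * (grad_w[j] / n + l2 * w[j])
            # ... then soft-thresholding, which handles the lasso term.
            w[j] = math.copysign(max(abs(stepped) - lr * l1, 0.0), stepped)
    return w, b


def recursive_feature_elimination(X, y, n_keep, **fit_kwargs):
    """Refit repeatedly, dropping the feature with the smallest |coefficient| each round."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w, _ = fit_elastic_net_logistic(X_sub, y, **fit_kwargs)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return active


# Synthetic data: only features 0 and 1 carry signal; features 2-5 are pure noise.
random.seed(0)
n_samples, n_features = 200, 6
X = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [
    1 if random.random() < 1.0 / (1.0 + math.exp(-(2.5 * row[0] - 2.5 * row[1]))) else 0
    for row in X
]

selected = recursive_feature_elimination(X, y, n_keep=2)
print("Selected feature indices:", selected)
```

In a study setting the number of features to keep (15 in the abstract) would typically be chosen by cross-validation rather than fixed in advance, and a tested library implementation would replace this hand-written optimiser.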


ML combined with feature selection improved the accuracy of DKD risk prediction in the general population with diabetes and identified novel risk factors, including metabolites.


  • Government Support – Non-U.S.