Abstract: SA-OR057

A Machine Learning Approach to Predicting ESRD

Session Information

Category: Chronic Kidney Disease (Non-Dialysis)

  • 304 CKD: Epidemiology, Outcomes - Non-Cardiovascular

Authors

  • Nadkarni, Girish N., Icahn School of Medicine at Mount Sinai, New York, New York, United States
  • Lee, Edward, pulseData, Inc., New York, New York, United States
  • Fielding, Oliver Leonard, pulseData, Inc., New York, New York, United States
  • Cha, Teddy, pulseData, Inc., New York, New York, United States
  • Sun, Hai po, pulseData, Inc., New York, New York, United States
  • Kipers, Chris, pulseData, Inc., New York, New York, United States
  • Paiva, William D, Center for Health Systems Innovation (Oklahoma State University), Stillwater, Oklahoma, United States
  • Fong, Elvena, Center for Health Systems Innovation (Oklahoma State University), Stillwater, Oklahoma, United States
  • Coca, Steven G., Icahn School of Medicine at Mount Sinai, New York, New York, United States
Background

Risk prediction of end stage renal disease (ESRD) for population management and care intervention is both a research priority and an unmet public health need. Electronic medical records (EMR) can be leveraged to improve assessment of ESRD onset. However, traditional risk scoring may not provide accurate risk prediction or complete population coverage when EMR data are incomplete. To handle missing data, we developed a machine learning (ML) approach and compared it to traditional risk scoring in two EMR cohorts.

Methods

We used longitudinal data from the Mount Sinai Chronic Kidney Disease registry and a data set from the Center for Health Systems Innovation at Oklahoma State University provided by the Cerner Corporation. Using a random forest ML technique with imputation, we predicted risk of ESRD (defined by administrative codes for dialysis or transplant). We then compared this approach to the Tangri 4-Variable kidney failure risk equation (KFRE) on two measures: area under the curve (AUC) and the percentage of the population for which each risk score could be calculated.
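The two comparison measures described above, AUC and the share of the population that can be scored at all, can be sketched as follows. This is a minimal illustration and not the study's code: the function names `auc` and `evaluate` are hypothetical, the toy scores and labels are invented, and a missing score (`None`) stands in for a patient whose risk equation inputs are unavailable.

```python
from typing import Optional, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the chance that a random case outranks a random non-case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(scores: Sequence[Optional[float]], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (coverage, AUC); AUC uses only the patients who have a score."""
    scored = [(s, y) for s, y in zip(scores, labels) if s is not None]
    coverage = len(scored) / len(labels)
    return coverage, auc([s for s, _ in scored], [y for _, y in scored])


# Toy data, invented for illustration only: 1 = ESRD event, None = score not computable.
labels = [1, 0, 0, 1, 0, 0, 1, 0]
equation_scores = [0.8, None, 0.2, 0.1, None, 0.3, None, None]
model_scores = [0.9, 0.3, 0.2, 0.7, 0.1, None, 0.35, 0.4]

equation_coverage, equation_auc = evaluate(equation_scores, labels)
model_coverage, model_auc = evaluate(model_scores, labels)
print(f"equation: coverage={equation_coverage:.2f}, AUC={equation_auc:.2f}")
print(f"model:    coverage={model_coverage:.2f}, AUC={model_auc:.2f}")
```

Reporting coverage next to AUC matters because each AUC here is computed on a different subset of patients, so neither number is interpretable without the other.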

Results

We analyzed data from 318,292 patients. The median age was 65 years, 54% were female, and 20% were African American. Sixty percent of the cohort had at least one estimated glomerular filtration rate (eGFR) measurement before ESRD onset; however, only 6% had both an eGFR measurement and a urine albumin creatinine ratio (UACR) value before failure. The AUC of the 4-Variable KFRE was 0.89 (95% CI [0.88, 0.91]), while the ML approach had an AUC of 0.94 (95% CI [0.94, 0.95]). Importantly, the improvement in AUC was achieved while risk-scoring 10 times more of the population.

Conclusion

The ML approach outperformed traditional risk scoring such as the 4-Variable KFRE both in risk discrimination and in population coverage. Therefore, future efforts to risk stratify for population management and care intervention will benefit from utilizing ML approaches.

Table. Comparison of 4-Variable KFRE and ML approach

Measure | 4-Variable KFRE | Machine Learning Approach
Population coverage, n (%) | 19,880 (6.3) | 191,435 (60.1)
ESRD events, n (%) | 846 (4.3) | 12,870 (6.7)
eGFR, Mean (SD) | 67.6 [49.6, 88.4] | 65.3 [47.2, 86.9]
Follow Up in Years, Mean (SD) | 3 [1.3, 5.3] | 3.4 [1.4, 6.0]
AUC (95% Confidence Interval) | 0.89 (0.88-0.90) | 0.94 (0.93-0.95)
Number of features considered | 4 | >20,000
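
The coverage and event-rate percentages in the table can be recomputed from the reported counts. The sketch below assumes that the coverage denominator is the full cohort of 318,292 patients and that each event rate is taken over the patients scored by that method; both are assumptions, since the abstract does not state the denominators. The variable names are illustrative, and small rounding differences from the published percentages are possible.

```python
# Counts copied from the Results text and the table above.
TOTAL_PATIENTS = 318_292          # assumed denominator for population coverage
kfre_scored, kfre_events = 19_880, 846
ml_scored, ml_events = 191_435, 12_870


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


kfre_coverage = pct(kfre_scored, TOTAL_PATIENTS)
ml_coverage = pct(ml_scored, TOTAL_PATIENTS)
kfre_event_rate = pct(kfre_events, kfre_scored)   # assumed denominator: patients scored
ml_event_rate = pct(ml_events, ml_scored)
coverage_ratio = ml_scored / kfre_scored          # how many times more patients were scored

print(f"KFRE coverage: {kfre_coverage:.1f}%  event rate: {kfre_event_rate:.1f}%")
print(f"ML coverage:   {ml_coverage:.1f}%  event rate: {ml_event_rate:.1f}%")
print(f"Coverage ratio (ML / KFRE): {coverage_ratio:.1f}")
```

Under these assumptions the coverage ratio comes out near 9.6, which is consistent with the statement that the ML approach risk-scored about 10 times more of the population.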