Abstract: FR-PO189
A Machine Learning Approach to Identifying Patients at Risk of Developing Incident CKD
Session Information
- CKD: Epidemiology, Risk Factors, Prevention - II
October 26, 2018 | Location: Exhibit Hall, San Diego Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: CKD (Non-Dialysis)
- 1901 CKD (Non-Dialysis): Epidemiology, Risk Factors, and Prevention
Authors
- Yu, Tia Yue, pulseData, New York, New York, United States
- Wiener, Lauren Alexandra, pulseData, New York, New York, United States
- Wang, Xiaoyan, pulseData, New York, New York, United States
- Fielding, Ollie, pulseData, New York, New York, United States
- Son, Jung Hoon, pulseData, New York, New York, United States
- Potukuchi, Praveen Kumar, University of Tennessee Health Science Center, Memphis, Tennessee, United States
- Kovesdy, Csaba P., University of Tennessee Health Science Center, Memphis, Tennessee, United States
Background
Chronic kidney disease (CKD) is under-identified, and current methods for identifying patients at risk of developing incident CKD are limited. Identifying high-risk patients can improve awareness and help delay the onset and progression of CKD. Machine learning algorithms can stratify patients by their risk of developing incident CKD. Previous work has defined CKD using ICD codes or a limited number of eGFR readings.
Methods
Data from 1,780,262 patients with no baseline CKD in the Veterans Affairs healthcare system were analyzed. We used a random forest classifier to (1) predict incident CKD (eGFR >90 progressing to eGFR <60) and (2) predict the development of advanced CKD (eGFR >60 progressing to eGFR <45), using information on patient demographics, comorbidities, laboratory values, and medication use. We used an algorithm to exclude eGFR values recorded during an AKI episode and selected sustained eGFR periods using slope-based analyses. One-, two-, and five-year prediction models were generated.
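The abstract does not specify the slope-based step in detail. As a minimal sketch, assuming an ordinary least-squares fit of eGFR against time, a per-patient slope could be estimated as below. The function name `egfr_slope` and the sample readings are hypothetical and are not the authors' algorithm or data.

```python
from typing import Sequence


def egfr_slope(years: Sequence[float], egfr: Sequence[float]) -> float:
    """Ordinary least-squares slope of eGFR per year.

    Hypothetical helper for illustration only; the abstract does not
    describe the exact slope-based method that was used.
    """
    if len(years) != len(egfr) or len(years) < 2:
        raise ValueError("need at least two paired readings")
    n = len(years)
    mean_t = sum(years) / n
    mean_e = sum(egfr) / n
    # Sum of squared deviations of the time points.
    sxx = sum((t - mean_t) ** 2 for t in years)
    if sxx == 0:
        raise ValueError("readings must span more than one time point")
    # Sum of cross-deviations between time and eGFR.
    sxy = sum((t - mean_t) * (e - mean_e) for t, e in zip(years, egfr))
    return sxy / sxx


# Made-up readings for one patient: years since baseline and eGFR values.
sample_years = [0.0, 0.5, 1.0, 1.5, 2.0]
sample_egfr = [95.0, 92.0, 90.0, 87.0, 85.0]
slope = egfr_slope(sample_years, sample_egfr)
```

A steadily negative slope over a period would suggest sustained decline rather than a transient dip, but any threshold for "sustained" would be a further assumption.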
Results
The performance of the prediction models is summarized in Table 1. As the prediction horizon lengthens, laboratory values become less important and comorbidities become more important. Our one-year incident CKD model has an AUC of 0.839 and, at the top risk quartile, a sensitivity of 0.754 and a specificity of 0.751. Our one-year advanced CKD model has an AUC of 0.871 and, at the top risk quartile, a sensitivity of 0.825 and a specificity of 0.751.
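To make the reported metrics concrete, the following sketch shows one way AUC and top-quartile sensitivity and specificity could be computed from model risk scores. It assumes "top quartile" means flagging the 25% of patients with the highest predicted risk. The function names and the toy scores and labels are hypothetical, not the study's code or data.

```python
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


def top_quartile_metrics(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity and specificity when the top 25% of scores are flagged."""
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    n_flag = max(1, n // 4)
    # Indices of patients ordered from highest to lowest predicted risk.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = order[:n_flag]
    true_pos = sum(1 for i in flagged if labels[i] == 1)
    false_pos = n_flag - true_pos
    sensitivity = true_pos / n_pos
    specificity = (n_neg - false_pos) / n_neg
    return sensitivity, specificity


# Toy example: 8 patients, 2 of whom develop the outcome (label 1).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 0, 1, 0, 0, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
toy_sens, toy_spec = top_quartile_metrics(toy_scores, toy_labels)
```

Under this reading, because the outcomes in Table 1 are rare (about 2% or less), nearly all flagged patients are negatives, so flagging 25% of patients yields a specificity close to 0.75, which is consistent with the reported values.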
Conclusion
We demonstrate the ability to leverage advanced machine learning models to predict CKD incidence using longitudinal data commonly available in EHR systems. Future studies should validate our model in a clinical setting.
Table 1
Metric | One Year | Two Year | Five Year
eGFR > 90, predicting decline to eGFR < 60 (incident CKD stage 3a and above)
AUC | 0.839 | 0.808 | 0.791
Outcome | 0.282% | 0.572% | 2.034%
Sensitivity (top quartile) | 0.754 | 0.723 | 0.651
Specificity (top quartile) | 0.751 | 0.753 | 0.758
eGFR > 60, predicting decline to eGFR < 45 (incident CKD stage 3b and above)
AUC | 0.871 | 0.853 | 0.830
Outcome | 0.191% | 0.474% | 2.018%
Sensitivity (top quartile) | 0.825 | 0.784 | 0.739
Specificity (top quartile) | 0.751 | 0.753 | 0.760
Funding
- Veterans Affairs Support