Kidney Week

Archived section from 2019.

Abstract: TH-PO386

Optimizing Machine Learning Methods for Clinical Outcome Prediction

Session Information

Category: CKD (Non-Dialysis)

  • 2101 CKD (Non-Dialysis): Epidemiology, Risk Factors, and Prevention

Authors

  • Liu, Qian, Arbor Research Collaborative for Health, Ann Arbor, Michigan, United States
  • Smith, Abigail R., Arbor Research Collaborative for Health, Ann Arbor, Michigan, United States
  • Mariani, Laura H., University of Michigan, Ann Arbor, Michigan, United States
  • Zee, Jarcy, Arbor Research Collaborative for Health, Ann Arbor, Michigan, United States

Background

Machine learning (ML) is useful to identify novel biomarkers and predict clinical outcomes, especially when predictors outnumber patients, but model building procedures are underutilized. We compared two ML methods and the impact of pre-specifying covariate functional forms on predictive accuracy and variable importance using data from NEPTUNE, a prospective cohort study of glomerular disease patients.

Methods

The sample was split into training (70%) and validation (30%) sets. Ridge regression and random forest models were developed in the training set to predict time to two clinical outcomes: disease progression (ESRD or ≥40% eGFR decline with last eGFR <60) and complete remission of proteinuria (UPCR <0.3), with and without categorizing continuous covariates to accommodate non-linear associations with outcomes. Predictors included 56 demographic/clinical characteristics, which were ranked by variable importance. Discrimination was estimated in the validation set using integrated area under the curve (iAUC).
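The two data-preparation steps described above, the 70/30 split and the categorization of a continuous covariate, can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the random seed, and the eGFR cut points are hypothetical and are not taken from the abstract, which does not report how the split was drawn or which categories were used.

```python
import random

def split_train_validation(ids, train_fraction=0.7, seed=0):
    """Randomly split patient ids into training and validation lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

def categorize(value, cut_points):
    """Return one-hot indicators for the interval that contains value.

    With sorted cut points c1 < ... < ck the intervals are
    (-inf, c1), [c1, c2), ..., [ck, +inf).
    """
    index = sum(value >= c for c in cut_points)
    return [1 if i == index else 0 for i in range(len(cut_points) + 1)]

# Hypothetical eGFR cut points; the abstract does not state the ones used.
EGFR_CUTS = [30, 60, 90]

train_ids, valid_ids = split_train_validation(range(100))
print(len(train_ids), len(valid_ids))  # 70 30
print(categorize(45, EGFR_CUTS))       # [0, 1, 0, 0]
```

Replacing one continuous column with a set of indicator columns lets a model that is linear in its inputs assign a separate effect to each interval, which is one way to accommodate a non-linear association.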

Results

Using pre-specified covariate functional forms in ridge regression increased iAUC from 0.68 to 0.74 for the progression outcome, but had little impact on remission (0.79 vs. 0.78; Fig) or on the random forest method for either outcome. iAUCs from random forest were higher than those from ridge for progression but not for remission. After pre-specifying functional forms in ridge regression, variable importance ranks increased for some known risk factors: the rank of UPCR for predicting remission rose from 48 to 5, and the rank of eGFR for predicting progression rose from 52 to 1. Other important predictors were disease diagnosis, age, and immunosuppression use for remission, and disease diagnosis, race, and hypertension for progression.
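To make the rank comparison concrete, the sketch below turns importance scores into ranks (1 = most important) and compares them across two model specifications. The scores are invented for illustration and the helper `importance_ranks` is hypothetical; the abstract does not say how variable importance was computed for either method.

```python
def importance_ranks(scores):
    """Map each predictor to its rank (1 = most important) by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}

# Hypothetical importance scores, for illustration only (not study results).
linear_form = {"age": 0.30, "hypertension": 0.22, "eGFR": 0.05}
categorized_form = {"age": 0.28, "hypertension": 0.20, "eGFR": 0.41}

before = importance_ranks(linear_form)
after = importance_ranks(categorized_form)
for name in linear_form:
    print(name, before[name], "->", after[name])
```

In this toy example eGFR moves from the last rank to the first once its score reflects the categorized form, mirroring the direction of the change reported above.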

Conclusion

For ML methods assuming linear associations, like ridge regression, pre-specifying covariate functional forms is important for predictive accuracy and detecting important predictors. Different ML methods may improve prediction for different outcomes. Higher ranking of known risk factors improves face validity in prediction models and may have positive implications for external validation performance.
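A toy example may help show why functional form matters for a linear method. It deliberately swaps in ordinary least squares on a synthetic U-shaped relationship in place of the penalized time-to-event models used in the study, so it illustrates the principle only; the data, cut points, and function names are all assumptions.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def bin_index(x, cut_points):
    """Index of the interval containing x, given sorted cut points."""
    return sum(x >= c for c in cut_points)

def fit_bins(xs, ys, cut_points):
    """Mean outcome per category, i.e. least squares on indicator columns."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(bin_index(x, cut_points), []).append(y)
    return {k: sum(v) / len(v) for k, v in groups.items()}

# Synthetic U-shaped association: the outcome is high at both extremes.
xs = list(range(-5, 6))
ys = [x ** 2 for x in xs]
cuts = [-2, 3]  # hypothetical cut points

slope, intercept = fit_line(xs, ys)
bin_means = fit_bins(xs, ys, cuts)

# Sum of squared errors under each functional form.
sse_linear = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
sse_binned = sum((y - bin_means[bin_index(x, cuts)]) ** 2 for x, y in zip(xs, ys))
print(slope, sse_linear, sse_binned)
```

Because the synthetic data are symmetric, the fitted slope is zero and the linear term explains nothing, while the categorized form captures most of the variation. A predictor in this situation would look unimportant to a linear model even though it is strongly related to the outcome.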

Funding

  • Other NIH Support