
Kidney Week

Abstract: TH-PO0979

Evaluating Racial Bias in Machine-Learning Models Predicting Five-Year Kidney Failure Risk: A Comparative Study with the Kidney Failure Risk Equation

Session Information

Category: Diversity and Equity in Kidney Health

  • 900 Diversity and Equity in Kidney Health

Authors

  • Hussain, Suhana, Mahatma Gandhi Medical College and Hospital, Jaipur, Rajasthan, India
  • McManus, Shawn, Liberty University College of Arts & Sciences, Lynchburg, Virginia, United States
  • Rafique, Sami, Rajasthan University of Health Sciences, Jaipur, Rajasthan, India
  • Kalia, Anchin, Mahatma Gandhi Medical College and Hospital, Jaipur, Rajasthan, India
  • Khan, Lubaba, Mahatma Gandhi Medical College and Hospital, Jaipur, Rajasthan, India
  • Padihar, Priyanka, Mahatma Gandhi Medical College and Hospital, Jaipur, Rajasthan, India
Background

Machine learning (ML) holds promise for predicting kidney failure in patients with chronic kidney disease (CKD), yet bias across racial groups remains underexplored. This study compares the performance and fairness of two ML models, Random Forest (RF) and XGBoost (XGB), against the validated 8-variable Kidney Failure Risk Equation (KFRE) across four racial groups.

Methods

Using NHANES data (2011-2020), we analyzed 1,684 patients with CKD (222 Mexican American/Other Hispanic, 119 Asian/Other, 781 White, 562 Black). RF and XGB were trained on the KFRE predictors, excluding race. Stratified 5-fold cross-validation and bootstrapping (1,000 samples per group) ensured balanced subgroup representation, and models were compared on performance and fairness metrics.
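A minimal sketch of the evaluation protocol described above: stratified 5-fold cross-validation followed by per-group bootstrapping of an error metric. The data, group labels, and model settings here are synthetic placeholders, not the NHANES cohort or the study's actual configuration.

```python
# Hypothetical sketch of group-stratified 5-fold CV plus per-group
# bootstrapping (1,000 resamples) of MAE. All data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
n = 1684
X = rng.normal(size=(n, 8))                       # stand-in for 8 KFRE predictors
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 1).astype(float)  # outcome
group = rng.integers(0, 4, size=n)                # 4 racial groups (used only for
                                                  # stratification, never as a feature)

oof_pred = np.empty(n)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, group):    # stratify folds by group
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    oof_pred[test_idx] = model.predict(X[test_idx])

# Bootstrap the MAE within each group (1,000 resamples per group)
for g in range(4):
    idx = np.where(group == g)[0]
    maes = [np.mean(np.abs(y[s] - oof_pred[s]))
            for s in (rng.choice(idx, size=idx.size) for _ in range(1000))]
    lo, hi = np.percentile(maes, [2.5, 97.5])
    print(f"group {g}: MAE = {np.mean(maes):.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

Stratifying the folds by group keeps each subgroup represented in every train/test split; the bootstrap then attaches uncertainty intervals to each subgroup's error estimate.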

Results

Both models achieved high predictive accuracy (R² > 0.93, AUC > 0.98), but error and calibration metrics varied by race. Mexican American/Other Hispanic patients had the highest mean absolute error (MAE) and root mean squared error (RMSE); for example, RF MAE peaked at 1.73 in this group versus 0.64 in White patients, and the Brier score was also elevated. The same pattern held for XGB, with lower overall error but similar disparities.
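The three error metrics reported above can be computed as in the sketch below, using illustrative synthetic predictions rather than the study's data.

```python
# Minimal sketch of the reported error metrics: MAE, RMSE, and Brier score.
# y_true and y_prob are synthetic stand-ins for outcomes and predicted risks.
import numpy as np
from sklearn.metrics import (brier_score_loss, mean_absolute_error,
                             mean_squared_error)

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, 300).astype(float)                   # binary outcome
y_prob = np.clip(y_true * 0.8 + rng.normal(0, 0.2, 300), 0, 1)   # predicted risk

mae = mean_absolute_error(y_true, y_prob)
rmse = np.sqrt(mean_squared_error(y_true, y_prob))
brier = brier_score_loss(y_true, y_prob)   # mean squared error of probabilities
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  Brier={brier:.3f}")
```

Computed per racial subgroup, gaps in these metrics are what the abstract reports as disparities: equal overall accuracy can coexist with unequal subgroup error.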

Kolmogorov-Smirnov tests showed significant shifts in predicted risk distributions, especially between Hispanic and White patients (RF D = 0.158, XGB D = 0.155, both p < 0.001), and Cohen's d confirmed large between-group differences in MAE. SHAP analysis showed variation in feature importance across groups. Platt scaling improved calibration, indicating that post-hoc recalibration can enhance reliability. Together, these findings highlight disparities and raise concerns about the fairness and equity of ML models in racially diverse populations.
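Two of the analyses named above, the two-sample Kolmogorov-Smirnov test between groups' predicted risks and Platt scaling (logistic recalibration of raw scores), can be sketched as follows. All data here are synthetic; only the statistical machinery matches the abstract.

```python
# Illustrative KS test between two groups' predicted-risk distributions,
# and Platt scaling of miscalibrated scores. Data are synthetic.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
risk_a = np.clip(rng.normal(0.30, 0.15, 500), 0, 1)   # e.g. one racial group
risk_b = np.clip(rng.normal(0.25, 0.15, 500), 0, 1)   # e.g. another group

d_stat, p_val = ks_2samp(risk_a, risk_b)              # D statistic and p-value
print(f"KS D = {d_stat:.3f}, p = {p_val:.4g}")

# Platt scaling: fit a one-feature logistic regression that maps raw model
# scores to calibrated probabilities (ideally on a held-out calibration set).
scores = rng.uniform(0, 1, 500).reshape(-1, 1)
labels = (rng.uniform(size=500) < scores.ravel() ** 2).astype(int)  # miscalibrated
platt = LogisticRegression().fit(scores, labels)
calibrated = platt.predict_proba(scores)[:, 1]
```

A large D with small p indicates the two groups receive systematically different risk distributions; Platt scaling addresses calibration error but, as the abstract notes, is a post-hoc fix rather than a substitute for fairness auditing.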

Conclusion

High-performing ML models can exhibit racial bias even without using race as a feature, risking over- or undertreatment in underrepresented groups and worsening disparities in CKD care. Without routine audits and recalibration, these tools may delay care or misallocate resources. Our findings underscore the need for fairness evaluations to ensure that AI in nephrology supports equitable care.
