Abstract: PUB048
Privacy-Preserving Synthetic Data Enhance Postoperative-AKI Prediction in Data-Scarce Scenarios
Session Information
Category: Artificial Intelligence, Digital Health, and Data Science
- 300 Artificial Intelligence, Digital Health, and Data Science
Authors
- Kwon, Soie, Chung-Ang University College of Medicine, Dongjak-gu, Seoul, Korea (the Republic of)
- Lee, Hajeong, Seoul National University Hospital, Jongno-gu, Seoul, Korea (the Republic of)
Background
Despite the growing use of artificial intelligence (AI), limited data availability and privacy concerns restrict its clinical application. This study aimed to develop a synthetic data generation model to address these limitations, enabling prediction of post-operative acute kidney injury (PO-AKI) even with a relatively small real-world dataset.
Methods
We developed a synthetic model to generate virtual patient data incorporating comorbidities, laboratory results, medication history, surgical details, and PO-AKI occurrence in patients who underwent major non-cardiac surgery. The model was built on the BERT architecture and trained using real-world data from data-rich hospitals. Privacy risks were evaluated through membership and attribute inference attacks (MIA and AIA). The similarity between synthetic and real-world data was statistically assessed, and the clinical utility of the synthetic data was evaluated by examining whether augmenting data-scarce cohorts with exactly matched synthetic data improved PO-AKI prediction using CatBoost.
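As an illustration of what a membership inference attack measures, the sketch below implements a simple distance-based attack. It is an assumption for illustration only, since the abstract does not specify the attack design. The function names, the threshold, and the toy records are all hypothetical. The attacker guesses that a record was in the training data whenever it lies close to some synthetic record; an attack accuracy near 0.5 would suggest that the synthetic data reveal little about membership.

```python
import math

def min_distance(record, synthetic):
    """Euclidean distance from `record` to its closest synthetic record."""
    return min(math.dist(record, s) for s in synthetic)

def mia_accuracy(members, non_members, synthetic, threshold):
    """Accuracy of an attacker who guesses "member" whenever a record
    lies within `threshold` of at least one synthetic record."""
    # Members correctly flagged as members.
    hits = sum(min_distance(r, synthetic) <= threshold for r in members)
    # Non-members correctly rejected.
    rejections = sum(min_distance(r, synthetic) > threshold for r in non_members)
    return (hits + rejections) / (len(members) + len(non_members))

# Hypothetical, already-scaled toy records (not study data).
members = [(0.0, 0.0), (1.0, 1.0)]        # used to train the generator
non_members = [(0.2, 0.1), (0.9, 1.2)]    # held out from training
synthetic = [(0.1, 0.1), (1.1, 0.9), (0.5, 0.5)]

accuracy = mia_accuracy(members, non_members, synthetic, threshold=0.2)
```

In practice the threshold would be tuned on a separate attack set, and mixed categorical and continuous variables would need a suitable distance measure.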
Results
Real-world data from 335,687 patients were collected, including 275,727 from 3 data-rich hospitals and 59,960 from 3 data-scarce hospitals. For each data-rich hospital, the similarity between its real-world data and the corresponding synthetic data was analyzed. At SNUH, 90.4% of variables showed no statistically significant difference between real-world and synthetic data, compared with 89.0% at SNUBH and 94.4% at AMC. The MIA and AIA analyses confirmed that privacy protection was maintained. The clinical utility of synthetic data in PO-AKI prediction was evaluated by augmenting real-world data-scarce cohorts with synthetic data. The benefit was most pronounced in smaller cohorts, peaking at 2,000–4,000 synthetic patients and plateauing beyond 16,000 (Figure 1).
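The share of variables with no statistically significant difference can be computed as in the following minimal sketch. The abstract does not name the statistical tests used, so the two-sample z-test below (a normal approximation that is reasonable only for large samples), the 0.05 significance level, the function names, and the toy values are all assumptions for illustration.

```python
import math
from statistics import NormalDist, mean, variance

def two_sided_p(real, synth):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(variance(real) / len(real) + variance(synth) / len(synth))
    if se == 0:
        # Both samples are constant: identical means give p = 1, otherwise 0.
        return 1.0 if mean(real) == mean(synth) else 0.0
    z = (mean(real) - mean(synth)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def share_not_different(real_cols, synth_cols, alpha=0.05):
    """Fraction of variables whose real and synthetic values do not differ
    significantly at level `alpha`."""
    p_values = [two_sided_p(real_cols[name], synth_cols[name]) for name in real_cols]
    return sum(p >= alpha for p in p_values) / len(p_values)

# Hypothetical toy values (not study data); real cohorts would be far larger.
real_cols = {
    "age": [60, 62, 64, 66, 68],
    "creatinine": [0.8, 0.9, 1.0, 1.1, 1.2],
}
synth_cols = {
    "age": [61, 63, 64, 65, 67],
    "creatinine": [1.8, 1.9, 2.0, 2.1, 2.2],
}

share = share_not_different(real_cols, synth_cols)
```

Categorical variables would need a different test, such as a chi-squared test, and with many variables a multiple-comparison correction may be worth considering.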
Conclusion
To our knowledge, this is the first study to apply generative AI to PO-AKI prediction. Synthetic data augmentation enhanced prediction performance in data-scarce scenarios, demonstrating the clinical utility of this approach.