Abstract: TH-PO0638
Limitations of Phecodes for Kidney Phenotypes: Implications for Genomic and Electronic Health Records (EHR)-Based Research
Session Information
- Genetic Diseases of the Kidneys: Complex Kidney Traits
November 06, 2025 | Location: Exhibit Hall, Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Genetic Diseases of the Kidneys
- 1202 Genetic Diseases of the Kidneys: Complex Kidney Traits
Authors
- Nestor, Jordan Gabriela, Columbia University, New York, New York, United States
- Fang, Yilu, Columbia University, New York, New York, United States
- Zachariah, Teena, Columbia University, New York, New York, United States
- Prakash-Polet, Sindhuri, New York University, New York, New York, United States
- Han, Heedeok, Columbia University, New York, New York, United States
- Navarro Torres, Mariela, Columbia University, New York, New York, United States
- Weng, Chunhua, Columbia University, New York, New York, United States
Background
Phecodes are widely used to group ICD codes into phenotypes for high-throughput genomic research (e.g., GWAS, PheWAS). However, their clinical granularity varies, limiting their utility in precision nephrology, where disease classifications are closely tied to prognosis and treatment.
Methods
We manually curated 2,363 kidney-related SNOMED-CT concepts using the OMOP Common Data Model. EHR data from NewYork-Presbyterian/Columbia University Irving Medical Center through Q3 2023 were queried to identify which of these concepts appeared in patient records (n = 261 concepts; n = 1,127,717 patients). Phecode assignment was attempted using the UMLS Metathesaurus and the VA phecode map. Three blinded nephrologists independently rated each mapping as a “good”, “acceptable”, or “poor/low granularity” match, and final classifications were determined by majority vote.
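The majority-vote step can be sketched as follows. This is a minimal illustration, not the authors' code. The three labels come from the abstract. The tie rule is an assumption: with three raters and three categories, a three-way split has no majority, and the abstract does not say how such cases were resolved. Here they are returned as `None` for adjudication. The mapping names and ratings in the usage example are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

# Rating labels as described in the abstract.
GOOD = "good"
ACCEPTABLE = "acceptable"
POOR = "poor/low granularity"
VALID_RATINGS = {GOOD, ACCEPTABLE, POOR}


def majority_vote(ratings: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of raters.

    Assumption (not stated in the abstract): if no label receives more
    than half of the votes (e.g., a three-way split among three raters),
    return None so the mapping can be flagged for adjudication.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    unknown = [r for r in ratings if r not in VALID_RATINGS]
    if unknown:
        raise ValueError(f"unknown rating(s): {unknown}")
    label, votes = Counter(ratings).most_common(1)[0]
    return label if votes > len(ratings) / 2 else None


if __name__ == "__main__":
    # Hypothetical ratings from three blinded nephrologists.
    example = {
        "mapping_A": [GOOD, GOOD, ACCEPTABLE],
        "mapping_B": [POOR, ACCEPTABLE, POOR],
        "mapping_C": [GOOD, ACCEPTABLE, POOR],  # three-way split
    }
    for name, ratings in example.items():
        print(name, majority_vote(ratings))
```

With three raters, any label chosen by at least two raters wins. Only a complete three-way split is left unresolved.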
Results
No direct phecode match existed for 106 of 261 concepts (40.6%), affecting 25.3% of patients (n = 285,083). Among the 155 concepts with mapped phecodes, only 65 (41.9%) were rated as good matches; 60 (38.7%) were acceptable, and 30 (19.4%) were poor/low granularity.
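The reported percentages use two different denominators. Concept-level coverage is computed over all 261 concepts, while match quality is computed over the 155 mapped concepts. The short check below reproduces each figure from the counts in the abstract. The "good (of all concepts)" entry is derived here for comparison and is not reported in the abstract.

```python
# Counts reported in the abstract.
TOTAL_CONCEPTS = 261
UNMATCHED_CONCEPTS = 106
GOOD_N, ACCEPTABLE_N, POOR_N = 65, 60, 30
TOTAL_PATIENTS = 1_127_717
PATIENTS_UNMATCHED = 285_083


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Concepts with a direct phecode match (155).
mapped = TOTAL_CONCEPTS - UNMATCHED_CONCEPTS
# The three rating categories partition the mapped concepts.
assert GOOD_N + ACCEPTABLE_N + POOR_N == mapped

summary = {
    "no phecode match (of all concepts)": pct(UNMATCHED_CONCEPTS, TOTAL_CONCEPTS),
    "direct phecode match (of all concepts)": pct(mapped, TOTAL_CONCEPTS),
    "good (of mapped)": pct(GOOD_N, mapped),
    "acceptable (of mapped)": pct(ACCEPTABLE_N, mapped),
    "poor/low granularity (of mapped)": pct(POOR_N, mapped),
    "patients affected by unmatched concepts": pct(PATIENTS_UNMATCHED, TOTAL_PATIENTS),
    # Derived here, not reported in the abstract: good matches as a share
    # of all evaluated concepts rather than of mapped concepts only.
    "good (of all concepts)": pct(GOOD_N, TOTAL_CONCEPTS),
}

if __name__ == "__main__":
    for label, value in summary.items():
        print(f"{label}: {value}%")
```

Switching the denominator from mapped concepts to all evaluated concepts lowers the good-match share from 41.9% to about 24.9%.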
Conclusion
Phecodes lack the granularity and traceability needed for accurate kidney phenotyping. They collapse distinct clinical features into broad categories, which limits their utility for advancing genomic research. When phenotype definitions lack clinical specificity, the validity of downstream genomic and multi-omics discoveries is fundamentally compromised. In contrast, the OMOP Common Data Model with SNOMED-CT and UMLS enables precise, interpretable, and interoperable phenotyping, which is essential for generating clinically meaningful insights from big data.
Summary of Phecode Coverage and Match Quality for Kidney Concepts
| Metric | Count | Percent |
|---|---|---|
| Total kidney-related concepts evaluated | 261 | 100% |
| With direct phecode match (mapped) | 155 | 59.4% |
| Good match | 65 | 41.9% (of mapped) |
| Acceptable match | 60 | 38.7% (of mapped) |
| Poor match/low granularity | 30 | 19.4% (of mapped) |
| No phecode match | 106 | 40.6% |
| Patients affected by unmatched concepts | 285,083 | 25.3% (of 1,127,717 total patients) |
Funding
- NIDDK Support