Kidney Week

Abstract: SA-PO561

Heterogeneity in Electronic Health Record (EHR) Phenotype Concepts in Collagen Type IV-Associated Nephropathies

Session Information

  • Genetic Diseases: Diagnosis
    November 05, 2022 | Location: Exhibit Hall, Orange County Convention Center, West Building
    Abstract Time: 10:00 AM - 12:00 PM

Category: Genetic Diseases of the Kidneys

  • 1102 Genetic Diseases of the Kidneys: Non-Cystic


  • Nestor, Jordan Gabriela, Columbia University, New York, New York, United States
  • Kiryluk, Krzysztof, Columbia University, New York, New York, United States
  • Weng, Chunhua, Columbia University, New York, New York, United States

Limited appreciation for the full spectrum of disease manifestations of collagen type IV-associated nephropathies (COL4A-AN) contributes to delays in diagnosis. Understanding the diversity of phenotypes is further complicated by the heterogeneity of terms used to describe phenotype concepts in the electronic health record (EHR).


We extracted terms from published COL4A-AN case series and mapped them to concept unique identifiers (CUIs) in the Unified Medical Language System (UMLS). We identified 100 exome-sequenced Columbia Biobank participants with diagnostic variant(s) in COL4A3/4/5 and performed a heuristic manual chart review. We counted the total number of unique concepts identified across structured formats (e.g., ICD-9/10 and SNOMED CT codes) and unstructured formats (e.g., clinical narratives and raw laboratory values). Each encoded data element was mapped to the standardized terminologies of the OMOP Common Data Model. We then analyzed the diversity of codes used and conducted qualitative interviews with providers on billing practices.
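The counting step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout, the function names, and the handful of example rows are assumptions made for illustration, and the rows are hand-made placeholders rather than study data. The sketch tallies which concepts appear in each data format and how many distinct billing codes are used for the same concept.

```python
from collections import defaultdict

# Hypothetical, hand-made rows (NOT study data):
# (participant_id, data_format, source_code, concept)
RECORDS = [
    ("P1", "structured", "ICD10CM:R31.9", "hematuria"),
    ("P1", "structured", "ICD9CM:599.70", "hematuria"),
    ("P2", "structured", "ICD10CM:R31.0", "hematuria"),
    ("P2", "unstructured", "clinical_note", "hematuria"),
    ("P2", "unstructured", "clinical_note", "hearing loss"),
    ("P3", "structured", "ICD10CM:H90.3", "hearing loss"),
]


def concepts_by_format(records):
    """Return the set of unique concepts seen in each data format."""
    seen = defaultdict(set)
    for _pid, data_format, _code, concept in records:
        seen[data_format].add(concept)
    return dict(seen)


def codes_per_concept(records):
    """Return the distinct structured codes used to record each concept."""
    codes = defaultdict(set)
    for _pid, data_format, code, concept in records:
        if data_format == "structured":
            codes[concept].add(code)
    return dict(codes)


if __name__ == "__main__":
    for data_format, concepts in concepts_by_format(RECORDS).items():
        print(data_format, len(concepts), sorted(concepts))
    for concept, codes in codes_per_concept(RECORDS).items():
        print(concept, len(codes), sorted(codes))
```

In a real analysis the concept column would hold the mapped identifier (a UMLS CUI or an OMOP standard concept) rather than a free-text label, so that synonymous terms collapse into one concept before counting.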


Most of the rich descriptions were documented within the text of clinical narratives written by kidney experts. In addition, a review of the raw urinalysis data revealed temporal, diagnostic evidence of hematuria in nearly half the cohort. Across structured data formats, we found numerous billing codes used to document particular concepts, such as hematuria and hearing loss. Through qualitative interviews, we found that nephrologists selected codes that reflected the primary disease addressed in the visit and ones that demonstrated the medical complexity of the patient’s disease to maximize reimbursement.


EHR data heterogeneity is an obstacle to the development of accurate and valid phenotype algorithms for COL4A-AN and should be accounted for in EHR phenotyping. Extracting concepts from clinical text using natural language processing techniques, in addition to structured data elements like billing codes, may prove useful.
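As a rough illustration of combining clinical text with structured data, the sketch below uses a small term lexicon and regular expressions to find concept mentions in a note, then merges them with concepts already derived from billing codes. The lexicon, the function names, and the example sentences are all hypothetical, and plain dictionary matching stands in for the natural language processing techniques the abstract refers to; it is not the method used in the study.

```python
import re

# Hypothetical lexicon: concept label -> surface terms that may appear in notes.
LEXICON = {
    "hematuria": ["hematuria", "blood in the urine", "blood in urine"],
    "hearing loss": ["hearing loss", "sensorineural hearing loss", "deafness"],
}


def compile_lexicon(lexicon):
    """Build one case-insensitive, word-bounded pattern per concept."""
    compiled = {}
    for concept, terms in lexicon.items():
        # Try longer terms first so multi-word phrases win over substrings.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in ordered)
        compiled[concept] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return compiled


def extract_concepts(text, compiled):
    """Return the concepts whose terms are mentioned anywhere in the text."""
    return {concept for concept, pattern in compiled.items() if pattern.search(text)}


def phenotype_evidence(note_text, billing_concepts, compiled):
    """Union of concepts found in the note and concepts from billing codes."""
    return extract_concepts(note_text, compiled) | set(billing_concepts)


if __name__ == "__main__":
    patterns = compile_lexicon(LEXICON)
    note = "Patient reports blood in the urine since childhood."
    print(sorted(phenotype_evidence(note, {"hearing loss"}, patterns)))
```

This kind of matching ignores negation and context, so a phrase such as "denies hematuria" would still count as a mention; a production phenotyping algorithm would need negation detection and mapping to standard concept identifiers.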

UMLS Concept Map for COL4A-AN


  • Other NIH Support