Abstract: FR-PO152
Zero-Controlled Statistical Model for Single Nucleus ATAC-Seq Data Analysis and Demultiplexing
Session Information
- AKI: Mechanisms - II
November 04, 2022 | Location: Exhibit Hall, Orange County Convention Center, West Building
Abstract Time: 10:00 AM - 12:00 PM
Category: Acute Kidney Injury
- 103 AKI: Mechanisms
Authors
- Miao, Zhen, University of Pennsylvania, Philadelphia, Pennsylvania, United States
- Kim, Junhyong, University of Pennsylvania, Philadelphia, Pennsylvania, United States
Background
Single nucleus ATAC sequencing (snATAC-seq) is a technique that detects open chromatin in individual cells. While it is a key assay for augmenting single cell RNA-seq data, analysis methods for snATAC-seq are still in development. In particular, there is a lack of methods that explicitly incorporate probabilistic models.
Methods
Here, we developed a zero-controlled statistical model for snATAC-seq data that accounts for missing data and uneven sequencing coverage. Our model accounts for different sources of zeros (biological vs. non-biological) and for the excess of zeros in this highly sparse data. Our statistical model enables model-based differential feature identification, cell type classification/annotation, doublet detection, and batch effect correction.
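The distinction between biological and non-biological zeros can be illustrated with a toy simulation. The sketch below is not the model presented in this abstract. It assumes a single peak, a known per-group capture rate, and independent cells, and all function names and parameter values are hypothetical. It shows that two groups with the same true accessibility look different when their capture rates differ, and that dividing out the capture rate removes the spurious difference.

```python
import random


def simulate_cells(n_cells, p_open, capture_rate, rng):
    """Simulate observed binary accessibility of one peak across cells.

    p_open       -- probability the region is truly open (biological signal)
    capture_rate -- probability an open region is detected (technical factor)
    Both parameters are illustrative assumptions, not values from the abstract.
    """
    observed = []
    for _ in range(n_cells):
        is_open = rng.random() < p_open          # closed -> biological zero
        captured = rng.random() < capture_rate   # missed -> non-biological zero
        observed.append(1 if (is_open and captured) else 0)
    return observed


def naive_open_fraction(observed):
    """Fraction of cells with a nonzero count, ignoring missing data."""
    return sum(observed) / len(observed)


def adjusted_open_fraction(observed, capture_rate):
    """Estimate p_open using E[observed] = p_open * capture_rate."""
    return min(1.0, naive_open_fraction(observed) / capture_rate)


rng = random.Random(0)
# Same true accessibility (0.4) in both groups, different capture rates.
group_a = simulate_cells(20000, p_open=0.4, capture_rate=0.5, rng=rng)
group_b = simulate_cells(20000, p_open=0.4, capture_rate=0.2, rng=rng)

naive_a = naive_open_fraction(group_a)
naive_b = naive_open_fraction(group_b)
adj_a = adjusted_open_fraction(group_a, 0.5)
adj_b = adjusted_open_fraction(group_b, 0.2)

print(f"naive:    A={naive_a:.3f}  B={naive_b:.3f}")
print(f"adjusted: A={adj_a:.3f}  B={adj_b:.3f}")
```

In this toy setting the naive fractions differ between the groups even though the underlying biology is identical, which is the kind of false discovery that adjusting for the missing rate is meant to avoid. In real data the capture rate is not known and would have to be estimated.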
Results
Our evaluations with both simulated and real data show consistently better performance compared to existing methods that do not account for missing data. Applying our method to several snATAC-seq datasets from kidney samples showed high accuracy (adjusted Rand index, ARI, over 0.9) for cell type label transfer tasks, while simultaneously detecting potential doublet cells. We also applied our method to detect cell type-specific regulatory elements in each kidney cell type during the injury repair process in the mouse system.
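For readers unfamiliar with the metric, the adjusted Rand index (ARI) compares two labelings of the same cells and corrects for chance agreement: 1.0 means identical partitions and values near 0 mean agreement no better than random. The following is a minimal standard-library sketch of the usual pair-counting formula. The cell type labels in the usage lines are made-up toy values, not data from this study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions trivially agree

    # Contingency counts and their marginals.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs that fall together in each cell / row / column.
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy labels for illustration only.
reference = ["PT", "PT", "PT", "LOH", "LOH", "Podo"]
transferred = ["PT", "PT", "LOH", "LOH", "LOH", "Podo"]
print(adjusted_rand_index(reference, reference))
print(adjusted_rand_index(reference, transferred))
```

ARI depends only on how the cells are grouped, not on the label names, so renaming the clusters leaves the score unchanged.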
Conclusion
Here we present a statistical model for snATAC-seq analysis. Our evaluation suggests that accounting for disparities in missing rate is important in snATAC-seq data analysis, and that the different sources of zeros present in the data should be adjusted for to reduce false discoveries.
Funding
- NIDDK Support