scHiCSRS: a self-representation smoothing method with Gaussian mixture model for imputing single cell Hi-C data

scHiCSRS:一种基于高斯混合模型的自表征平滑方法,用于单细胞Hi-C数据插补

阅读:1

Abstract

BACKGROUND: Single cell Hi-C (scHi-C) techniques make it possible to study cell-to-cell variability, but excess of zeros are makes scHi-C matrices extremely sparse and difficult for downstream analyses. The observed zeros are a combination of two events: structural zeros for which two loci never interact due to underlying biological mechanisms, or dropouts (sampling zeros) where two loci interact but not captured due to insufficient sequencing depth. Although data quality improvement approaches have been proposed, little has been done to differentiate these two types of zeros, even though such a distinction can greatly benefit downstream analysis such as clustering. RESULTS: We propose scHiCSRS, a self-representation smoothing method that improves data quality, and a Gaussian mixture model that identifies structural zeros among observed zeros. scHiCSRS not only takes spatial dependencies of a scHi-C data matrix into account but also borrows information from similar single cells. Through an extensive set of simulation studies, we demonstrate the ability of scHiCSRS for identifying structural zeros with high sensitivity and for accurate imputation of dropout values in sampling zeros. Downstream analyses for three experimental datasets show that data improved from scHiCSRS yield more accurate clustering of cells than simply using observed data or improved data from comparison methods. CONCLUSION: In summary, scHiCSRS provides a valuable tool for identifying structural zeros and imputing dropouts. The resulted data are improved for downstream analysis, especially for understanding cell-to-cell variation through subtype clustering.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。