Decoding regulatory structures and features from epigenomics profiles: A Roadmap-ENCODE Variational Auto-Encoder (RE-VAE) model

Abstract

The development of chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq) has driven the generation of large-scale epigenomics data, providing unprecedented opportunities to explore epigenomic profiles across both histone marks and tissue types. Beyond tools designed for direct data analysis, advanced computational approaches such as deep learning have recently shown promise for mining data structure and identifying important regulators in complex functional genomics data. We implemented a neural network framework, a Variational Auto-Encoder (VAE) model, to explore the epigenomic data from the Roadmap Epigenomics Project and the Encyclopedia of DNA Elements (ENCODE) project. We applied the model to 935 reference samples covering 28 tissues and 12 histone marks. We used enhancer and promoter regions as the annotation features and the ChIP-seq signal values in these regions as the feature values. Through a parameter sweep, we identified suitable hyperparameter values and built a VAE model to represent the epigenomics data and to further explore biological regulation. The resulting Roadmap-ENCODE VAE (RE-VAE) model provided both data compression and feature representation. Using the compressed data in the latent space, we found that samples clustered well by histone mark for the majority of marks, but not by tissue or cell type. Tissue or cell specificity was observed only for some histone marks (e.g., H3K4me3 and H3K27ac) and could be characterized when the number of samples for a tissue was large (e.g., blood and brain). In blood, the contributing regions and genes identified by the RE-VAE model were confirmed by tissue-specificity enrichment analysis with an independent tissue expression panel. Finally, we demonstrated that the RE-VAE model could detect cancer cell lines with similar epigenomics profiles.
In conclusion, we introduced and implemented a VAE model to represent large-scale epigenomics data. The model could be used to explore classifications of histone modifications and tissue/cell specificity and to classify new data with unknown sources.
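
To make the setup described in the abstract concrete, the following is a minimal sketch of a single VAE forward pass and loss on a samples-by-regions signal matrix. It is not the RE-VAE implementation: the abstract does not state the architecture, so the layer sizes, ReLU and sigmoid activations, squared-error reconstruction term, random toy data, and all function names (`init_layer`, `vae_forward`, `vae_loss`) are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the real input: rows are samples, columns are
# enhancer/promoter regions, values are signal scaled to [0, 1].
# All sizes below are hypothetical, not taken from the paper.
n_samples, n_regions, n_hidden, n_latent = 8, 50, 16, 2
x = rng.random((n_samples, n_regions))


def init_layer(n_in, n_out):
    """Return small random weights and zero biases for one dense layer."""
    return rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out)


def vae_forward(x, params):
    """Encode x, sample a latent code, and decode it back to signal space."""
    (w_h, b_h), (w_mu, b_mu), (w_lv, b_lv), (w_d, b_d), (w_o, b_o) = params
    h = np.maximum(0.0, x @ w_h + b_h)        # encoder hidden layer (ReLU)
    mu = h @ w_mu + b_mu                      # latent mean
    log_var = h @ w_lv + b_lv                 # latent log-variance
    eps = rng.standard_normal(mu.shape)       # noise for reparameterization
    z = mu + np.exp(0.5 * log_var) * eps      # sampled latent coordinates
    d = np.maximum(0.0, z @ w_d + b_d)        # decoder hidden layer (ReLU)
    x_hat = 1.0 / (1.0 + np.exp(-(d @ w_o + b_o)))  # sigmoid output in (0, 1)
    return x_hat, mu, log_var, z


def vae_loss(x, x_hat, mu, log_var):
    """Squared-error reconstruction plus KL divergence to a standard normal."""
    recon = np.sum((x - x_hat) ** 2, axis=1)
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return float(np.mean(recon + kl))


params = [
    init_layer(n_regions, n_hidden),   # encoder hidden
    init_layer(n_hidden, n_latent),    # latent mean head
    init_layer(n_hidden, n_latent),    # latent log-variance head
    init_layer(n_latent, n_hidden),    # decoder hidden
    init_layer(n_hidden, n_regions),   # decoder output
]

x_hat, mu, log_var, z = vae_forward(x, params)
loss = vae_loss(x, x_hat, mu, log_var)
print("latent shape:", z.shape, "loss:", loss)
```

In a trained model of this kind, the rows of `mu` would be the compressed per-sample coordinates that can be clustered by histone mark or tissue. Training itself (gradient updates, and the hyperparameter sweep the abstract mentions) is omitted here and would normally be done with a deep learning framework.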
