Uncovering the domain language of protein functionality and cell phenotypes using DANSy

Abstract

Evolution has developed a set of principles that determine feasible domain combinations, analogous to grammar within natural languages. Treating domains as words and proteins as sentences made up of domain words, we apply a linguistic approach to represent the human proteome as an n-gram network, which we hereafter call Domain Architecture Network Syntax (DANSy). Combining DANSy with network theory, we explore the abstract rules of domain word combinations within the human proteome and identify connections that determine feasible protein functionality. We analyze the entropic information content of these domain word connections to establish a DANSy network that balances recovery of most of the proteome against n-gram complexity. Additionally, we explore subnetwork languages by focusing on reversible post-translational modification (PTM) systems that follow a reader-writer-eraser paradigm. We find that PTM systems appear to sample grammar rules near the onset of system expansion, but then converge towards similar grammar rules, which stabilize during the post-metazoan switch. For example, reader and writer domains are typically tightly connected through shared n-grams, but eraser domains are almost always loosely connected to, or completely disconnected from, readers and writers. Additionally, after grammar fixation, domains with verb-like properties, such as writers and erasers, never appear together, consistent with the idea that natural grammar promotes clarity and limits futile enzymatic cycles. Because some cancer fusion genes represent the possible emergence of novel language, we investigate how cancer fusion genes alter the human proteome n-gram network. We find that most cancer fusion genes follow existing grammar rules. Finally, we adapt our DANSy analysis for differential expression analysis (deDANSy) to determine the relationship of coordinated changes in domain language syntax to cell phenotypes. We apply deDANSy to RNA-sequencing data from SOX10-deficient melanoma cells, finding that we can use network separation and syntax enrichment to characterize the molecular basis of cell phenotypes and identify novel information distinct from gene set enrichment analysis (GSEA) approaches. Collectively, these results suggest that n-gram-based analysis of proteomes complements direct protein interaction approaches, is more fully described than protein-protein interaction networks, and can provide unique insights for signaling pathway enrichment analysis.
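
To make the "domains as words, proteins as sentences" idea concrete, the following is a minimal Python sketch of how contiguous domain n-grams might be extracted from ordered domain architectures and linked into a network. It is an illustration under stated assumptions, not the DANSy implementation: the function names, the `max_n` parameter, the toy architectures, and the edge rule (two n-grams are connected when they occur in the same protein) are all hypothetical choices made here and are not specified in the abstract.

```python
from collections import defaultdict
from itertools import combinations


def domain_ngrams(architecture, max_n):
    """Return every contiguous domain n-gram (as a tuple) of length 1..max_n."""
    grams = set()
    for n in range(1, max_n + 1):
        for start in range(len(architecture) - n + 1):
            grams.add(tuple(architecture[start:start + n]))
    return grams


def build_ngram_network(proteins, max_n=2):
    """Build a toy n-gram network from {protein_name: [domain, ...]}.

    Nodes are n-grams. ASSUMPTION (illustrative only): two n-grams share an
    edge if they occur in at least one common protein. Each edge maps to the
    set of proteins that support it.
    """
    nodes = set()
    edges = defaultdict(set)
    for name, architecture in proteins.items():
        grams = sorted(domain_ngrams(architecture, max_n))
        nodes.update(grams)
        # sorted() gives each pair a canonical order, so (a, b) is never
        # stored a second time as (b, a).
        for a, b in combinations(grams, 2):
            edges[(a, b)].add(name)
    return nodes, dict(edges)


# Hypothetical toy input: names and architectures are made up for illustration
# and are not taken from any real annotation.
toy_proteins = {
    "toy_1": ["SH3", "SH2", "Kinase"],
    "toy_2": ["SH2", "Kinase"],
    "toy_3": ["Bromo", "PHD"],
}

nodes, edges = build_ngram_network(toy_proteins, max_n=2)
```

In this toy input, the n-grams from `toy_1` and `toy_2` form one connected component and those from `toy_3` form another, which is the kind of separation a network view of domain "grammar" is meant to expose. A real analysis would also have to choose the maximum n-gram length, which the abstract describes as an entropy-based trade-off between proteome coverage and n-gram complexity.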
