Discovery and disentanglement of aligned residue associations from aligned pattern clusters to reveal subgroup characteristics

从比对的模式簇中发现并解耦比对的残基关联,以揭示亚群特征。

阅读:1

Abstract

BACKGROUND: A protein family has similar and diverse functions locally conserved. An aligned pattern cluster (APC) can reflect the conserved functionality. Discovering aligned residue associations (ARAs) in APCs can reveal subtle inner working characteristics of conserved regions of protein families. However, ARAs corresponding to different functionalities/subgroups/classes could be entangled because of subtle multiple entwined factors. METHODS: To discover and disentangle patterns from mixed-mode datasets, such as APCs when the residues are replaced by their fundamental biochemical properties list, this paper presents a novel method, Extended Aligned Residual Association Discovery and Disentanglement (E-ARADD). E-ARADD discretizes the numerical dataset to transform the mixed-mode dataset into an event-value dataset, constructs an ARA Frequency Matrix and then converts it into an adjusted Statistical Residual (SR) Vector Space (SRV) capturing statistical deviation from randomness. By applying Principal Component (PC) Decomposition on SRV, PCs ranked by their variance are obtained. Finally, the disentangled ARAs are discovered when the projections on a PC is re-projected to a vector space with the same basis vectors of SRV. RESULTS: Experiments on synthetic, cytochrome c and class A scavenger data have shown that E-ARADD can a) disentangle the entwined ARAs in APCs (with residues or biochemical properties), b) reveal subtle AR clusters relating to classes, subtle subgroups or specific functionalities. CONCLUSIONS: E-ARADD can discover and disentangle ARs and ARAs entangled in functionality and location of protein families to reveal functional subgroups and subgroup characteristics of biological conserved regions. Experimental results on synthetic data provides the proof-of-concept validation on the successful disentanglement that reveals class-associated ARAs with or without class labels as input. Experiments on cytochrome c data proved the efficacy of E-ARADD in handing both types of residue data. Our novel methodology is not only able to discover and disentangle ARs and ARAs in specific statistical/functional (PCs and RSRVs) spaces, but also their locations in the protein family functional domains. The success of E-ARADD shows its great potential to proteomic research, drug discovery and precision and personalized genetic medicine.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。