Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree

基于梯度提升决策树的氨基酸序列多尺度编码预测蛋白质相互作用

阅读:1

Abstract

Nowadays a number of computational approaches have been developed to effectively and accurately predict protein interactions. However, most of these methods typically perform worse when other biological data sources (e.g., protein structure information, protein domains, or gene neighborhoods information) are not available. In the present work, we propose a method for predicting protein interactions making full use of physicochemical characteristics of amino acids. A protein sequence is encoded at multi-scale by seven properties, including their qualitative and quantitative descriptions, of amino acids. Five kinds of protein descriptors, frequency, composition, transformation, distribution and auto covariance, are extracted from these encodings for representing each protein sequence. The new formed feature representation consisted of 347 dimensions is able to capture not only the compositional and positional information but also their statistical significance of amino acids in the sequence. Based on such a feature representation, the gradient boosting decision tree algorithm is introduced to predict protein interaction class. When the proposed method is tested with the PPI data of S.cerevisiae, it achieves a prediction accuracy of 95.28% at the Matthew's correlation coefficient of 90.68%. Compared with the state-of-the-art works on H.pylori and Human, the accuracies can be raised to 89.27% and 98.00% respectively. Extensive experiments are performed for a crossover protein-protein interactions network and the prediction accuracies are also very promising. Because of learning capabilities of the gradient boosting decision tree and the mutil-scale feature representation scheme, the proposed method might be a useful tool for future proteomics studies.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。