Protein structural class prediction based on an improved statistical strategy

基于改进统计策略的蛋白质结构分类预测

阅读:1

Abstract

BACKGROUND: A protein structural class (PSC) belongs to the most basic but important classification in protein structures. The prediction technique of protein structural class has been developing for decades. Two popular indices are the amino-acid-frequency (AAF) based, and amino-acid-arrangement (AAA) with long-term correlation (LTC) - based indices. They were proposed in many works. Both indices have its pros and cons. For example, the AAF index focuses on a statistical analysis, while the AAA-LTC emphasizes the long-term, biological significance. Unfortunately, the datasets used in previous work were not very reliable for a small number of sequences with a high-sequence similarity. RESULTS: By modifying a statistical strategy, we proposed a new index method that combines probability and information theory together with a long-term correlation. We also proposed a numerically and biologically reliable dataset included more than 5700 sequences with a low sequence similarity. The results showed that the proposed approach has its high accuracy. Comparing with amino acid composition (AAC) index using a distance method, the accuracy of our approach has a 16-20% improvement for re-substitution test and about 6-11% improvement for cross-validation test. The values were about 23% and 15% for the component coupled method (CCM). CONCLUSION: A new index method, combining probability and information theory together with a long-term correlation was proposed in this paper. The statistical method was improved significantly based on our new index. The cross validation test was conducted, and the result show the proposed method has a great improvement.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。