Probabilistic grammatical model for helix-helix contact site classification



Abstract

BACKGROUND: Hidden Markov Models power many state-of-the-art tools in protein bioinformatics. While they excel at their tasks, these methods do not directly convey information on medium- and long-range residue-residue interactions. Capturing such interactions requires the expressive power of at least context-free grammars. However, the application of more powerful grammar formalisms to protein analysis has been surprisingly limited.

RESULTS: In this work, we present a probabilistic grammatical framework for problem-specific protein languages and apply it to the classification of transmembrane helix-helix pair configurations. The core of the model is a probabilistic context-free grammar, automatically inferred by a genetic algorithm from only a generic set of expert-based rules and positive training samples. The model was applied to produce sequence-based descriptors of four classes of transmembrane helix-helix contact site configurations. The best-performing classifier reached an AUC ROC of 0.70. Analysis of the grammar parse trees showed that they can represent structural features of helix-helix contact sites.

CONCLUSIONS: We demonstrated that our probabilistic context-free framework for the analysis of protein sequences outperforms the state of the art in the task of helix-helix contact site classification. However, this performance is achieved without necessarily modeling long-range dependencies between interacting residues. A significant feature of our approach is that the grammar rules and parse trees are human-readable, so they could provide biologically meaningful information to molecular biologists.
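As a rough illustration of how a probabilistic context-free grammar can assign a score to a residue sequence, the following minimal sketch computes the total derivation probability with the standard inside (CYK-style) algorithm. The toy grammar, its nonterminals (`S`, `H`, `P`), the residue groupings, and all rule probabilities are hypothetical and chosen only for illustration; they are not taken from the paper, and the genetic-algorithm inference step described in the abstract is not shown.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (not from the paper).
# Lexical rules: (nonterminal, residue) -> probability.
LEXICAL = {
    ("H", "L"): 0.5,  # H: assumed "hydrophobic" class
    ("H", "V"): 0.5,
    ("P", "S"): 0.5,  # P: assumed "small/polar" class
    ("P", "G"): 0.5,
}

# Binary rules: (parent, left child, right child) -> probability.
# Probabilities of all rules sharing a parent sum to 1.
BINARY = {
    ("S", "H", "S"): 0.3,
    ("S", "P", "S"): 0.3,
    ("S", "H", "P"): 0.2,
    ("S", "P", "H"): 0.2,
}


def inside_probability(sequence, start="S"):
    """Return the total probability that `start` derives `sequence`."""
    n = len(sequence)
    if n == 0:
        return 0.0

    # chart[i][j][A] = probability that nonterminal A derives sequence[i..j]
    chart = [[defaultdict(float) for _ in range(n)] for _ in range(n)]

    # Spans of length 1: apply lexical rules.
    for i, residue in enumerate(sequence):
        for (nonterminal, terminal), prob in LEXICAL.items():
            if terminal == residue:
                chart[i][i][nonterminal] += prob

    # Longer spans: combine two adjacent sub-spans with a binary rule.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for (parent, left, right), prob in BINARY.items():
                    p_left = chart[i][k].get(left, 0.0)
                    p_right = chart[k + 1][j].get(right, 0.0)
                    if p_left and p_right:
                        chart[i][j][parent] += prob * p_left * p_right

    return chart[0][n - 1].get(start, 0.0)


if __name__ == "__main__":
    for seq in ("LS", "LSV", "LA"):
        print(seq, inside_probability(seq))
```

In a classification setting, a score of this kind could be compared against a threshold or against scores from grammars trained for other classes; how the paper turns grammar scores into class decisions is not stated in the abstract, so that step is left out here.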
