Protein-protein interface hot spots prediction based on a hybrid feature selection strategy

基于混合特征选择策略的蛋白质-蛋白质界面热点预测

阅读:1

Abstract

BACKGROUND: Hot spots are interface residues that contribute most binding affinity to protein-protein interaction. A compact and relevant feature subset is important for building machine learning methods to predict hot spots on protein-protein interfaces. Although different methods have been used to detect the relevant feature subset from a variety of features related to interface residues, it is still a challenge to detect the optimal feature subset for building the final model. RESULTS: In this study, three different feature selection methods were compared to propose a new hybrid feature selection strategy. This new strategy was proved to effectively reduce the feature space when we were building the prediction models for identifying hotspot residues. It was tested on eighty-two features, both conventional and newly proposed. According to the strategy, combining the feature subsets selected by decision tree and mRMR (maximum Relevance Minimum Redundancy) individually, we were able to build a model with 6 features by using a PSFS (Pseudo Sequential Forward Selection) process. Compared with other state-of-art methods for the independent test set, our model had shown better or comparable predictive performances (with F-measure 0.622 and recall 0.821). Analysis of the 6 features confirmed that our newly proposed feature CNSV_REL1 was important for our model. The analysis also showed that the complementarity between features should be considered as an important aspect when conducting the feature selection. CONCLUSION: In this study, most important of all, a new strategy for feature selection was proposed and proved to be effective in selecting the optimal feature subset for building prediction models, which can be used to predict hot spot residues on protein-protein interfaces. Moreover, two aspects, the generalization of the single feature and the complementarity between features, were proved to be of great importance and should be considered in feature selection methods. Finally, our newly proposed feature CNSV_REL1 had been proved an alternative and effective feature in predicting hot spots by our study. Our model is available for users through a webserver: http://zhulab.ahu.edu.cn/iPPHOT/ .

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。