Bacteriocin prediction through cross-validation-based and hypergraph-based feature evaluation approaches

Abstract

Bacteriocins offer a promising solution to antibiotic resistance because they can target a wide range of bacteria with precision. There is therefore an urgent need for computational models that predict new bacteriocins and aid drug development. This work focuses on building web-based predictive models with the XGBoost machine learning algorithm, using the physicochemical properties, structural characteristics, and sequence profiles of protein sequences as features. We selected features using correlation analysis, cross-validation, and hypergraph-based techniques. Cross-validated feature selection (CVFS) partitions the dataset, selects features within each partition, and keeps the features common across partitions, which ensures that the selected features are representative. In contrast, hypergraph-based feature evaluation (HFE) minimizes hypergraph cut conductance, exploiting higher-order relationships in the data to make precise use of the correlations among features and samples. We built XGBoost models from the features selected by each of these two methods and analyzed feature contributions directly from the best model using SHapley Additive exPlanations (SHAP). The HFE-based approach achieved 99.11% accuracy and an AUC of 0.9974 on the test data, outperforming the CVFS-based approach overall and performing comparably to existing approaches. The most influential features relate to solvent accessibility for buried residues, followed by cysteine composition. Our web application, available at https://shiny.tricities.wsu.edu/bacteriocin-prediction/, provides prediction results, probability scores, and SHAP plots for both the cross-validation- and hypergraph-based methods, as well as for previously implemented feature selection approaches.
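The CVFS idea described in the abstract (partition the data, select features in each partition, keep the features common to all partitions) can be illustrated with a minimal, dependency-free Python sketch. It assumes a stratified random split into disjoint partitions and uses absolute Pearson correlation with the label as a stand-in per-partition selector; the function names (`cvfs`, `top_features`, `stratified_partitions`, `abs_pearson`) and parameters (`n_partitions`, `n_keep`) are hypothetical and do not reproduce the authors' implementation.

```python
import random
import statistics


def abs_pearson(xs, ys):
    """Absolute Pearson correlation between one feature column and the labels."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # constant column or constant labels: no usable signal
    return abs(cov) / (vx * vy) ** 0.5


def stratified_partitions(y, n_partitions, seed):
    """Assumed splitting scheme: shuffle each class separately and deal its
    samples round-robin, so every partition contains every class."""
    rng = random.Random(seed)
    parts = [[] for _ in range(n_partitions)]
    for label in sorted(set(y)):
        members = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(members)
        for k, i in enumerate(members):
            parts[k % n_partitions].append(i)
    return parts


def top_features(X, y, rows, n_keep):
    """Hypothetical per-partition selector: keep the n_keep features with the
    largest |correlation| to the label, computed only on the given rows."""
    scores = []
    for j in range(len(X[0])):
        col = [X[i][j] for i in rows]
        lab = [y[i] for i in rows]
        scores.append((abs_pearson(col, lab), j))
    scores.sort(reverse=True)
    return {j for _, j in scores[:n_keep]}


def cvfs(X, y, n_partitions=3, n_keep=2, seed=0):
    """Select features independently in each disjoint partition and return
    only the feature indices chosen in every partition."""
    parts = stratified_partitions(y, n_partitions, seed)
    selected = [top_features(X, y, part, n_keep) for part in parts]
    return set.intersection(*selected)


if __name__ == "__main__":
    # Toy data: features 0 and 1 track the label, features 2 and 3 are constant.
    y = [0, 1] * 6
    X = [[lab, 1 - lab, 5.0, 5.0] for lab in y]
    print(sorted(cvfs(X, y, n_partitions=3, n_keep=2)))  # → [0, 1]
```

Taking the intersection keeps only the features that every partition agrees on, which is what makes the selected set less sensitive to any single split of the data. In practice the correlation ranking would be replaced by whatever selector the pipeline actually uses.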