Accurate prediction of virulence factors using pre-train protein language model and ensemble learning



Abstract

BACKGROUND: As bacterial pathogens become increasingly resistant to antibiotics, strategies that target virulence factors (VFs) have emerged as a promising approach to treating bacterial infections. Existing VF identification methods rely mainly on sequence similarity, but remote homology relationships cannot be discovered by sequence analysis alone.

RESULTS: To address this limitation, we developed PLMVF, an approach to VF identification that combines a protein language model with ensemble learning. Specifically, we extracted features from protein sequences using ESM-2 and obtained their three-dimensional (3D) structures using ESMFold. We calculated the true TM-scores for protein pairs from these 3D structures and trained a TM-predictor model to predict structural similarity, thereby capturing remote homology information hidden within the sequences. We then concatenated the sequence-level features extracted by ESM-2 with the predicted TM-score features to form a comprehensive feature set for prediction. Extensive experimental validation demonstrated that PLMVF achieved an accuracy (ACC) of 86.1%, significantly outperforming existing models across multiple evaluation metrics. This study provides a tool for identifying novel targets in the development of anti-virulence therapies, offering promise for the effective prevention and control of pathogenic bacterial infections.

CONCLUSIONS: The proposed PLMVF model offers an efficient computational approach for VF identification.
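To make the two central quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It evaluates the standard TM-score formula for one fixed alignment and superposition (real tools additionally search for the superposition that maximizes the score), and it shows feature concatenation as simple vector joining. The function names `tm_d0`, `tm_score_fixed`, and `build_feature_vector`, the 0.5 floor on `d0`, and the toy values are all assumptions made for illustration; the abstract does not state which software or dimensions were used.

```python
def tm_d0(target_len: int) -> float:
    """Length-dependent distance scale d0 from the standard TM-score definition."""
    # d0 = 1.24 * (L - 15)^(1/3) - 1.8 is only meaningful for longer chains.
    # The 0.5 floor for short chains is an assumption made for this sketch.
    if target_len <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_len - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances, target_len: int) -> float:
    """TM-score for ONE given alignment and superposition.

    `distances` holds the distance (in angstroms) between each pair of aligned
    residues. No search over superpositions is performed here, so this is a
    lower bound on the score a full structure-alignment tool would report.
    """
    d0 = tm_d0(target_len)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_len


def build_feature_vector(seq_embedding, predicted_tm_scores):
    """Concatenate a sequence-level embedding with predicted TM-score features."""
    # Hypothetical helper: plain list concatenation stands in for whatever
    # tensor operation an actual pipeline would use.
    return list(seq_embedding) + list(predicted_tm_scores)


if __name__ == "__main__":
    # Toy values only: a 100-residue target with every aligned pair 2 angstroms apart.
    print(tm_score_fixed([2.0] * 100, target_len=100))
    # Toy 3-dimensional "embedding" joined with two predicted TM-scores.
    print(build_feature_vector([0.12, -0.40, 0.88], [0.71, 0.35]))
```

A perfectly superimposed pair (all distances zero) scores 1.0, and a pair whose aligned residues all sit exactly `d0` apart scores 0.5, which gives a feel for how quickly the score decays with structural deviation.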
