Semantic search using protein large language models detects class II microcins in bacterial genomes

利用蛋白质大型语言模型进行语义搜索,可检测细菌基因组中的II类微生素。

阅读:1

Abstract

Class II microcins are antimicrobial peptides that have shown some potential as novel antibiotics. However, to date, only 10 class II microcins have been described, and the discovery of novel microcins has been hampered by their short length and high sequence divergence. Here, we ask if we can use numerical embeddings generated by protein large language models to detect microcins in bacterial genome assemblies and whether this method can outperform sequence-based methods such as BLAST. We find that embeddings detect known class II microcins much more reliably than does BLAST and that any two microcins tend to have a small distance in embedding space even though they typically are highly diverged at the sequence level. In data sets of Escherichia coli, Klebsiella spp., and Enterobacter spp. genomes, we further find novel putative microcins that were previously missed by sequence-based search methods. IMPORTANCE: Antibiotic resistance is becoming an increasingly serious problem in modern medicine, but the development pipeline for conventional antibiotics is not promising. Therefore, alternative approaches to combat bacterial infections are urgently needed. One such approach may be to employ naturally occurring antibacterial peptides produced by bacteria to kill competing bacteria. A promising class of such peptides are class II microcins. However, only a small number of class II microcins have been discovered to date, and the discovery of further such microcins has been hampered by their high sequence divergence and short length, which can cause sequence-based search methods to fail. Here, we demonstrate that a more robust method for microcin discovery can be built on the basis of a protein large language model, and we use this method to identify several putative novel class II microcins.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。