Discovery of a phenazine-thiol conjugase from sparse data using genome-informed machine learning

Abstract

Machine learning has enabled powerful biological discoveries using models trained on large datasets. However, for many important biological questions, such as identifying enzymes that transform understudied substrates, sparsity of training data is often a major bottleneck. Here, using phenazine natural products as a case study, we show that integrating genome-informed data augmentation with contrastive learning in protein language space enables identification of phenazine-interacting proteins starting from only 14 known phenazine-modifying sequences. Applying this framework led to the discovery of PTC (Phenazine-Thiol Conjugase), the first enzyme known to catalyze phenazine thioconjugation, a phenazine modification reaction long observed but previously presumed to occur only through non-enzymatic chemistry. In silico simulation and experimental measurements demonstrate that PTC binds both phenazine and glutathione as substrates. Recombinant expression and biochemical characterization reveal that PTC promotes glutathione-dependent modification of phenazines, yielding distinct reaction outcomes that depend on substrate identity. Although thiol-conjugated phenazine products exhibit reduced toxicity to bacterial cells, deletion of the gene encoding PTC does not confer a strong fitness disadvantage, illustrating how learning directly from sequences can uncover relevant enzymes that might evade phenotype-based genetic screens. Together, these results demonstrate that coupling comparative genomics with protein machine learning can convert "small data" typically outside the scope of machine learning into actionable predictive power, thereby facilitating enzyme discovery.
