Protein engineering has recently been transformed by machine learning (ML) tools that predict structure from sequence with unprecedented precision. Predicting catalytic activity, however, remains challenging, which restricts our ability to design protein sequences with a desired catalytic function in silico. This predicament is mainly rooted in a lack of experimental methods capable of recording sequence-activity data in quantities sufficient for data-intensive ML techniques, and in the inefficiency of searches in the enormous sequence spaces inherent to proteins. Herein, we address both limitations in the context of engineering proteases with tailored substrate specificity. We introduce a DNA recorder for deep specificity profiling of proteases in Escherichia coli, which we demonstrate by testing 29,716 candidate proteases against up to 134 substrates in parallel. The resulting sequence-activity data on approximately 600,000 protease-substrate pairs not only reveal key sequence determinants governing protease specificity but also allow us to build a data-efficient deep learning model that accurately predicts protease sequences with desired on- and off-target activities. Moreover, we present epistasis-aware training set design as a generalizable strategy to streamline searches within enormous sequence spaces; it strongly increases model accuracy at a given experimental effort and is thus likely to have implications for protein engineering far beyond proteases.
Data-driven protease engineering by DNA-recording and epistasis-aware machine learning.
Authors: Huber Lukas, Kucera Tim, Höllerer Simon, Borgwardt Karsten, Panke Sven, Jeschek Markus
| Journal: | Nature Communications | Impact factor: | 15.700 |
| Year: | 2025 | Volume/pages: | 2025 Jul 1; 16(1):5466 |
| doi: | 10.1038/s41467-025-60622-7 | Research area: | Immunology/Endocrinology |
