Effective Gene Expression Prediction and Optimization from Protein Sequences

基于蛋白质序列的有效基因表达预测与优化

阅读:1

Abstract

High soluble protein expression in heterologous hosts is crucial for various research and applications. Despite considerable research on the impact of codon usage on expression levels, the relationship between protein sequence and expression is often overlooked. In this study, a novel connection between protein expression and sequence is uncovered, leading to the development of SRAB (Strength of Relative Amino Acid Bias) based on AEI (Amino Acid Expression Index). The AEI served as an objective measure of this correlation, with higher AEI values enhancing soluble expression. Subsequently, the pre-trained protein model MP-TRANS (MindSpore Protein Transformer) is developed and fine-tuned using transfer learning techniques to create 88 prediction models (MPB-EXP) for predicting heterologous expression levels across 88 species. This approach achieved an average accuracy of 0.78, surpassing conventional machine learning methods. Additionally, a mutant generation model, MPB-MUT, is devised and utilized to enhance expression levels in specific hosts. Experimental validation demonstrated that the top 3 mutants of xylanase (previously not expressed in Escherichia coli) successfully achieved high-level soluble expression in E. coli. These findings highlight the efficacy of the developed model in predicting and optimizing gene expression based on protein sequences.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。