Transfer learning with pre-trained language models for protein expression level prediction in Escherichia coli



Abstract

Accurately predicting recombinant protein expression in Escherichia coli remains a long-standing challenge due to the multifactorial nature of gene regulation and translation. Existing computational approaches typically emphasize either codon usage or protein sequence features, limiting predictive accuracy and generalizability. Here we present TLCP-EPE, a transfer learning framework that, for the first time, fuses codon- and protein-level pre-trained language models to jointly capture determinants of expression. By fine-tuning CaLM and ProtT5 with low-rank adaptation (LoRA) and integrating their embeddings through a BiGRU-MLP predictor, TLCP-EPE learns expression-aware representations that outperform state-of-the-art methods. Across two independent test datasets, TLCP-EPE achieved robust performance (AUC 0.835 on codon data; AUC 0.713 on protein data), consistently surpassing conventional codon-based metrics and deep learning baselines. Our results demonstrate that dual-modal modeling of codon and protein sequences enables more accurate and generalizable prediction of expression levels, providing a powerful foundation for rational protein design and biomanufacturing applications.
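To help interpret the reported AUC values (0.835 and 0.713), the following is a minimal sketch of how an area under the ROC curve can be computed from predicted scores. It assumes binary labels (1 for high expression, 0 for low expression), which the abstract does not state explicitly; the function name `roc_auc` and the toy data are illustrative and are not taken from TLCP-EPE.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    AUC equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example,
    with ties counted as one half.
    Assumption: labels are 1 (high expression) or 0 (low expression).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores, not data from the paper).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))
```

Under this reading, an AUC of 0.835 would mean that a highly expressed sequence is ranked above a poorly expressed one in roughly 83.5% of such pairs, while 0.5 corresponds to random ranking. The quadratic pairwise loop is chosen for clarity; a rank-based computation is preferable for large test sets.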
