Abstract
Accurately predicting recombinant protein expression in Escherichia coli remains a long-standing challenge because expression depends on many interacting factors in gene regulation and translation. Existing computational approaches typically emphasize either codon usage or protein sequence features, which limits their predictive accuracy and generalizability. Here we present TLCP-EPE, a transfer learning framework that, to our knowledge for the first time, fuses codon-level and protein-level pre-trained language models to jointly capture determinants of expression. TLCP-EPE fine-tunes CaLM and ProtT5 with low-rank adaptation (LoRA) and integrates their embeddings through a BiGRU-MLP predictor (a bidirectional gated recurrent unit combined with a multilayer perceptron), learning expression-aware representations that outperform state-of-the-art methods. On two independent test datasets, TLCP-EPE achieved an AUC of 0.835 on the codon dataset and 0.713 on the protein dataset, consistently surpassing conventional codon-based metrics and deep learning baselines. These results indicate that dual-modal modeling of codon and protein sequences enables more accurate and generalizable prediction of expression levels, providing a foundation for rational protein design and biomanufacturing applications.
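
For readers less familiar with the reported metric, the following is a minimal sketch of how an AUC value can be computed from predicted scores and binary expression labels. It assumes that AUC here denotes the area under the ROC curve, which the abstract does not state explicitly. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations and are not taken from the study or its data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example,
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = high expression, 0 = low expression.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.5]
print(roc_auc(toy_labels, toy_scores))
```

Under this reading, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of high-expression from low-expression sequences. The pairwise loop is quadratic in the number of examples, so it suits small illustrations rather than full test sets.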