ColiFormer: A Transformer-Based Codon Optimization Model Balancing Multiple Objectives for Enhanced E. coli Gene Expression

Abstract

Codon optimization is widely used to improve heterologous gene expression in Escherichia coli. However, many existing methods focus primarily on maximizing the codon adaptation index (CAI) and neglect the broader biological context. In this study, we present ColiFormer, a transformer-based codon optimization framework fine-tuned on 3,676 high-expression E. coli genes curated from the NCBI database. Built on the CodonTransformer BigBird architecture, ColiFormer combines self-attention mechanisms with a mathematical optimization method (the augmented Lagrangian approach) to balance multiple biological objectives simultaneously: CAI, GC content, tRNA adaptation index (tAI), RNA stability, and minimization of negative cis-regulatory elements. In in silico evaluations on 37,053 native E. coli genes and 80 recombinant protein targets commonly used in industrial studies, ColiFormer demonstrated significant improvements in CAI and tAI values, maintained GC content within biologically optimal ranges, and reduced inhibitory cis-regulatory motifs compared with established codon optimization approaches, while maintaining competitive runtime performance. These results are computational predictions derived from standard in silico metrics; experimental validation in vivo is left to future work. ColiFormer has been released as an open-source tool alongside the benchmark datasets used in this study.
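To make two of the metrics named in the abstract concrete, the following is a minimal sketch, not ColiFormer's implementation. It computes GC content and CAI for a DNA sequence and combines them in a textbook augmented Lagrangian for a single equality constraint. Everything specific here is an assumption for illustration: the `TOY_WEIGHTS` table (real CAI weights come from a reference set of highly expressed genes), the GC target of 0.5, the penalty parameter `rho`, and the choice to treat GC content as an equality constraint rather than a range.

```python
import math

# Hypothetical relative-adaptiveness weights (0 < w <= 1) for a few codons.
# These values are illustrative only, not measured E. coli weights.
TOY_WEIGHTS = {
    "ATG": 1.0,               # Met
    "AAA": 1.0, "AAG": 0.3,   # Lys
    "CTG": 1.0, "CTA": 0.1,   # Leu (subset)
    "GAA": 1.0, "GAG": 0.45,  # Glu
}


def split_codons(seq):
    """Split a coding sequence into consecutive 3-base codons."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def cai(seq, weights):
    """Geometric mean of codon weights; codons without a weight are skipped."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


def augmented_lagrangian(seq, weights, gc_target=0.5, lam=0.0, rho=10.0):
    """Equality-constrained augmented Lagrangian (to be minimized).

    Objective f = -CAI, constraint h = GC - gc_target = 0, so
    L = f + lam * h + (rho / 2) * h**2.
    """
    h = gc_content(seq) - gc_target
    return -cai(seq, weights) + lam * h + 0.5 * rho * h * h


def update_multiplier(seq, gc_target=0.5, lam=0.0, rho=10.0):
    """Standard multiplier update: lam <- lam + rho * h."""
    return lam + rho * (gc_content(seq) - gc_target)


if __name__ == "__main__":
    candidate = "ATGCTGAAAGAA"  # Met-Leu-Lys-Glu using the top-weighted codons
    print("GC content:", gc_content(candidate))
    print("CAI:", cai(candidate, TOY_WEIGHTS))
    print("Augmented Lagrangian:", augmented_lagrangian(candidate, TOY_WEIGHTS))
```

In this toy setting the candidate maximizes CAI but falls short of the GC target, so the quadratic penalty raises its augmented Lagrangian value. That is the kind of trade-off a multi-objective optimizer has to resolve, whereas a CAI-only method would ignore it.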
