Abstract
Conventional approaches to heterologous gene expression rely on codon optimization, which is limited to swapping synonymous codons and fails to capture deeper adaptive changes. In contrast, naturally evolved orthologous genes include non-synonymous mutations, insertions, and deletions that confer functional adaptation to different host contexts. Here we present OrthologTransformer, a Transformer-based deep learning model that converts orthologous genes between species by learning from large-scale orthologous gene datasets. The model recapitulates evolutionary differences, from synonymous codon swaps to amino acid-changing mutations and indels, to predict coding sequences optimized for target species while preserving protein function. In extensive tests across diverse bacterial species pairs, the model's context-aware gene designs more closely resembled native host orthologs, preserved protein functionality, and achieved superior expression yields compared to codon-optimized sequences. As proof of concept, an OrthologTransformer-redesigned PETase expressed in Bacillus subtilis showed robust activity, producing approximately 10-fold more reaction product than the codon-optimized enzyme and achieving higher expression levels, thereby demonstrating improved enzyme performance via AI-guided gene design.
