Abstract
Codon sequence design is crucial for generating mRNA sequences with desired functional properties, for tasks such as developing mRNA vaccines or gene editing therapies. Yet existing methods lack the flexibility and controllability needed to adapt to diverse design objectives. We propose ARCADE, a novel machine learning-based framework that enables flexible and controllable multi-objective codon design. Leveraging knowledge inherent in pretrained genomic language models, ARCADE extends activation engineering, a technique originally developed for controllable text generation, beyond the manipulation of discrete features such as concepts and styles to the steering of continuous-valued biological metrics. Specifically, we derive biologically meaningful semantic steering vectors in the model's activation space that directly control properties such as the Codon Adaptation Index, Minimum Free Energy, and GC content. Experimental results demonstrate the flexibility of ARCADE in designing codon sequences that satisfy multiple objectives, underscoring its potential for advancing programmable biological sequence design. Our implementation is available at https://github.com/Kingsford-Group/arcade.
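To make the idea of a steering vector for a continuous metric concrete, the sketch below uses one common activation-engineering recipe: the difference between mean hidden activations of high-scoring and low-scoring sequences, added back with a scalar strength `alpha`. It is a minimal illustration under stated assumptions, not ARCADE's actual derivation. The helpers `gc_content`, `steering_vector`, and `steer` are hypothetical names, the threshold split is an assumed choice, and the 3-dimensional "activations" are toy values standing in for hidden states that would in practice come from a pretrained genomic language model.

```python
from statistics import mean


def gc_content(seq: str) -> float:
    """Fraction of G or C nucleotides in an mRNA/DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def steering_vector(activations, scores, threshold):
    """Hypothetical difference-of-means steering vector.

    Splits sequences by whether their metric score reaches `threshold`
    and returns mean(high activations) - mean(low activations).
    """
    high = [a for a, s in zip(activations, scores) if s >= threshold]
    low = [a for a, s in zip(activations, scores) if s < threshold]
    if not high or not low:
        raise ValueError("need sequences on both sides of the threshold")
    dim = len(activations[0])
    mu_high = [mean(a[i] for a in high) for i in range(dim)]
    mu_low = [mean(a[i] for a in low) for i in range(dim)]
    return [h - l for h, l in zip(mu_high, mu_low)]


def steer(activation, vector, alpha):
    """Shift one activation along the steering direction; alpha sets strength."""
    return [x + alpha * v for x, v in zip(activation, vector)]


# Toy example: four short sequences with made-up 3-d activations (assumed).
seqs = ["AUGGCCGCG", "AUGAAAUUU", "GGCGCCAUG", "AUAUUAAUG"]
acts = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0], [3.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
scores = [gc_content(s) for s in seqs]
v_gc = steering_vector(acts, scores, threshold=0.5)
steered = steer([0.0, 0.0, 0.0], v_gc, alpha=0.5)
```

Scaling `alpha` up or down moves the activation further along or against the direction associated with higher GC content, which is what makes this style of steering a natural fit for continuous-valued targets rather than on/off concepts.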