Abstract
Engineering and characterizing proteins can be time-consuming and cumbersome, motivating the development of generalist CRISPR-Cas enzymes(1-4) to enable diverse genome-editing applications. However, such enzymes have caveats, including an increased risk of off-target editing(3,5,6). Here, to enable scalable reprogramming of Cas9 enzymes, we combined high-throughput protein engineering with machine learning to derive bespoke editors that are better suited to specific targets. Through structure-function-informed saturation mutagenesis and bacterial selections, we obtained nearly 1,000 engineered SpCas9 enzymes and characterized their protospacer-adjacent motif (PAM)(7) requirements to train a neural network that relates amino acid sequence to PAM specificity. Using the resulting PAM machine learning algorithm (PAMmla) to predict the PAMs of 64 million SpCas9 enzymes, we identified efficacious and specific enzymes that outperform evolution-based and engineered SpCas9 enzymes as nucleases and base editors in human cells while reducing off-target editing. An in silico directed evolution method enables user-directed Cas9 enzyme design, including for allele-selective targeting of the RHO(P23H) allele in human cells and mice. Together, PAMmla integrates machine learning and protein engineering to curate a catalogue of SpCas9 enzymes with distinct PAM requirements, motivating a shift away from generalist enzymes towards safe and efficient bespoke Cas9 variants.
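
To give a sense of scale for the sequence space described above, the following is a minimal sketch of how a combinatorial variant library could be counted and encoded as input to a sequence-to-PAM model. It rests on assumptions not stated in the abstract: that the 64 million enzymes correspond to six fully randomized positions (20^6 = 64,000,000), and that variants are one-hot encoded. The names `N_POSITIONS`, `library_size`, `one_hot` and `enumerate_variants`, and the example variant string, are hypothetical and are not taken from PAMmla itself.

```python
from itertools import islice, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids
N_POSITIONS = 6  # ASSUMPTION: number of randomized positions (not given in the abstract)


def library_size(n_positions: int = N_POSITIONS) -> int:
    """Number of distinct variants when every position can be any amino acid."""
    return len(AMINO_ACIDS) ** n_positions


def one_hot(variant: str) -> list[int]:
    """Flatten a variant into a 0/1 vector of length N_POSITIONS * 20."""
    if len(variant) != N_POSITIONS:
        raise ValueError(f"expected {N_POSITIONS} residues, got {len(variant)}")
    vector: list[int] = []
    for residue in variant:
        index = AMINO_ACIDS.index(residue)  # ValueError on a non-canonical residue
        row = [0] * len(AMINO_ACIDS)
        row[index] = 1
        vector.extend(row)
    return vector


def enumerate_variants(limit: int) -> list[str]:
    """Return the first `limit` variants of the library in lexicographic order."""
    combos = product(AMINO_ACIDS, repeat=N_POSITIONS)
    return ["".join(combo) for combo in islice(combos, limit)]


if __name__ == "__main__":
    print(library_size())             # size of the full combinatorial space
    print(len(one_hot("DSGERT")))     # "DSGERT" is an arbitrary example variant
    print(enumerate_variants(3))
```

Under these assumptions the full library is small enough to score exhaustively with a trained model, which is consistent with predicting PAM preferences for every variant rather than sampling.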