Abstract
In this work, we propose a new deep-learning model, MHCrank, to predict the probability that a peptide will be processed for presentation by MHC class I molecules. We find that the performance of our model is significantly higher than that of two previously published baseline methods: MHCflurry and netMHCpan. This improvement arises from utilizing both cleavage site-specific kernels and learned embeddings for amino acids. By visualizing site-specific amino acid enrichment patterns, we observe that MHCrank's top-ranked peptides exhibit enrichments at biologically relevant positions and are consistent with previous work. Furthermore, the cosine similarity matrix derived from MHCrank's learned embeddings for amino acids correlates highly with physiochemical properties that have been experimentally demonstrated to be instrumental in determining a peptide's favorability for processing. Altogether, the results reported in this work indicate that MHCrank demonstrates strong performance compared with existing methods and could have vast applicability in aiding drug and vaccine development.