Abstract
Accurate prediction of MHC-peptide binding affinity remains a challenge for immunotherapeutic development. Existing methods struggle to jointly model the functional semantics of polymorphic residues, evolutionary conservation constraints, and structural dynamics. We propose the Contrast learning-based Multi-feature Heterogeneous Subgraph model (CMHS), which combines sequence and structural representations. For the sequence representation, we fine-tune ESM2 with LoRA to obtain an MHC-specific sequence embedding and combine it with BLOSUM50 encodings to capture long-range functional dependencies and evolutionarily conserved residues. For the structural representation, we use a biophysics-guided heterogeneous graph network: we construct an MHC-peptide graph with a novel trainable Gaussian noise layer, guided by crystallographic B-factors, that dynamically simulates electron density uncertainty, and couple it with a three-stage message-passing framework comprising subgraph aggregation, subgraph extraction, and a heterogeneous stage. Finally, we use contrastive learning to align the sequence and graph representation spaces, which yields a more comprehensive representation and improves predictive performance. Evaluations on 16 HLA allele benchmarks show average improvements of 8.7% in SRCC and 7.6% in AUC. This work establishes a new paradigm for predicting hypervariable immune interactions. The corresponding code is available on GitHub.
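The alignment of sequence and graph representation spaces can be illustrated with a minimal sketch. The abstract does not state which contrastive objective is used, so the symmetric InfoNCE loss below is an assumption, as are the function name `info_nce_alignment`, the temperature value, and the use of NumPy; it is an illustration of the general technique and not the CMHS implementation.

```python
import numpy as np


def info_nce_alignment(seq_emb, graph_emb, temperature=0.1):
    """Symmetric InfoNCE loss between paired embeddings (illustrative only).

    seq_emb, graph_emb: arrays of shape (N, D); row i of each array is
    assumed to describe the same MHC-peptide pair.
    """
    # L2-normalise so that dot products are cosine similarities.
    s = seq_emb / np.linalg.norm(seq_emb, axis=1, keepdims=True)
    g = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix; matched pairs lie on the diagonal.
    logits = (s @ g.T) / temperature

    def diagonal_cross_entropy(z):
        # Numerically stable log-softmax over each row.
        z = z - z.max(axis=1, keepdims=True)
        log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        # The correct "class" for row i is column i.
        return -np.mean(np.diag(log_prob))

    # Average the sequence-to-graph and graph-to-sequence directions.
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(logits.T))


# Toy check with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
aligned_loss = info_nce_alignment(x, x)           # every pair matches
mismatched_loss = info_nce_alignment(x, x[::-1])  # every pair is wrong
```

In this toy setting the loss for matched pairs is lower than the loss for mismatched pairs, which is the property a contrastive objective of this kind relies on to pull the two representation spaces together.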