Abstract
The human immune response relies on the ability of T-cell receptors (TCRs) to bind specifically to peptides, a process essential for immune surveillance. Although deep learning methods for predicting TCR-peptide binding have proliferated, many encoder-based approaches learn dataset biases, which leads to substantially overestimated performance, and they ignore the biochemical mechanisms and spatial properties that govern binding. Our analysis shows that interaction pairs generated by cross-mapping amino acid properties between the TCR and the peptide implicitly simulate spatial structure, enabling machine learning models to capture binding-relevant information more effectively. Based on this insight, we developed T-cell receptor cross (TCRoss), a transformer-based model designed for large-scale learning. We also observed that incorporating environmental information into the dataset both mitigates learning biases and improves performance. Experiments show that TCRoss consistently outperforms existing models in both observed contexts and de novo peptide scenarios. Wet-lab validation using T-cell activation assays confirmed the model's predictions for nonbinding peptides and provided critical experimental evidence for model assessment. Biophysical validation further confirmed that high-attention residue pairs correspond to crystallographically observed binding interfaces.