Abstract
In this study, we analyzed large-scale T-cell receptor (TCR) sequence data to determine whether a TCR preferentially recognizes epitopes presented by major histocompatibility complex (MHC) class I (CD8+ T cells) or class II (CD4+ T cells). Using the International ImMunoGeneTics information system (IMGT) numbering scheme, we identified specific positions with distinct amino acid enrichment for each MHC class and developed machine learning models for classification. Our frequency-based approach effectively differentiated MHC-I from MHC-II TCRs in cross-validation, but performance declined on real-world peripheral blood mononuclear cell (PBMC) samples when only beta chain data were used. Incorporating the TCR alpha chain significantly improved accuracy, emphasizing its importance for MHC recognition. Overall, we found that V-region loops can signal MHC class bias, a finding that may aid immunotherapy design and TCR repertoire analysis, and our results highlight the need for larger, more diverse datasets for reliable predictions.
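
To make the idea of position-specific amino acid enrichment concrete, the following is a minimal sketch of one possible frequency-based scorer, not necessarily the model used in this study. It assumes sequences have already been aligned to a common numbering (such as IMGT positions) so that index i refers to the same position in every sequence. The function names, the pseudocount, and the five-residue toy fragments are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(seqs, pseudocount=1.0):
    """Per-position amino acid frequencies with additive smoothing.

    Assumes all sequences have equal length and contain only the
    20 standard amino acids (an assumption of this sketch).
    """
    length = len(seqs[0])
    total = len(seqs) + pseudocount * len(AMINO_ACIDS)
    table = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in seqs)
        table.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return table


def log_odds_score(seq, freq_mhc1, freq_mhc2):
    """Sum of per-position log frequency ratios; positive favors MHC-I."""
    return sum(
        math.log(freq_mhc1[i][aa] / freq_mhc2[i][aa])
        for i, aa in enumerate(seq)
    )


# Made-up aligned fragments for illustration only (not real TCR data).
mhc1_train = ["CASSL", "CASSF", "CASSL"]
mhc2_train = ["CATRD", "CATKD", "CATRD"]

freq_mhc1 = position_frequencies(mhc1_train)
freq_mhc2 = position_frequencies(mhc2_train)


def classify(seq):
    """Label a sequence by the sign of its log-odds score."""
    score = log_odds_score(seq, freq_mhc1, freq_mhc2)
    return "MHC-I" if score > 0 else "MHC-II"


print(classify("CASSL"))
print(classify("CATRD"))
```

In this sketch, positions where both classes share the same residue contribute nothing to the score, so the decision is driven entirely by positions with class-specific enrichment. A paired-chain variant could concatenate aligned alpha and beta chain positions before scoring.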