Abstract
Predicting interactions between T-cell receptors (TCRs) and peptide-major histocompatibility complexes (pMHCs) is critical for advancing targeted immunotherapies and personalized medicine. However, existing models often struggle with limited labeled data and generalize poorly to novel epitopes. We present LANTERN (Large lAnguage model-powered TCR-Enhanced Recognition Network), a deep learning framework that combines pretrained protein and molecular language models with a cross-modality fusion mechanism. Specifically, LANTERN encodes TCR sequences with ESM and encodes peptides, represented as Simplified Molecular Input Line Entry System (SMILES) strings, with MolFormer, capturing both evolutionary and chemical properties. A Multi-Head Cross-Attention (MHCA) module aligns the TCR and peptide representations, enabling the model to focus on interaction-relevant features across the two modalities. This architecture improves generalization in zero-shot and few-shot scenarios. Extensive experiments on the TCHard benchmark demonstrate that LANTERN achieves competitive and robust performance compared with existing baselines, particularly under the challenging random control and unseen epitope settings. These results highlight LANTERN's potential for TCR-pMHC binding prediction and for downstream applications in personalized immunotherapy and vaccine development. To support reproducibility, our code is available at: https://anonymous.4open.science/r/LANTERN-87D9.
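To make the cross-modality fusion step concrete, the following is a minimal NumPy sketch of multi-head cross-attention between a TCR token sequence and a peptide token sequence. It is an illustration under stated assumptions, not the released implementation: the embeddings are random stand-ins for ESM and MolFormer outputs, and the dimensions, the choice of TCR tokens as queries, the mean pooling, and the sigmoid scoring head are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_cross_attention(tcr, pep, params, num_heads):
    """TCR tokens (queries) attend over peptide tokens (keys and values).

    tcr: (L_t, d_tcr) array, pep: (L_p, d_pep) array.
    Returns the fused TCR representation (L_t, d_model) and the
    attention weights (num_heads, L_t, L_p).
    """
    q = tcr @ params["Wq"]  # (L_t, d_model)
    k = pep @ params["Wk"]  # (L_p, d_model)
    v = pep @ params["Wv"]  # (L_p, d_model)
    d_model = q.shape[-1]
    d_head = d_model // num_heads

    def split_heads(x):
        # (L, d_model) -> (num_heads, L, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    qh, kh, vh = split_heads(q), split_heads(k), split_heads(v)
    scores = qh @ kh.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, L_t, L_p)
    weights = softmax(scores, axis=-1)
    context = weights @ vh  # (H, L_t, d_head)
    merged = context.transpose(1, 0, 2).reshape(tcr.shape[0], d_model)
    return merged @ params["Wo"], weights


rng = np.random.default_rng(0)

# Hypothetical sizes; real encoder outputs are much wider.
L_T, L_P = 14, 40          # TCR residues, SMILES tokens
D_TCR, D_PEP = 32, 24      # stand-in encoder widths
D_MODEL, NUM_HEADS = 16, 4

# Random stand-ins for the ESM (TCR) and MolFormer (peptide SMILES) embeddings.
tcr_emb = rng.normal(size=(L_T, D_TCR))
pep_emb = rng.normal(size=(L_P, D_PEP))

# Randomly initialised projections; in practice these would be learned.
params = {
    "Wq": rng.normal(scale=0.1, size=(D_TCR, D_MODEL)),
    "Wk": rng.normal(scale=0.1, size=(D_PEP, D_MODEL)),
    "Wv": rng.normal(scale=0.1, size=(D_PEP, D_MODEL)),
    "Wo": rng.normal(scale=0.1, size=(D_MODEL, D_MODEL)),
}

fused, attn = multi_head_cross_attention(tcr_emb, pep_emb, params, NUM_HEADS)

# Assumed scoring head: mean-pool over TCR positions, then a linear layer
# followed by a sigmoid to obtain a binding probability.
w_out = rng.normal(scale=0.1, size=D_MODEL)
binding_prob = float(1.0 / (1.0 + np.exp(-(fused.mean(axis=0) @ w_out))))
print(fused.shape, attn.shape, binding_prob)
```

Because the queries and the keys/values are projected separately, the two encoders may have different widths; each row of the attention weights shows how strongly one TCR position attends to each peptide token.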