Abstract
Predicting response to immune checkpoint blockade (ICB) from bulk RNA-seq data has been studied extensively as a way to distinguish responders from non-responders. However, cohort heterogeneity remains a major challenge, limiting the robustness and generalizability of predictive models across diverse RNA-seq datasets. In this study, we present IC2Bert, a novel model that combines masked gene expression pretraining with domain-specific supervised fine-tuning to enhance predictive robustness across heterogeneous ICB response cohorts. To evaluate the model objectively, we assessed its performance using Leave-One-Dataset-Out Cross-Validation (LODOCV). IC2Bert demonstrated significantly improved predictive accuracy and robustness compared to existing methods, effectively addressing the challenges posed by cohort heterogeneity. The IC2Bert model and its source code are publicly available on GitHub at https://github.com/data2intelligence/ic2bert.
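
As an illustration of the LODOCV splitting scheme named above, the following minimal Python sketch holds out one cohort at a time and pools the remaining cohorts. It is not taken from the IC2Bert repository: the function name `leave_one_dataset_out`, the cohort names, and the sample IDs are hypothetical, and the sketch covers only the split itself. How the held-out cohort is then used (for example, whether part of it serves for fine-tuning) is not specified in the abstract and is left open here.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_dataset_out(
    cohorts: Dict[str, List[str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_name, pooled_samples, held_out_samples) for each cohort.

    Hypothetical helper: each cohort is held out once while the samples of
    all other cohorts are pooled together.
    """
    for held_out in cohorts:
        pooled = [
            sample
            for name, samples in cohorts.items()
            if name != held_out
            for sample in samples
        ]
        yield held_out, pooled, list(cohorts[held_out])


# Hypothetical cohort names and sample IDs, for illustration only.
cohorts = {
    "cohort_A": ["A1", "A2", "A3"],
    "cohort_B": ["B1", "B2"],
    "cohort_C": ["C1", "C2", "C3", "C4"],
}

splits = list(leave_one_dataset_out(cohorts))
for held_out, pooled, held_out_samples in splits:
    print(
        f"held out {held_out}: {len(pooled)} pooled samples, "
        f"{len(held_out_samples)} held-out samples"
    )
```

Splitting at the cohort level, rather than at the sample level, ensures that no sample from the held-out cohort appears in the pooled data, so each fold probes generalization to an unseen cohort.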