Abstract
Breast cancer is characterized by profound molecular heterogeneity, which severely limits the clinical utility of universal prognostic tools. To address this limitation, we systematically explored transcriptomic profiles in three independent breast cancer cohorts (TCGA, METABRIC, and SCAN-B) using unsupervised clustering. We identified consensus prognostic gene signatures, both pan-breast cancer and PAM50 subtype-specific, by applying log-rank tests within each cohort and intersecting the significant genes across cohorts. Prognostic scores derived by single-sample Gene Set Enrichment Analysis (ssGSEA) stratified overall survival in all three cohorts and outperformed established features on the concordance index (C-index), time-dependent AUC, net reclassification improvement (NRI), and integrated discrimination improvement (IDI). Functional enrichment analysis uncovered subtype-specific biological mechanisms: immune-related pathways dominated the good-prognosis gene sets in HER2-enriched and Basal-like tumors, while oncogenic pathways characterized the poor-prognosis gene sets. Correlation analysis with CIBERSORT-deconvolved immune cell proportions showed that good-prognosis scores correlated positively with anti-tumor immune cells (CD8+ T cells, M1 macrophages) and negatively with pro-tumor cells (M2 macrophages, regulatory T cells). Independent validation in the Lancet2005 ER+ cohort confirmed that the Luminal prognostic gene sets stratified distant relapse-free survival. Collectively, these subtype-specific consensus signatures integrate tumor cell biology with features of the tumor immune microenvironment, offering robust prognostic tools with potential for future clinical translation.
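The consensus step described above (per-cohort log-rank tests followed by cross-cohort intersection) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the median split on expression, the 0.05 threshold, the function names (`logrank_test`, `classify_gene`, `consensus_signatures`, `toy_cohort`) and the synthetic cohorts are all assumptions made for illustration, since the abstract does not specify how patients were dichotomized or which cutoff was used.

```python
import math
import statistics


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    Returns (p_value, O - E for group 1). A negative O - E means group 1
    had fewer events than expected under the null hypothesis.
    """
    obs_minus_exp = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        n = sum(1 for x in times if x >= t)  # at risk, both groups
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0, obs_minus_exp
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square with 1 d.f.: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2)), obs_minus_exp


def classify_gene(expression, times, events, alpha=0.05):
    """Median-split patients (assumed rule) and label the gene's direction."""
    cutoff = statistics.median(expression)
    groups = [1 if x > cutoff else 0 for x in expression]  # 1 = high expression
    p_value, obs_minus_exp = logrank_test(times, events, groups)
    if p_value >= alpha:
        return None
    # Fewer deaths than expected among high expressers => good prognosis.
    return "good" if obs_minus_exp < 0 else "poor"


def consensus_signatures(cohorts, alpha=0.05):
    """Keep genes with the same significant direction in every cohort."""
    calls = [
        {gene: classify_gene(expr, times, events, alpha)
         for gene, expr in expr_by_gene.items()}
        for expr_by_gene, times, events in cohorts
    ]
    shared = set.intersection(*(set(c) for c in calls))
    good = {g for g in shared if all(c[g] == "good" for c in calls)}
    poor = {g for g in shared if all(c[g] == "poor" for c in calls)}
    return good, poor


def toy_cohort(n):
    """Synthetic cohort (illustration only): n patients, all with events."""
    times = list(range(1, n + 1))
    events = [1] * n
    expr_by_gene = {
        "A": [float(t) for t in times],          # high expression, long survival
        "B": [float(n + 1 - t) for t in times],  # high expression, short survival
        "C": [float(t % 2) for t in times],      # unrelated to survival
    }
    return expr_by_gene, times, events


cohorts = [toy_cohort(n) for n in (20, 24, 30)]
good, poor = consensus_signatures(cohorts)
print(sorted(good), sorted(poor))
```

In this toy setup, gene A is constructed to track longer survival and gene B shorter survival in every cohort, while gene C carries no survival signal and is filtered out. Requiring the same direction in all cohorts, rather than significance alone, is a design choice of this sketch that guards against genes whose effect flips between datasets.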