Abstract
BACKGROUND: A subset of breast cancer patients who achieve pathological complete response (pCR) after neoadjuvant therapy (NAT) still experience poor outcomes, including recurrence, metastasis, and death. This study aimed to identify risk factors for adverse outcomes in pCR patients, construct predictive models, elucidate molecular subtype-specific prognostic determinants, and explore when death and progression events peak in each subtype. METHODS: Female patients in the Surveillance, Epidemiology, and End Results (SEER) database who received NAT and achieved pCR were included. Independent prognostic factors for overall survival (OS) and event-free survival (EFS) were identified using Cox regression analyses, and nomograms and a random survival forest (RSF) machine learning model were developed to predict the prognosis of patients with pCR. Subgroup analysis was performed to clarify heterogeneity across molecular subtypes, and survival sequential analysis was conducted to identify peaks in death and progression events. RESULTS: Analyses of SEER data identified age, T stage, N stage, molecular subtype, histological tumor type, surgical approach, and histological grade as independent predictors of OS [concordance index (C-index) =0.723; 3-year area under the curve (AUC) =0.707], while independent predictors of EFS were age, T stage, N stage, molecular subtype, histological tumor type, and grade (C-index =0.682; 3-year AUC =0.690). The C-indices of the OS and EFS nomograms were 0.723 (3-year AUC =0.711) and 0.682 (3-year AUC =0.691), respectively. The RSF model for mortality risk achieved a C-index of 0.721 (3-year AUC =0.73). Prognostic factors varied across molecular subtypes, though T/N stage was a common determinant.
Survival sequential analysis showed that death events peaked at 36 months for triple-negative breast cancer (TNBC), 114 months for the Luminal subtype, and 97 months for the human epidermal growth factor receptor 2 (HER2)-positive subtype, while progression events peaked at 111 months (TNBC), 114 months (Luminal), and 84 months (HER2-positive subtype). CONCLUSIONS: This study systematically identified key clinicopathological factors influencing the prognosis of patients who achieve pCR after NAT; tumor burden (T/N stage) emerged as a universal risk factor across molecular subtypes. Survival sequential analysis highlights subtype-specific surveillance priorities: intensified monitoring within 3 years for TNBC, focused follow-up at 7-8 years for the HER2-positive subtype, and extended tracking for Luminal subtypes. Both the nomograms and the RSF model demonstrated robust predictive performance, providing theoretical and practical tools for precision prognosis management in breast cancer.
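
For readers unfamiliar with the C-index reported above, the following is a minimal sketch of how Harrell's concordance index can be computed for right-censored survival data. It is an illustration only, not the authors' code: the function name `harrell_c_index` and the toy follow-up times, event indicators, and risk scores are hypothetical, and pairs with tied times are simply skipped.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       : follow-up time for each patient
    events      : 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores : model output, higher = higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only usable if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:  # patient i failed before patient j's last follow-up
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # model ranked the earlier failure as higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (months of follow-up), not taken from SEER.
toy_times = [5, 10, 12, 20, 30]
toy_events = [1, 0, 1, 1, 0]
toy_risk = [0.9, 0.4, 0.2, 0.3, 0.1]

# 6 of the 7 comparable pairs are ranked correctly.
print(round(harrell_c_index(toy_times, toy_events, toy_risk), 3))  # 0.857
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported values of roughly 0.68 to 0.72.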