Abstract
PURPOSE: Despite current standard-of-care endocrine therapy, distant recurrence remains a concern for patients with hormone receptor-positive (HR+)/HER2- early breast cancer (EBC). Estimating each patient's recurrence risk would aid clinical decision-making. We used machine learning to identify risk factors and to develop recurrence risk prediction models.

EXPERIMENTAL DESIGN: Predictor variables were identified by gradient boosting and used to train models on a large, diverse real-world dataset of patients with stage I-III HR+/HER2- EBC from the US-based, electronic health record-derived, deidentified Flatiron Health Research Database. An elastic net-penalized Cox proportional hazards model was validated internally with real-world data and externally with data from the NATALEE trial of ribociclib in patients with HR+/HER2- EBC. Concordance between predicted and observed outcomes for distant recurrence and treatment effect was assessed with Harrell's concordance index (C-index) and the integrated Brier score; model performance over time was assessed by dynamic AUC analysis.

RESULTS: The model accurately predicted distant recurrence in the real-world cohort [n = 7,842; C-index: 0.85 (95% confidence interval, 0.8461-0.8598); integrated Brier score: 0.05 (95% confidence interval, 0.0443-0.0495)], and its performance was sustained over time (AUC >0.7 through 10 years); internal validation and sensitivity analyses confirmed model performance. External validation with the NATALEE nonsteroidal aromatase inhibitor-alone arm yielded lower but still discriminative performance (C-index: 0.66). Training on NATALEE data improved concordance (C-index: 0.70); the NATALEE-trained model predicted a 3.2% reduction in distant recurrence at 48 months with ribociclib treatment in the real-world cohort.

CONCLUSIONS: A machine learning model was developed that accurately predicted distant recurrence in HR+/HER2- EBC. The identified predictor variables and developed models may aid risk-based, personalized treatment decision-making.
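Harrell's C-index, the main discrimination metric reported above, is the fraction of comparable patient pairs in which the patient who has the event earlier is assigned the higher predicted risk. The following is a minimal pure-Python sketch for illustration only, not the study's implementation. The function name `harrell_c_index` and the toy data are hypothetical. Pairs with tied follow-up times are skipped, which is a simplification; published implementations handle ties in different ways.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's concordance index for right-censored data.

    times: observed follow-up times (event or censoring)
    events: 1 if the event (e.g. distant recurrence) was observed, 0 if censored
    risk_scores: predicted risk; higher means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b]:
            continue  # tied times are skipped in this simplified sketch
        if not events[a]:
            continue  # not comparable: the earlier time is a censoring, not an event
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event was given higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up times, event indicators, model risk scores.
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.3, 0.7, 0.1]))
```

In this toy cohort, five pairs are comparable and four are ranked correctly, so the sketch prints 0.8. A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. This is the scale on which the real-world value (0.85) and the external NATALEE values (0.66 and 0.70) can be compared.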