Abstract
Hormone receptor-positive/HER2-negative breast cancer (BC) is the most common BC subtype and typically presents as early, operable disease. In patients receiving neoadjuvant chemotherapy (NACT), pathological complete response (pCR) is rare, and multiple efforts have been made to predict disease recurrence. We developed a framework to predict pCR using clinicopathological characteristics widely available at diagnosis. Machine learning (ML) models were trained to predict pCR (n = 463), evaluated in an internal validation cohort (n = 109), and validated in an external validation cohort (n = 151). The best model was an Elastic Net, which achieved an area under the curve (AUC) of 0.86 in the internal validation cohort and 0.81 in the external validation cohort. Our results highlight how simpler models using few input variables can be as valuable as more complex ML architectures. Our model is freely available and can be used to enhance the stratification of BC patients receiving NACT, providing a framework for the development of risk-adapted clinical trials.
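
To make the reported metric concrete: assuming the AUC refers to the receiver operating characteristic curve, it can be read as the probability that a randomly chosen patient with pCR receives a higher predicted score than a randomly chosen patient without pCR. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not study data or the authors' code.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for pCR, 0 for no pCR. scores: predicted probabilities.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy cohort (made-up values, for illustration only).
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.81, 0.12, 0.40, 0.35, 0.08, 0.66, 0.52, 0.20]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.2f}")
```

In this toy cohort, 14 of the 15 pCR/non-pCR pairs are ranked correctly. An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.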