Abstract
BACKGROUND: Cell-free DNA (cfDNA) fragmentomics is a promising approach for early breast cancer detection and could improve patient survival by enabling timely intervention. However, existing cfDNA-based methods lack the sensitivity required for clinical implementation, particularly for early-stage disease. Robust, cost-effective diagnostic strategies that integrate cfDNA fragmentomic profiling with advanced machine learning algorithms are therefore urgently needed.

METHODS: This study enrolled 204 patients diagnosed with breast cancer and 191 cancer-free participants. Plasma cfDNA from each participant was profiled by whole-genome sequencing. Multiple cfDNA features and machine learning models were evaluated in a training cohort to identify the best-performing model, whose performance was then assessed in an independent validation cohort.

RESULTS: A stacked ensemble model integrating three cfDNA features with six machine learning algorithms, developed in the training cohort (cancer: 119; healthy: 112), outperformed every model built from a single feature-algorithm pair. The ensemble achieved a sensitivity of 93.3% at 94.6% specificity in the training cohort (area under the curve [AUC], 0.983) and a sensitivity of 96.5% at 93.7% specificity in the validation cohort (cancer: 85; healthy: 79; AUC, 0.989). The model also remained sensitive across different tumor stages, pathological types, and molecular subtypes.

CONCLUSION: We established a stacked ensemble model based on cfDNA fragmentomic features that achieved superior sensitivity for detecting early-stage breast cancer, which could facilitate early diagnosis and benefit more patients.
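The RESULTS summarize the ensemble's performance as AUC and as sensitivity at a fixed specificity. The following is a minimal Python sketch, not the authors' implementation, of how a stacked ensemble and these two metrics fit together. It assumes that each base model (one feature-algorithm pair) has already produced a cancer score for every sample, ideally as out-of-fold predictions so the meta-learner is not fitted to overfit outputs, and it assumes logistic regression as the meta-learner, which the abstract does not specify. The three base-model columns, the toy scores, and all function names (roc_auc, sensitivity_at_specificity, fit_meta_learner, stacked_scores) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random cancer sample outscores a random
    non-cancer sample (Mann-Whitney formulation; ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(scores: Sequence[float], labels: Sequence[int],
                               min_specificity: float) -> float:
    """Best sensitivity over thresholds whose specificity is >= min_specificity.
    A sample is called 'cancer' when its score is >= the threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = 0.0  # a threshold above every score gives specificity 1, sensitivity 0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        if tn / n_neg >= min_specificity:
            best = max(best, tp / n_pos)
    return best


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_learner(base_scores: List[List[float]], labels: Sequence[int],
                     lr: float = 0.5, epochs: int = 5000) -> Tuple[List[float], float]:
    """Hypothetical logistic-regression meta-learner fitted by full-batch
    gradient descent. Each row of base_scores holds one sample's outputs
    from every base model."""
    k = len(base_scores[0])
    n = len(labels)
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(base_scores, labels):
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(k):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def stacked_scores(base_scores: List[List[float]], w: List[float], b: float) -> List[float]:
    """Combine base-model outputs into one cancer probability per sample."""
    return [_sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in base_scores]


# Toy data: rows = samples, columns = three hypothetical base models.
# Each base model misranks a different cancer sample.
BASE = [
    [0.9, 0.8, 0.3],   # cancer
    [0.8, 0.2, 0.9],   # cancer
    [0.3, 0.9, 0.8],   # cancer
    [0.4, 0.3, 0.2],   # healthy
    [0.2, 0.4, 0.3],   # healthy
    [0.35, 0.1, 0.4],  # healthy
]
LABELS = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    for j in range(3):
        print(f"base model {j}: AUC = {roc_auc([r[j] for r in BASE], LABELS):.3f}")
    w, b = fit_meta_learner(BASE, LABELS)
    stacked = stacked_scores(BASE, w, b)
    print(f"stacked: AUC = {roc_auc(stacked, LABELS):.3f}, "
          f"sensitivity at >=90% specificity = "
          f"{sensitivity_at_specificity(stacked, LABELS, 0.9):.3f}")
```

On this toy data no single base model reaches an AUC of 1, while the fitted meta-learner combines the three columns into scores that separate the two groups. A real analysis would report these metrics on a held-out cohort, as the study did with its validation cohort, rather than on the data used to fit the meta-learner.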