Abstract
RATIONALE AND OBJECTIVE: To develop (18)F-FDG PET/CT radiomics models built with different data preprocessing methods and to validate their value in predicting axillary lymph node (ALN) status after neoadjuvant chemotherapy (NAC) for breast cancer. MATERIALS AND METHODS: Patients with breast cancer scanned on three different scanners were divided, for each scanner, into pathological complete remission (pCR) and non-pCR groups according to ALN status after NAC. A total of 630 models were built from combinations of data preprocessing, feature filtering, and modeling approaches. First, the data preprocessing methods were compared to assess the advantages of each. Second, the AUC for predicting ALN status was compared across all models to identify the best-performing model. Finally, the optimal model was combined with clinical characteristics and the corresponding nomogram was plotted. RESULTS: Comparison of the data preprocessing methods showed that models based on tumor-to-liver ratio (TLR) radiomics predicted better than those based on origin radiomics (OR), and that models using ComBat or Limma performed better than those without batch-effect correction. Each preprocessing method is a potential means of further optimizing the model. In the test set, the optimal model predicted ALN status after NAC for breast cancer with an AUC of 0.798, which rose to 0.811 when the model was combined with clinical characteristics. CONCLUSION: Multicenter data should be preprocessed before analysis, and a model developed in this way can effectively predict ALN status after NAC in breast cancer.
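As an illustration of the TLR preprocessing idea named above, the following is a minimal sketch that assumes TLR means dividing tumor uptake values by the mean uptake of a liver reference region. The abstract does not give the exact definition used in the study, so the function name `tumor_to_liver_ratio`, its parameters, and the sample values are hypothetical.

```python
from statistics import mean


def tumor_to_liver_ratio(tumor_suv, liver_suv):
    """Scale tumor SUV values by the mean SUV of a liver reference region.

    Assumption: TLR is tumor uptake divided by mean liver uptake; the
    study's exact definition is not stated in the abstract.
    """
    liver_mean = mean(liver_suv)
    if liver_mean <= 0:
        raise ValueError("liver reference mean SUV must be positive")
    return [value / liver_mean for value in tumor_suv]


# Hypothetical values for illustration only, not data from the study.
tumor_voxels = [4.0, 6.0, 8.0]
liver_voxels = [1.5, 2.0, 2.5]
tlr_voxels = tumor_to_liver_ratio(tumor_voxels, liver_voxels)
print(tlr_voxels)
```

Under this assumption, the rescaling expresses tumor uptake relative to a per-patient reference, which may reduce scanner-dependent intensity differences before radiomics features are extracted.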