Abstract
BACKGROUND AND AIM: Idiopathic granulomatous mastitis (IGM) is a rare chronic inflammatory breast disease that is difficult to diagnose and treat. Predicting IGM recurrence is crucial for effective patient management and improved treatment outcomes. This study evaluates and compares the performance of three machine learning models (logistic regression, random forest, and neural network) in predicting IGM recurrence from patient data.
METHODS: A retrospective analysis was conducted on 212 patients diagnosed with IGM. Collected data included comprehensive serological markers, tumor characteristics, and treatment history. The dataset was divided into a training set (70%) and a testing set (30%). Data preprocessing involved normalization, feature selection, and data augmentation to improve model robustness. Three predictive models were developed and compared: logistic regression, random forest, and a neural network. Each model's ability to predict IGM recurrence was evaluated using accuracy, sensitivity, specificity, and area under the ROC curve (AUC).
RESULTS: The logistic regression model achieved AUCs of 0.837, 0.725, and 0.829 in the training, validation, and test cohorts, respectively. The random forest model achieved AUCs of 0.797, 0.755, and 0.793 in the same cohorts, exceeding logistic regression only in the validation cohort. The neural network model outperformed both, with AUCs of 0.938, 0.880, and 0.913 across the same cohorts and a higher F1 score. Feature importance analysis indicated that smoking, surgery, and a history of oral contraceptive use were the most important predictors of recurrence.
CONCLUSION: Compared with the logistic regression and random forest models, the neural network showed superior performance in predicting IGM recurrence. Its high accuracy and reliability highlight its potential clinical application in the early and accurate prediction of IGM recurrence.
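The evaluation metrics named above (accuracy, sensitivity, specificity, F1 score, and AUC) can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the numbers are invented rather than taken from the patient cohort. AUC is computed here through its rank interpretation, the probability that a randomly chosen recurrent case receives a higher predicted score than a randomly chosen non-recurrent case.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Threshold-based metrics for binary labels (1 = recurrence, 0 = no recurrence).

    The 0.5 threshold is an assumption; the abstract does not state one.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

toy_metrics = classification_metrics(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_metrics, toy_auc)
```

Because AUC depends only on how the scores rank cases, it does not change with the decision threshold, whereas accuracy, sensitivity, specificity, and F1 all do. This is one reason the two kinds of metric are usually reported together when comparing models.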