Abstract
BACKGROUND: Accurate preoperative grading of gliomas is critical for guiding therapeutic strategies and prognostic assessment. This study aimed to develop and validate a robust, interpretable machine learning (ML) model based on multicenter magnetic resonance imaging (MRI) radiomics data for predicting World Health Organization (WHO) grade in patients with glioma.
METHODS: We collected MRI data from 905 glioma patients diagnosed between 2005 and 2024 across three independent cohorts. Data from 329 patients in The Cancer Genome Atlas (TCGA) served as the training (n=230) and internal testing (n=99) sets, while data from 482 University of California San Francisco (UCSF) patients and 94 Nantong University Affiliated Hospital (NTUA) patients served as external validation sets. Radiomics features were extracted from preoperative contrast-enhanced T1-weighted MRI. Ten ML methods, including extreme gradient boosting (XGBoost), were compared, with recursive feature elimination (RFE) and cross-validation used for feature selection. The SHapley Additive exPlanation (SHAP) method was used for model interpretability analysis.
RESULTS: Among the 10 ML models, XGBoost performed best, with area under the curve (AUC) values of 0.983 (95% confidence interval [CI]: 0.968-0.996) in the training set, 0.897 (95% CI: 0.836-0.956) in the internal testing set, 0.834 (95% CI: 0.766-0.883) in the UCSF cohort, and 0.880 (95% CI: 0.771-0.974) in the NTUA cohort. Model accuracy ranged from 82.3% to 94.3%, significantly outperforming conventional imaging assessment. Calibration analysis showed excellent agreement (Hosmer-Lemeshow P>0.05), and the maximum net benefit was 0.42. The SHAP analysis identified 12 optimal features, primarily texture heterogeneity measures from Laplacian of Gaussian (LoG) and wavelet transforms, as key predictors of glioma grade.
CONCLUSIONS: We developed and validated a robust, interpretable radiomics-based model that accurately predicts glioma WHO grade preoperatively. Its promising performance across diverse datasets suggests potential for clinical translation, though prospective validation in real-world clinical workflows is required to confirm clinical utility and to assess its impact on treatment decisions and patient outcomes.
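For readers less familiar with the metrics reported above, the following is a minimal sketch of how an AUC, a 95% CI, and a net benefit can be computed from binary labels and predicted probabilities. It is not the authors' code. The abstract does not state how the CIs were obtained, so the percentile bootstrap used here is an assumption, and the function names (`auc`, `bootstrap_auc_ci`, `net_benefit`) and the toy data are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative
    case (ties count as 0.5). Labels are assumed to be 0 or 1."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, see lead-in).
    Resamples that contain only one class are redrawn."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


def net_benefit(labels, scores, threshold):
    """Net benefit at a probability threshold pt, as used in decision
    curve analysis: TP/n - FP/n * pt / (1 - pt)."""
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical toy data (1 = high grade, 0 = low grade); not study data.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
```

On the toy data, `auc(toy_labels, toy_scores)` evaluates to 0.75, because three of the four positive-negative pairs are ranked correctly. The pairwise AUC loop is quadratic in cohort size, which is acceptable for cohorts of a few hundred patients but would normally be replaced by a rank-based implementation for larger data.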