Abstract
Background: Gliomas are the most common primary malignant tumors of the central nervous system. Accurate preoperative grading is essential for individualized surgical planning and treatment selection; however, reliable non-invasive prediction tools that integrate multimodal preoperative data remain limited. This study aimed to develop and internally validate an interpretable machine-learning model for non-invasive glioma grading. Methods: Clinical and imaging data from 400 patients with pathologically confirmed gliomas were retrospectively collected, and twenty-four preoperative variables were analyzed. The dataset was randomly divided into training and validation cohorts (7:3). Feature selection combined the Boruta algorithm with logistic regression analyses, followed by correlation filtering. Seventeen machine-learning algorithms were benchmarked using five-fold cross-validation, and the optimal model was evaluated in the held-out validation cohort using ROC analysis, calibration assessment, precision–recall curves, and decision curve analysis. Model interpretability was examined using SHAP. Results: Eight key predictors were identified: age, focal neurological deficits, midline shift, tumor laterality, tumor lobar location, enhancing tumor volume, and the MRS-derived Cho/NAA and Cho/Cr ratios. The Random Forest model achieved an area under the ROC curve of 0.946 (95% CI: 0.902–0.989) in the validation cohort. Calibration analysis demonstrated reasonable agreement between predicted and observed outcomes, and the precision–recall curve yielded an average precision of 0.98. Decision curve analysis indicated net clinical benefit across relevant probability thresholds. Conclusions: A multimodal machine-learning model integrating clinical, structural imaging, and MRS-derived metabolic features was developed and internally validated for non-invasive preoperative glioma grading.
The model showed good discrimination and reasonable calibration in the validation cohort and provided individualized probability estimates, suggesting potential value for preoperative risk stratification. However, clinical deployment remains premature, and external validation is required.
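
For readers unfamiliar with decision curve analysis, the net benefit it reports at a probability threshold p_t is conventionally TP/n − (FP/n) · p_t/(1 − p_t). The following is a minimal sketch of that calculation, not the study's implementation: the function names are hypothetical, and the labels and probabilities are invented toy values rather than study data.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted probability >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # True and false positives among patients flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy example (invented values, NOT study data): 1 = positive class.
y_true = [1, 1, 1, 0, 1, 0, 1, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.55, 0.3, 0.2, 0.1]

for pt in (0.25, 0.5):
    model_nb = net_benefit(y_true, y_prob, pt)
    all_nb = net_benefit_treat_all(y_true, pt)
    print(f"threshold={pt:.2f}  model={model_nb:.3f}  treat-all={all_nb:.3f}")
```

A model is usually said to offer net clinical benefit at a given threshold when its curve lies above both the treat-all reference and the treat-none reference, whose net benefit is zero.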