Abstract
This study evaluated whether urine volatile organic compounds (VOCs) profiled by gas chromatography-ion mobility spectrometry (GC-IMS), combined with machine learning models, can support noninvasive, early identification of gallstones complicated by cholecystitis. A single-center study enrolled 100 patients with gallstone-cholecystitis and 100 healthy controls (n = 200 total). Midstream urine samples were collected under a uniform protocol and stored at −80 °C. GC-IMS acquired two-dimensional fingerprints, which underwent reactant ion peak (RIP) normalization and manual peak quality control. Peaks that could not be reliably identified were excluded before modeling. The data were randomly divided into training (70%) and testing (30%) sets. Feature selection was performed on the training set only, and the selected features were used to construct Random Forest (RF), Support Vector Machine (SVM), Neural Network (NN), and Decision Tree (DT) models. These models were optimized using 10-fold cross-validation and evaluated on the testing set, with Area Under the ROC Curve (AUC) as the primary metric. In parallel, reduced-model performance and single-biomarker performance were evaluated based on leading compounds (e.g., Linalool, Propyl-propenyl disulphide, Methylthiobutyrate-M, Butylamine). On the test set, RF, SVM, and NN achieved AUC values of 0.905, 0.887, and 0.870, respectively, outperforming DT (AUC = 0.658). For the reduced model constructed using the four VOCs listed above, NN, RF, and SVM yielded AUC values of 0.81, 0.77, and 0.76, respectively. As individual markers, Linalool (AUC = 0.777), Propyl-propenyl disulphide (AUC = 0.768), Methylthiobutyrate-M (AUC = 0.768), and Butylamine (AUC = 0.731) each showed moderate discriminatory ability. Urine VOC profiling by GC-IMS combined with machine learning shows favorable discriminatory performance for the early, noninvasive identification of gallstones complicated by cholecystitis.
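As an illustration of the evaluation design described above, the following is a minimal Python sketch of a 70/30 split followed by a single-marker AUC computed on the held-out set. It is not the authors' pipeline. The helper names `stratified_split` and `auc` are hypothetical, the per-class (stratified) split is an assumption (the abstract states only that the division was random), and the intensities are synthetic values generated for the example, not study data.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices into train/test, shuffling each class separately.

    Stratification is an assumption of this sketch; the abstract only
    says the data were randomly divided 70/30.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive scores
    higher than a randomly chosen negative (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Synthetic example: 100 cases (label 1) and 100 controls (label 0),
# with a made-up marker intensity that is shifted upward in cases.
rng = random.Random(42)
labels = [1] * 100 + [0] * 100
intensity = [rng.gauss(1.0, 1.0) if y == 1 else rng.gauss(0.0, 1.0)
             for y in labels]

train_idx, test_idx = stratified_split(labels, test_fraction=0.3, seed=0)

# Single-marker AUC evaluated on the held-out 30% only.
test_auc = auc([intensity[i] for i in test_idx],
               [labels[i] for i in test_idx])
print(f"train n = {len(train_idx)}, test n = {len(test_idx)}")
print(f"single-marker test AUC = {test_auc:.3f}")
```

The pairwise definition of AUC used here is equivalent to the area under the empirical ROC curve. In a multi-feature setting, the same held-out indices would be reserved until feature selection and cross-validated tuning on the training indices were complete.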
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1038/s41598-026-36709-6.