Abstract
INTRODUCTION: Monitoring disease activity in inflammatory bowel disease (IBD) is essential for guiding therapy and preventing irreversible tissue damage. Colonoscopy, although the gold standard, is invasive and unsuitable for frequent monitoring, while fecal calprotectin lacks accuracy within its diagnostic gray zone (100-250 μg/g). Stool proteomics offers a non-invasive alternative by directly capturing molecular signatures of intestinal inflammation. We conducted a proof-of-concept study to determine whether stool-derived peptides can accurately classify IBD activity (Active vs. Remission) using an unbiased and reproducible nested cross-validation machine-learning framework. METHODS: A total of 174 stool samples from IBD patients were collected and profiled using SWATH-DIA mass spectrometry. Feature selection (Boruta, LASSO, RFE) was performed only within the training loops across repeated subsampling, retaining peptides selected in ≥70% of runs. These stable features were used to train four classifiers (GLMNet, SVM-Radial, SVM-Linear, Naïve Bayes), with hyperparameters tuned by inner 5-fold cross-validation. Outer test folds provided fully unseen evaluation. Model performance was additionally assessed on only the gray zone samples within the outer test splits to quantify diagnostic resolution in this clinically challenging subgroup. RESULTS: Nested cross-validation identified a consensus panel of nine stool-derived peptides from five proteins. Performance was broadly similar across candidate classifiers, with GLMNet consistently achieving the best trade-off between metrics. For GLMNet, outer-fold mean AUC was 0.93 and balanced accuracy 0.88, with specificity 0.94, sensitivity 0.82, and F1-score 0.85; close agreement between inner- and outer-fold metrics indicated minimal overfitting.
Within the calprotectin gray zone subgroup (n = 34), GLMNet maintained good performance (balanced accuracy 0.78, F1 0.79, AUC 0.80), indicating that the peptide signature remains informative in this diagnostically challenging range. CONCLUSION: A stool-based multi-peptide signature, evaluated with a rigorously nested, leakage-free machine-learning framework, can reliably classify IBD activity and retain discriminative power within the gray zone. This biologically interpretable five-protein panel provides a strong basis for targeted mass-spectrometry assay development and prospective validation as a non-invasive tool for personalized IBD monitoring.
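
The stability filter described in METHODS (retaining peptides selected in ≥70% of subsampling runs) and the balanced-accuracy metric reported in RESULTS can be illustrated with a minimal Python sketch. This is not the study's implementation: the function names, the peptide labels, and the toy selection runs are hypothetical, and each run here stands in for the output of any one selector (Boruta, LASSO, or RFE) fitted on training data only.

```python
from collections import Counter


def stable_features(selections, threshold=0.70):
    """Return features selected in at least `threshold` of the runs.

    `selections` is a list with one collection of feature names per
    subsampling run (hypothetical input format, assumed for illustration).
    """
    if not selections:
        return []
    n_runs = len(selections)
    # Count each feature at most once per run.
    counts = Counter(f for run in selections for f in set(run))
    return sorted(f for f, c in counts.items() if c / n_runs >= threshold)


def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy for a binary task: mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


# Toy example with made-up peptide labels: five subsampling runs.
runs = [
    {"pep_A", "pep_B", "pep_C"},
    {"pep_A", "pep_B"},
    {"pep_A", "pep_C"},
    {"pep_A", "pep_B", "pep_D"},
    {"pep_A", "pep_B", "pep_C"},
]
panel = stable_features(runs)  # pep_A (5/5) and pep_B (4/5) pass; pep_C (3/5) does not

# Sensitivity and specificity as reported for GLMNet on the outer folds.
bal_acc = balanced_accuracy(0.82, 0.94)
print(panel, round(bal_acc, 2))
```

In a nested design this filter would run inside each outer training split, so that the held-out fold never influences which peptides are retained. The reported sensitivity (0.82) and specificity (0.94) average to 0.88, matching the reported outer-fold balanced accuracy.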