Abstract
Anastomotic leak (AL) remains a major cause of postoperative morbidity after colorectal resection. Postoperative C-reactive protein (CRP) thresholds have not been widely adopted as a predictive tool for AL, and recent studies have assessed the predictive accuracy of the CRP trajectory. This pilot study evaluated machine learning (ML) models that use postoperative CRP thresholds and trajectory data to predict AL, compared with conventional univariate CRP metrics. A retrospective analysis of elective large bowel resections (2020-2025) from a single-institution database was performed. AL occurring within the index admission or 30-day follow-up was diagnosed radiologically or intraoperatively. CRP levels from postoperative days (PODs) 1-3 were used to derive absolute values, percentage change, and net difference. Logistic regression and three ML models, namely Lasso, Random Forest (RF), and Extreme Gradient Boosting (XGB), were trained with five-fold cross-validation. Performance metrics included area under the ROC curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). A total of 679 patients were included; AL occurred in 2.36% (16/679). XGB using absolute CRP values and percentage change demonstrated the highest performance (AUC 0.91, sensitivity 0.88, specificity 0.86, NPV 0.98). XGB achieved comparable performance using absolute CRP values and net difference (AUC 0.90, sensitivity 0.81, specificity 0.93, NPV 0.97). Models integrating threshold and trajectory features outperformed models using absolute CRP values alone (RF: AUC 0.84, sensitivity 0.69, specificity 0.92, NPV 0.95). Univariate logistic regression showed inferior discriminatory performance: POD 2 CRP ≥108 mg/L yielded AUC 0.87 (sensitivity 0.94, specificity 0.69, NPV 0.99), while POD 1→2 net increase ≥94 mg/L gave AUC 0.74 (sensitivity 0.56, specificity 0.94, NPV 0.94).
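The trajectory features described above (absolute values, percentage change, and net difference across PODs 1-3) can be illustrated with a minimal sketch. The function name, the choice of day-to-day intervals, and the definition of percentage change relative to the earlier day are assumptions made for illustration; the study's exact feature definitions are not given here.

```python
import math
from dataclasses import dataclass


@dataclass
class CrpFeatures:
    """Hypothetical container for per-patient CRP features (mg/L and %)."""
    pod1: float
    pod2: float
    pod3: float
    net_diff_1_2: float
    net_diff_2_3: float
    pct_change_1_2: float
    pct_change_2_3: float


def pct_change(earlier: float, later: float) -> float:
    """Percentage change relative to the earlier value (assumed definition)."""
    if earlier == 0:
        return math.nan  # undefined when the baseline is zero
    return 100.0 * (later - earlier) / earlier


def derive_crp_features(pod1: float, pod2: float, pod3: float) -> CrpFeatures:
    """Derive absolute, net-difference, and percentage-change features."""
    return CrpFeatures(
        pod1=pod1,
        pod2=pod2,
        pod3=pod3,
        net_diff_1_2=pod2 - pod1,
        net_diff_2_3=pod3 - pod2,
        pct_change_1_2=pct_change(pod1, pod2),
        pct_change_2_3=pct_change(pod2, pod3),
    )


# Illustrative values only, not study data.
example = derive_crp_features(50.0, 150.0, 120.0)
```

A feature vector of this kind would then be passed to each model; combining the absolute values with one of the trajectory measures corresponds to the integrated feature sets compared above.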
ML models integrating CRP threshold and trajectory data improved predictive accuracy for AL compared with traditional CRP thresholds. XGB and RF outperformed Lasso modelling, with balanced sensitivity and specificity. These findings highlight the value of dynamic, data-driven analysis in early postoperative risk stratification. Multi-centre analysis is planned for external validation and to confirm generalisability.
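For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, PPV, and NPV follow from the four cells of a confusion matrix. The function name and the example counts are hypothetical and are not taken from the study.

```python
import math


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard diagnostic metrics from confusion-matrix counts.

    tp: leaks correctly flagged     fp: non-leaks incorrectly flagged
    tn: non-leaks correctly cleared fn: leaks missed
    """
    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN rather than raising when a metric is undefined.
        return numerator / denominator if denominator else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),  # proportion of leaks detected
        "specificity": ratio(tn, tn + fp),  # proportion of non-leaks cleared
        "ppv": ratio(tp, tp + fp),          # flagged patients who truly leak
        "npv": ratio(tn, tn + fn),          # cleared patients who truly do not leak
    }


# Illustrative counts only, not study data.
metrics = diagnostic_metrics(tp=8, fp=10, tn=80, fn=2)
```

Unlike sensitivity and specificity, PPV and NPV depend on how common the outcome is in the cohort, so they should be read alongside the reported AL rate.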