Abstract
BACKGROUND: Accurate preoperative risk assessment remains critical in hepatobiliary surgery. Established prediction models, such as POSSUM and P-POSSUM, have shown variable performance when applied to specialized procedures. This study externally validated and recalibrated both models to predict postoperative morbidity and mortality after elective hepatic resection. METHODS: All consecutive adult patients who underwent elective hepatic resection at the University Hospital Regensburg between December 2020 and December 2023 were retrospectively analyzed. POSSUM and P-POSSUM scores were calculated using the original logistic equations. Major morbidity (Clavien–Dindo ≥ IIIa) and in-hospital mortality were the predefined outcomes. Model discrimination was assessed using the area under the receiver operating characteristic curve (AUC), and calibration was evaluated using the Brier score, calibration-in-the-large (intercept), calibration slope, and out-of-bag (OOB) calibration plots derived from 1,000 bootstrap resamples. Logistic recalibration was applied to adjust the model intercepts (α) and slopes (β). Clinical utility was evaluated using decision curve analysis. RESULTS: Of the 200 elective hepatectomies assessed, six were excluded because required physiological inputs were missing, yielding 194 patients with computable predictions. Clinically relevant morbidity (Clavien–Dindo ≥ II) occurred in 146/194 (75.3%) patients, major morbidity (≥ IIIa) in 73/194 (37.6%), and in-hospital mortality in 15/194 (7.7%). Discrimination was fair for morbidity and higher for mortality: AUC 0.696 (95% CI 0.595–0.789) for clinically relevant morbidity, AUC 0.697 (95% CI 0.620–0.764) for major morbidity, and AUC 0.755 (95% CI 0.647–0.851) for in-hospital mortality.
OOB bootstrap calibration showed slopes below 1 for all endpoints (clinically relevant morbidity: α 0.16, β 0.837, Brier 0.172; major morbidity: α −0.051, β 0.907, Brier 0.215; mortality: α −0.34, β 0.843, Brier 0.068), supporting the need for local model updating. CONCLUSION: POSSUM and P-POSSUM can support perioperative risk prediction after hepatic resection when they are locally recalibrated and internally validated. Bootstrap-corrected recalibration yielded stable performance without evidence of overfitting, and decision curve analysis suggested clinical utility across relevant threshold probabilities. These findings support the use of POSSUM-based models in hepatobiliary surgery, provided that centers perform local validation and model updating before implementation in clinical decision-making. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12893-026-03508-9.
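
ILLUSTRATIVE SKETCH: The logistic recalibration described in the methods can be read as refitting logit(P(outcome)) = α + β · logit(p), where p is the risk predicted by the original equation. The Python sketch below shows one way such an intercept and slope might be estimated, together with the Brier score. It is not the study's code: the function names (`fit_recalibration`, `brier_score`) are hypothetical, the data are simulated rather than patient data, and the 1,000-resample bootstrap used in the study is omitted for brevity.

```python
import numpy as np


def logit(p, eps=1e-6):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))


def expit(z):
    """Inverse of the logit (logistic function)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_recalibration(p_orig, y, n_iter=50, tol=1e-10):
    """Estimate alpha and beta in logit(P(y=1)) = alpha + beta * logit(p_orig).

    Uses plain Newton-Raphson on the logistic log-likelihood.
    """
    X = np.column_stack([np.ones_like(p_orig), logit(p_orig)])
    coef = np.zeros(2)
    for _ in range(n_iter):
        mu = expit(X @ coef)
        w = mu * (1 - mu)
        grad = X.T @ (y - mu)            # score vector
        hess = X.T @ (X * w[:, None])    # observed information
        step = np.linalg.solve(hess, grad)
        coef += step
        if np.max(np.abs(step)) < tol:
            break
    return coef[0], coef[1]


def brier_score(p, y):
    """Mean squared difference between predicted risk and observed outcome."""
    return float(np.mean((p - y) ** 2))


# Simulated example (assumption: NOT the study data). The "original" model
# is made miscalibrated on purpose, with a true slope below 1.
rng = np.random.default_rng(0)
n = 2000
p_orig = rng.uniform(0.05, 0.95, size=n)
true_alpha, true_beta = -0.3, 0.8
y = rng.binomial(1, expit(true_alpha + true_beta * logit(p_orig))).astype(float)

alpha, beta = fit_recalibration(p_orig, y)
p_recal = expit(alpha + beta * logit(p_orig))

print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
print(f"Brier (original)     = {brier_score(p_orig, y):.3f}")
print(f"Brier (recalibrated) = {brier_score(p_recal, y):.3f}")
```

Under this parameterization, α = 0 and β = 1 correspond to predictions that need no adjustment, and β below 1 means the original predictions are too extreme (high risks overstated, low risks understated). Estimates obtained this way are apparent, in-sample values; an out-of-bag bootstrap like the one reported above would refit on each resample and evaluate on the patients left out of it.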