Abstract
BACKGROUND: Accurate preoperative risk identification is critical for patient safety and postoperative outcomes. Anaesthesiologists make decisions on the basis of ASA classification and additional parameters. Artificial intelligence (AI)-based decision support may offer more objective judgments.

METHODS: In this retrospective multi-rater study, four anaesthesiologists and an AI system independently evaluated 1,000 cases. ASA class, postoperative ICU requirement, anaesthesia preference, intraoperative risk prediction, and additional recommendations were assessed. Concordance was analysed using Krippendorff’s alpha, Cohen’s kappa, Gwet’s AC2, and PABAK, with percentage agreement estimated by bootstrapping. AI–physician agreement was further examined using fixed-effects logistic regression including clinician sex and professional experience as covariates.

RESULTS: Physician–physician agreement was generally good to excellent across outcomes, whereas physician–AI agreement was lower and variable when assessed using κ, PABAK, Gwet’s AC2, and observed agreement (Pₒ). The highest AI concordance was observed for intraoperative risk prediction and ICU requirement, while the lowest was for anaesthesia preference. Exploratory analyses suggested that AI–physician concordance may vary by clinician experience and sex; no significant effects of sex or experience were observed for intraoperative anaesthesia-related risk prediction.

CONCLUSION: Although AI shows high concordance with physician decisions in objective/algorithmic domains, concordance remains limited in contextual and experience-based domains (anaesthesia preference). The findings support positioning AI as a safe ‘second eye/warning’ tool within human-in-the-loop workflows, rather than as an independent authority. Prospective, externally validated studies are needed.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-026-03399-z.
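
Three of the agreement measures named in the Methods (observed agreement Pₒ, Cohen’s kappa, and PABAK), along with a bootstrapped interval for percentage agreement, can be illustrated with a minimal sketch for two raters. The ratings below are invented toy data for a binary outcome such as ICU requirement, and the function names are hypothetical; none of this is the study’s data or analysis code, and the percentile bootstrap shown is one common choice that the abstract does not specify.

```python
import random
from collections import Counter

def observed_agreement(a, b):
    """Proportion of cases on which the two raters give the same label (P_o)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Cohen's kappa: (P_o - P_e) / (1 - P_e), with P_e from the raters' marginals."""
    n = len(a)
    po = observed_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    pe = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if pe == 1:
        return float("nan")  # undefined when both raters use a single identical label
    return (po - pe) / (1 - pe)

def pabak(a, b, k):
    """Prevalence- and bias-adjusted kappa for k categories: (k * P_o - 1) / (k - 1)."""
    return (k * observed_agreement(a, b) - 1) / (k - 1)

def bootstrap_agreement_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for P_o, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(sum(a[i] == b[i] for i in idx) / n)
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy ratings (hypothetical): 1 = ICU required, 0 = not required.
physician = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
ai_system = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]

po = observed_agreement(physician, ai_system)
kappa = cohen_kappa(physician, ai_system)
pabak_value = pabak(physician, ai_system, k=2)
ci_low, ci_high = bootstrap_agreement_ci(physician, ai_system)
print(po, kappa, pabak_value, (ci_low, ci_high))
```

In this toy example the two raters use each label equally often, so kappa and PABAK coincide; they diverge when category prevalence is skewed, which is one reason to report several coefficients side by side.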