The Impact of Evaluation Strategy on Sepsis Prediction Model Performance Metrics in Intensive Care Data: Retrospective Cohort Study


Abstract

BACKGROUND: The prediction of the onset of sepsis, a life-threatening condition resulting from a dysregulated response to an infection, is one of the most common prediction tasks in intensive care-related machine learning research. To assess the performance of such models, different evaluation strategies are commonly implemented, including fixed horizon (a single prediction at a set time before onset), peak score (a single prediction using the maximum predicted risk across time), and continuous evaluation (multiple predictions assessed continuously across time). However, there is no clear consensus on which approach should be used to provide clinically meaningful performance evaluation.

OBJECTIVE: This study aimed to assess different evaluation approaches of sepsis prediction models trained on a public intensive care dataset applied to German intensive care data.

METHODS: In this retrospective, observational cohort study, we assessed the efficacy of machine learning models, pretrained on the Medical Information Mart for Intensive Care IV (MIMIC-IV) dataset, when applied to BerlinICU, a multisite German intensive care dataset. To understand the real-world impact of implementing these models, we examined the performance variability across various evaluation strategies.

RESULTS: The BerlinICU dataset includes 40,132 intensive care admissions spanning 10 years (2012-2021). Using the latest Sepsis-3 definition, we identified 4134 septic admissions (10.3% prevalence). Application of a temporal convolutional network model to BerlinICU yielded an area under the receiver operating characteristic curve (AUROC) of 0.67 (95% CI 0.66-0.68) for continuous evaluation with a 6-hour prediction horizon, compared with 0.84 (95% CI 0.83-0.85) on the MIMIC-IV test set. On BerlinICU, peak score evaluation showed a similar AUROC compared with continuous evaluation, while fixed horizon evaluation showed a reduced AUROC of 0.61 (95% CI 0.60-0.62). Onset matching had minimal impact on performance estimates using continuous evaluation or fixed horizon evaluation, but increased estimates for peak score evaluation. Performance metrics improved with shorter prediction horizons across all strategies.

CONCLUSIONS: Our results demonstrate that the choice of evaluation strategy has a significant impact on the performance metrics of intensive care prediction models. The same model applied to the same dataset yields markedly different performance metrics depending on the evaluation approach. Therefore, careful selection of the evaluation approach is essential to ensure that the interpretation of performance metrics aligns with clinical intentions and enables meaningful comparisons between studies. In our view, the continuous evaluation approach best reflects the continual monitoring of patients that is performed in real-world clinical practice. In contrast, fixed-horizon and peak score evaluation approaches may produce skewed results when the length of stay distributions between sepsis cases and controls are not properly matched. Especially for peak score evaluation, longer visits tend to produce higher maximum scores because sampling from more values increases the likelihood of capturing higher values purely by chance.
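The three evaluation strategies contrasted in the abstract can be made concrete with a small sketch. The following Python code is illustrative only and is not the study's implementation: it assumes each stay is a list of hourly risk scores plus a sepsis onset index (or None for controls), and computes a rank-based AUROC under each strategy. All function and variable names are hypothetical.

```python
def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U statistic divided by n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fixed_horizon(stays, horizon=6):
    """One prediction per stay, taken `horizon` hours before onset.
    Controls (onset is None) are sampled `horizon` hours before end of stay."""
    scores, labels = [], []
    for risk, onset in stays:
        t = (onset if onset is not None else len(risk)) - horizon
        if 0 <= t < len(risk):
            scores.append(risk[t])
            labels.append(int(onset is not None))
    return auroc(scores, labels)

def peak_score(stays):
    """One prediction per stay: the maximum risk over the whole stay.
    Longer stays sample more values, so the maximum is inflated by chance."""
    scores = [max(risk) for risk, _ in stays]
    labels = [int(onset is not None) for _, onset in stays]
    return auroc(scores, labels)

def continuous(stays, horizon=6):
    """Every time step is a prediction; a step is labeled positive if
    onset occurs within the next `horizon` hours."""
    scores, labels = [], []
    for risk, onset in stays:
        for t, s in enumerate(risk):
            scores.append(s)
            labels.append(int(onset is not None and 0 <= onset - t <= horizon))
    return auroc(scores, labels)
```

Note that `peak_score` pools all time steps of a stay into a single maximum, which is why the abstract warns that unmatched length-of-stay distributions skew this strategy: a long control stay draws more samples and is more likely to hit a high score by chance alone.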
