Pilot Evaluation of a Deep Learning Model for Nasogastric Tube Verification on Chest Radiographs: A Single-Center Retrospective Study

Abstract

BACKGROUND: Accurate confirmation of nasogastric (NG) tube position is essential for patient safety, but delays and variability in interpretation remain common in clinical practice. Deep learning (DL) models have shown potential for assisting in this task, but their real-world performance, particularly in detecting malpositioned tubes, remains insufficiently characterized.

METHODS: We conducted a pilot evaluation of a previously developed DL model using 135 chest radiographs from Kangwon National University Hospital. Expert physicians established the reference standard. Model performance was assessed, and receiver operating characteristic (ROC) curve and precision-recall curve (PRC) analyses were performed. Differences between correctly classified and misclassified cases were examined using Wilcoxon rank-sum and Fisher's exact tests to explore potential clinical or radiographic contributors to model failure.

RESULTS: The model correctly classified 129 of 135 cases. The sensitivity was 96.1% (95% confidence interval [CI]: 92.2-98.9%), specificity was 85.7% (95% CI: 42.2-97.7%), positive predictive value (PPV) was 99.2% (95% CI: 96.1-99.9%), negative predictive value (NPV) was 54.5% (95% CI: 25.4-80.8%), balanced accuracy was 90.8%, and F1-score was 0.976. The area under the ROC curve was 0.970 (95% CI: 0.929-1.000) and the area under the PRC was 0.727 (95% CI: 0.289-1.000), reflecting substantial uncertainty related to the very small number of incomplete cases (n = 6). No statistically significant differences in clinical or radiographic characteristics were observed between correctly classified and misclassified cases.

CONCLUSIONS: The DL model performed well in identifying correctly positioned NG tubes but demonstrated limited and unstable performance for detecting incomplete placements. Given the safety implications of misclassification, the model should be used only as an assistive tool with mandatory physician oversight. Larger, multi-center studies with greater representation of incomplete cases are required to obtain more reliable estimates and support safe clinical implementation.
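
The point estimates in the RESULTS can be reproduced with simple arithmetic on a 2x2 confusion matrix. The sketch below is illustrative only. The abstract does not report the cell counts, so the values used here (123 true positives, 5 false negatives, 6 true negatives, 1 false positive, with "positive" meaning a correctly positioned tube) are an assumption inferred from the reported percentages and the 129 of 135 correct classifications. Because the counts are inferred rather than reported, small differences from the published figures are possible; the balanced accuracy, for instance, comes out within about 0.1 percentage point of the reported value. The function name `diagnostic_metrics` is hypothetical.

```python
# Assumed confusion-matrix counts, inferred from the reported percentages.
# They are NOT stated in the abstract and may differ from the study's data.
TP, FN = 123, 5  # correctly positioned tubes: detected / missed
TN, FP = 6, 1    # incomplete placements: detected / missed


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return standard point estimates for a binary classifier."""
    sensitivity = tp / (tp + fn)  # recall for the positive class
    specificity = tn / (tn + fp)  # recall for the negative class
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


metrics = diagnostic_metrics(TP, FN, TN, FP)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The low NPV follows directly from the class imbalance: with so few incomplete placements, a handful of false negatives is enough to outnumber most of the true negatives, even though sensitivity is high.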
