How threshold customisation affects the performance of a multiclass X-ray AI model for primary care triage: a retrospective study

Abstract

OBJECTIVES: To describe the structured process of threshold optimisation for a commercially available multiclass chest X-ray (CXR) deep learning model, to evaluate its diagnostic performance across different operating thresholds, and to estimate its potential operational impact within an artificial intelligence (AI)-enabled triage workflow in a primary care setting.

DESIGN: Retrospective diagnostic performance evaluation with threshold-based analysis.

SETTING: Primary care radiography services in Singapore, using data derived from two primary care clinics and a tertiary hospital.

PARTICIPANTS: A total of 816 adult frontal chest radiographs were included (multiethnic Asian, 464 males, 352 females; mean age 60.8 years). Images were selected to represent the spectrum of findings often encountered in primary care. Exclusion criteria included paediatric studies, lateral or oblique radiographs, and findings not supported by the AI model (eg, bony abnormalities and medical devices).

PRIMARY AND SECONDARY OUTCOME MEASURES: Primary outcome measures were sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV). The secondary outcome was the estimated potential operational improvement, calculated as the number of true negatives divided by the total number of CXRs.

RESULTS: At the default threshold of 0.15, the AI model achieved a sensitivity of 87.3% (95% CI 83.9% to 90.4%) and an NPV of 87.0% (95% CI 83.6% to 90.2%). Lowering the threshold to 0.10 increased sensitivity to 93.2% (95% CI 90.7% to 95.5%) and NPV to 91.3% (95% CI 88.2% to 94.3%), with a specificity of 71.7% (95% CI 67.3% to 76.1%). These trade-offs were considered acceptable for a safety-focused co-triage workflow that prioritises minimising false negatives.

CONCLUSIONS: Threshold optimisation is critical for adapting AI models to context-specific clinical workflows. Our study shows that adjusting the operating threshold enabled prioritisation of sensitivity and NPV, supporting safe AI-assisted triage in primary care. Selecting thresholds aligned with clinical objectives is a collaborative process that must involve radiology and clinical teams to ensure safe and effective implementation. Future work will assess real-world operational impact and user acceptance following prospective deployment.
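
The outcome measures named in the abstract can be illustrated with a minimal Python sketch. It is not the study's analysis code. It assumes each radiograph has been reduced to a single abnormality score that is flagged when the score is at or above the operating threshold; how the multiclass model's outputs are combined into such a score is not described in the abstract. The names `TriageMetrics`, `metrics_at_threshold`, `TOY_SCORES` and `TOY_LABELS` are hypothetical, and the ten data points are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TriageMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    operational_improvement: float  # true negatives / all CXRs, as defined in the abstract


def metrics_at_threshold(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> TriageMetrics:
    """Compute the abstract's outcome measures at one operating threshold.

    Assumption: an image is flagged as abnormal when score >= threshold.
    """
    tp = fp = tn = fn = 0
    for score, abnormal in zip(scores, labels):
        flagged = score >= threshold
        if flagged and abnormal:
            tp += 1
        elif flagged:
            fp += 1
        elif abnormal:
            fn += 1
        else:
            tn += 1

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN instead of raising when a metric is undefined.
        return numerator / denominator if denominator else float("nan")

    return TriageMetrics(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
        npv=ratio(tn, tn + fn),
        operational_improvement=ratio(tn, tp + fp + tn + fn),
    )


# Invented toy data (NOT from the study): one score and one reference label per image.
TOY_SCORES = [0.02, 0.05, 0.08, 0.12, 0.12, 0.14, 0.20, 0.40, 0.90, 0.30]
TOY_LABELS = [False, False, False, False, True, True, True, True, True, False]

if __name__ == "__main__":
    # Compare the two thresholds discussed in the abstract.
    for threshold in (0.15, 0.10):
        print(threshold, metrics_at_threshold(TOY_SCORES, TOY_LABELS, threshold))
```

On this toy data, lowering the threshold from 0.15 to 0.10 removes the false negatives but turns one true negative into a false positive, so sensitivity and NPV rise while specificity and the operational improvement estimate fall. This mirrors the direction of the trade-off reported in the results, not its magnitude.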
