Correction: 'Do neonatal intensive care unit (NICU) health workers know about retinopathy of prematurity (ROP)? A qualitative study at a Regional Referral Hospital in Uganda'



Abstract

BACKGROUND: Pretrained foundation models are increasingly adopted for diabetic retinopathy (DR) screening, yet it remains unclear how much of their performance derives from the learned representations versus the adaptation procedure. Most benchmarks report discrimination metrics alone, neglecting probability calibration.

METHODS: We compared the frozen representations of three pretrained encoders: MedSigLIP (medical vision–language; ViT-B/16, 448 × 448), RETFound (retinal self-supervised; ViT-L/16, 224 × 224), and EfficientNet-B0 (ImageNet-supervised; 224 × 224). All encoder weights were frozen; only an identical lightweight multilayer perceptron head was trained. Models were developed on APTOS 2019 (3,662 fundus images; five-fold cross-validation) and externally validated on MESSIDOR-2 (1,744 images). Binary referable-DR detection and five-class severity grading were evaluated. AUC, expected calibration error (ECE), and Brier score served as co-primary endpoints. External-set tests used a patient-level cluster-robust bootstrap to account for bilateral (two eyes per patient) correlation.

RESULTS: On the development set, all three encoders achieved near-identical binary AUC (0.980–0.985). MedSigLIP showed superior calibration, with a lower Brier score than RETFound (0.044 vs. 0.049; p = 0.030) and EfficientNet-B0 (0.044 vs. 0.052; p = 0.006). External validation on MESSIDOR-2 revealed divergence: MedSigLIP maintained an AUC of 0.915 (drop of 0.070), whereas RETFound fell to 0.697 (drop of 0.286) and EfficientNet-B0 to 0.745 (drop of 0.236). The retina-specific RETFound performed below the ImageNet baseline (ΔAUC = −0.051; p = 0.016, cluster-robust bootstrap). For five-class grading, MedSigLIP achieved an external macro-F1 of 0.450 versus 0.247 (RETFound) and 0.291 (EfficientNet-B0). Temperature scaling reduced development-set ECE to 0.014–0.022 but proved ineffective under domain shift (external ECE 0.086–0.149). All encoders exhibited catastrophic failure on mild DR (grade 1) externally, with RETFound and EfficientNet-B0 achieving F1 = 0.000 and MedSigLIP only 0.153.

CONCLUSION: Under frozen transfer, the MedSigLIP encoder produced more generalisable and better-calibrated representations than either the retinal self-supervised (RETFound) or the ImageNet-supervised (EfficientNet-B0) encoder. Domain-specific pretraining did not guarantee domain-general frozen representations. These findings demonstrate that development-set discrimination alone is insufficient for encoder evaluation and that calibration metrics, particularly the Brier score, should be reported as standard practice.
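The co-primary calibration endpoints named above can be sketched numerically. The following is a minimal illustration, not the authors' code, of the Brier score, a standard equal-width-binned ECE estimate, and single-parameter temperature scaling for the binary referable-DR case; the function names and the grid-search fitting of the temperature are assumptions for this sketch, and the inputs are assumed to be NumPy arrays of predicted probabilities (or logits) and 0/1 labels.

```python
import numpy as np

def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return float(np.mean((probs - labels) ** 2))

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: bin predictions by confidence, then average the per-bin
    |mean confidence - observed accuracy| gap, weighted by bin size."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs > lo) & (probs <= hi)
        if mask.any():
            ece += mask.mean() * abs(probs[mask].mean() - labels[mask].mean())
    return float(ece)

def fit_temperature(logits, labels, temps=np.linspace(0.5, 5.0, 91)):
    """Fit a single scalar temperature T by grid search, minimising the
    negative log-likelihood of sigmoid(logits / T) on held-out data."""
    def nll(t):
        p = 1.0 / (1.0 + np.exp(-logits / t))
        p = np.clip(p, 1e-12, 1.0 - 1e-12)
        return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    return float(temps[np.argmin([nll(t) for t in temps])])
```

Temperature scaling leaves the ranking of predictions, and hence AUC, unchanged, which is consistent with the abstract's finding that it improves in-distribution ECE but cannot repair the discrimination loss seen under domain shift.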
