A novel deep learning approach for mosquito species classification via a dual-head structure and calibration-aware fusion architecture

Abstract

Accurate mosquito species recognition underpins vector surveillance and targeted control, yet field imagery suffers from device variability, background clutter, and fine-grained inter-species similarity. Deep learning offers a scalable path, but prior systems often lack calibrated probabilities and degrade under domain shift. We propose a dual-head architecture that aligns an 8-class head with an auxiliary 8-to-2 Aedes head to sharpen difficult boundaries. Two heterogeneous branches, a CNN backbone and a Swin-T Transformer, capture complementary local texture and long-range morphology, and we fuse them via calibrated logit stacking followed by temperature scaling. With test-time augmentation (TTA, 5–8 views), the pipeline jointly reduces variance, corrects bias, and improves posterior calibration. We evaluate on AMID v1 (8-class, whole-body images) and on an unseen, phone-style Aedes corpus that is used strictly as a test set to probe cross-dataset generalization. Against strong baselines (ResNet-50, EfficientNet-V2-S) and naïve probability averaging, our method attains near-ceiling in-domain performance (Macro-F1 ≈ 99.3–99.4%, Micro-Accuracy ≈ 99.4–99.5%) and exceeds 99% accuracy on the unseen Aedes set, while markedly improving calibration (ECE ≈ 0.6%). Wilson 95% confidence intervals and paired McNemar tests indicate that these gains, though incremental, are consistent and statistically reliable. Ablations show that TTA with 5 views combined with calibrated stacking captures most of the benefit at practical latency. By coupling boundary-aware supervision with calibration-aware fusion, the proposed approach delivers predictions that are both more accurate and more trustworthy, stabilizing operating thresholds across sites and capture pipelines; the Swin-T branch contributes robustness to pose and device variation through its windowed self-attention. This provides a deployment-ready baseline for public-health monitoring and a principled foundation for future extensions to open-set recognition, domain-aware calibration, and multimodal sensing.

Supplementary Information: The online version contains supplementary material available at 10.1038/s41598-026-35453-1.
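
To make the calibration terms used in the abstract concrete, the following Python sketch fits a single temperature on held-out logits by minimizing negative log-likelihood, measures expected calibration error (ECE) with equal-width confidence bins, and computes a Wilson 95% interval for an accuracy estimate. It is a minimal standard-library sketch and not the authors' implementation: the function names, the ternary search over log-temperature, the 10-bin ECE, and the toy logits are all assumptions made for illustration, and the logit stacking and TTA stages are not shown.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def softmax(logits, temperature=1.0):
    return [math.exp(v) for v in log_softmax(logits, temperature)]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    total = sum(-log_softmax(row, temperature)[y]
                for row, y in zip(logit_rows, labels))
    return total / len(labels)


def fit_temperature(logit_rows, labels, t_min=0.05, t_max=20.0, iters=80):
    """Ternary search over log T. The NLL is convex in 1/T, hence unimodal
    in log T, so the search converges to the minimizer inside the bracket."""
    lo, hi = math.log(t_min), math.log(t_max)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if nll(logit_rows, labels, math.exp(m1)) < nll(logit_rows, labels, math.exp(m2)):
            hi = m2
        else:
            lo = m1
    return math.exp((lo + hi) / 2)


def expected_calibration_error(prob_rows, labels, n_bins=10):
    """ECE: sample-weighted mean of |accuracy - confidence| over confidence bins."""
    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    for probs, y in zip(prob_rows, labels):
        conf = max(probs)
        pred = probs.index(conf)
        b = min(int(conf * n_bins), n_bins - 1)
        conf_sum[b] += conf
        hit_sum[b] += 1.0 if pred == y else 0.0
    return sum(abs(h - c) for h, c in zip(hit_sum, conf_sum)) / len(labels)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Toy, deliberately overconfident validation logits (hypothetical data):
    # every row puts a logit of 6 on one class, but only 6 of 8 are correct.
    val_logits = [[6, 0, 0], [0, 6, 0], [0, 0, 6], [6, 0, 0],
                  [0, 6, 0], [0, 0, 6], [6, 0, 0], [0, 6, 0]]
    val_labels = [0, 1, 2, 0, 1, 2, 1, 2]

    t = fit_temperature(val_logits, val_labels)
    before = expected_calibration_error([softmax(r) for r in val_logits], val_labels)
    after = expected_calibration_error([softmax(r, t) for r in val_logits], val_labels)
    print("fitted temperature:", t)
    print("ECE before / after:", before, after)
    print("Wilson 95% interval for 99/100 correct:", wilson_interval(99, 100))
```

Because a single temperature rescales every logit vector by the same factor, it leaves the predicted class unchanged and only adjusts confidence, which is why it can lower ECE without affecting accuracy or Macro-F1.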
