Enhanced Object Detection in Thangka Images Using Gabor, Wavelet, and Color Feature Fusion

基于Gabor、小波和颜色特征融合的唐卡图像目标检测增强

阅读:1

Abstract

Thangka image detection poses unique challenges due to complex iconography, densely packed small-scale elements, and stylized color-texture compositions. Existing detectors often struggle to capture both global structures and local details and rarely leverage domain-specific visual priors. To address this, we propose a frequency- and prior-enhanced detection framework based on YOLOv11, specifically tailored for Thangka images. We introduce a Learnable Lifting Wavelet Block (LLWB) to decompose features into low- and high-frequency components, while LLWB_Down and LLWB_Up enable frequency-guided multi-scale fusion. To incorporate chromatic and directional cues, we design a Color-Gabor Block (CGBlock), a dual-branch attention module based on HSV histograms and Gabor responses, and embed it via the Color-Gabor Cross Gate (C2CG) residual fusion module. Furthermore, we redesign all detection heads with decoupled branches and introduce center-ness prediction, alongside an additional shallow detection head to improve recall for ultra-small targets. Extensive experiments on a curated Thangka dataset demonstrate that our model achieves 89.5% mAP@0.5, 59.4% mAP@[0.5:0.95], and 84.7% recall, surpassing all baseline detectors while maintaining a compact size of 20.9 M parameters. Ablation studies validate the individual and synergistic contributions of each proposed component. Our method provides a robust and interpretable solution for fine-grained object detection in complex heritage images.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。