Abstract
Deploying Total Laboratory Automation (TLA) systems on medical production lines is challenging: spatial constraints, dense object distributions, and severe occlusions render traditional detection methods inadequate. This paper proposes POLAR-DETR (Polarized Occlusion-aware Local-global Attention Real-time Detection Transformer), an efficient, real-time, end-to-end detection framework for medical production scenarios. First, we design a Polarized Occlusion-aware Hierarchical Feature Encoder (POHFE) that incorporates polar linear attention and dynamic nonlinear feature modulation to enhance spatial-contextual awareness and detail representation. Second, we introduce a Multi-level Hierarchical Attention Fusion (MHAF) module that strengthens semantic associations between multi-scale features through hypergraph computation. Third, we develop a Hierarchical Dual-branch Attention Fusion (HDAF) module for precise discrimination between local details and global information. To optimize deployment efficiency, we devise a Hessian matrix-based pruning strategy that reduces network redundancy. Furthermore, we construct the Augmented Medical Production Line (AMPL) dataset, comprising 5,040 high-resolution images with 85,797 annotated instances. Experimental results show that POLAR-DETR achieves 70.0% Average Precision (AP) on AMPL while running at 68.4 FPS. Compared with the baseline, our approach improves AP by 4.7% while reducing parameters and computational complexity by 20.5% and 22.6%, respectively, providing an efficient visual detection solution for medical production automation.