Abstract
Gastrointestinal cancers account for roughly a quarter of global cancer incidence, and early detection through endoscopy has proven effective in reducing mortality. Multi-class endoscopic disease detection, however, faces three persistent challenges: feature redundancy from non-pathological content, severe illumination inconsistency across imaging modalities, and extreme scale variability with blurry boundaries. This paper introduces Endo-DET, a domain-specific detection framework addressing these challenges through three synergistic components. The Adaptive Lesion-Discriminative Filtering (ALDF) module achieves lesion-focused attention via sparse simplex projection, reducing complexity from $O(N^2)$ to $O(\alpha N^2)$. The Global-Local Illumination Modulation Neck (GLIM-Neck) enables illumination-aware multi-scale fusion through four cooperative mechanisms, maintaining stable performance across white-light endoscopy, narrow-band imaging, and chromoendoscopy. The Lesion-aware Unified Calibration and Illumination-robust Discrimination (LUCID) module uses dual-stream reciprocal modulation to integrate boundary-sensitive textures with global semantics while suppressing instrument artifacts. Experiments on EDD2020, Kvasir-SEG, PolypGen2021, and CVC-ClinicDB show that Endo-DET improves mAP50-95 over the DEIM baseline by 5.8, 10.8, 4.1, and 10.1 percentage points, respectively, with mAP75 gains of 6.1, 10.3, 6.8, and 9.3 points, and Recall50-95 improvements of 10.9, 12.1, 11.1, and 11.5 points. Running at 330 FPS with TensorRT FP16 optimization, Endo-DET achieves consistent cross-dataset improvements while maintaining real-time capability, providing a methodological foundation for clinical computer-aided diagnosis.
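
To make the idea of a sparse simplex projection concrete, the following minimal sketch implements sparsemax, a well-known Euclidean projection of a score vector onto the probability simplex. It is an illustration only: the abstract does not specify which projection ALDF uses, and the function name `sparsemax` and the example scores are assumptions introduced here, not part of Endo-DET. Unlike softmax, the projection can assign exactly zero weight to low-scoring entries, which is the property that lets an attention layer skip them.

```python
def sparsemax(scores):
    """Project a list of scores onto the probability simplex (sparsemax).

    Illustrative stand-in for a sparse simplex projection; this is not
    taken from the Endo-DET implementation.
    """
    z_sorted = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    # Find the largest k with 1 + k * z_(k) > sum of the top-k scores.
    # The condition holds for a prefix of the sorted scores, so we can
    # stop at the first failure.
    for k, z_k in enumerate(z_sorted, start=1):
        cumsum += z_k
        if 1.0 + k * z_k > cumsum:
            tau = (cumsum - 1.0) / k  # threshold for the current support
        else:
            break
    # Entries at or below the threshold receive exactly zero weight.
    return [max(s - tau, 0.0) for s in scores]


# Hypothetical attention scores for three tokens.
weights = sparsemax([1.0, 0.8, 0.1])
print(weights)
```

In this example the lowest-scoring entry receives a weight of exactly zero while the remaining weights still sum to one. One plausible reading of the factor α is the fraction of entries that stay nonzero after such a projection, though the abstract does not define it.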