Abstract
In recent years, UAV-based multispectral object detection has shown tremendous potential for smart-city traffic management and disaster response. However, most existing methods focus on better aligning or fusing the two modalities while neglecting the blurring of object edges in infrared images under adverse conditions, which makes it harder to distinguish foreground from background and thus increases the difficulty of detection. To address this issue, we propose CMEE-Det, a novel cross-modal edge-enhanced detector for UAV-based multispectral object detection. First, we design an Edge Feature Enhancement Module that uses differential convolution to compute the difference between a weighted-fusion pooling layer and the input feature map, thereby enhancing edge features in the images. Next, we design a Multi-Scale Feature Fusion Module that employs dilated convolution, which expands the receptive field by inserting gaps between kernel elements without increasing the kernel size. This enables the model to detect objects of varying sizes and to adapt to resolution changes caused by UAV flight dynamics. Finally, we introduce a Cross-Modal Feature Fusion Module that leverages a self-attention mechanism to learn, adjust, and fuse complementary information from both modalities, enhancing the model's robustness and improving feature representation across the two spectra. CMEE-Det outperforms existing methods on the DroneVehicle dataset and two other multimodal object detection datasets.
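As a minimal illustration of the edge-enhancement idea summarized above, the PyTorch-style sketch below subtracts a weighted fusion of pooled feature maps from the input feature map to isolate high-frequency (edge-like) responses and re-injects them. The module name, the learnable weight alpha, and the pooling/convolution configuration are assumptions made for illustration; they are not the paper's exact implementation.

# Minimal sketch of the edge-enhancement idea (assumed configuration, not the authors' code).
import torch
import torch.nn as nn

class EdgeFeatureEnhancement(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # Learnable weight balancing the average- and max-pooling branches (assumed).
        self.alpha = nn.Parameter(torch.tensor(0.5))
        pad = kernel_size // 2
        self.avg_pool = nn.AvgPool2d(kernel_size, stride=1, padding=pad)
        self.max_pool = nn.MaxPool2d(kernel_size, stride=1, padding=pad)
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Weighted fusion of the two pooling branches: a smoothed version of x.
        smoothed = self.alpha * self.avg_pool(x) + (1 - self.alpha) * self.max_pool(x)
        # Differential step: input minus its smoothed version keeps edge-like detail.
        edges = x - smoothed
        # Re-inject the enhanced edge responses into the original feature map.
        return x + self.conv(edges)

if __name__ == "__main__":
    feat = torch.randn(1, 64, 80, 80)              # e.g. an infrared feature map
    out = EdgeFeatureEnhancement(64)(feat)
    print(out.shape)                               # torch.Size([1, 64, 80, 80])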