Abstract
A novel feature fusion module, named TriNeXt, is proposed to enhance multi-scale representations in object detection frameworks. TriNeXt integrates local, nested, and global context-aware pathways, enabling effective spatial and semantic feature enrichment. The module is seamlessly incorporated into the You Only Look Once (YOLO) detector YOLOv5s and evaluated under various configurations. Extensive experiments on the Cityscapes dataset demonstrate the superiority of TriNeXt. Compared with baseline detectors, the full version (TriNeXt-Full) consistently improves detection performance: mean Average Precision (mAP)@0.5 increases from 61.7% (YOLOv5s) and 62.3% (YOLOv8s) to 63.2%; recall improves from 54.5% (YOLOv5s) and 54.9% (YOLOv8s) to 55.9%; and precision rises from 77.2% (YOLOv5s) and 75.8% (YOLOv8s) to 79.2%. Despite introducing a slight computational overhead, TriNeXt-Full maintains real-time inference speed, achieving a favorable balance between detection accuracy and efficiency. Progressive evaluations of TriNeXt variants further confirm the contribution of each design component, particularly to small object detection. To further validate its generalization capability, TriNeXt is evaluated on the KITTI dataset, where TriNeXt-Full improves mAP@0.5 from 94.0% to 95.0%, recall from 87.3% to 90.1%, and precision from 95.2% to 95.7%, demonstrating consistent performance gains across diverse and complex urban driving scenarios. These results establish TriNeXt as a robust and versatile module for small object detection in real-world applications.