Abstract
Multi-object tracking is a challenging and actively studied computer vision task. Although current one-stage methods can jointly optimize the detection and appearance embedding models end to end, they still face major challenges: high computational demands, difficulty in distinguishing similar objects, and poor performance in re-identifying lost objects. To overcome these challenges, we propose a lightweight multi-object tracking method that enhances tracking efficiency through a dual attention mechanism. On the one hand, this mechanism adopts intra-sample local attention, which enables the model to focus on discriminative regions and extract instance context, thereby effectively distinguishing similar objects. On the other hand, it employs inter-sample global attention, which captures instance-level semantic information across samples and facilitates feature interaction between objects in different frames, thus enhancing re-identification performance for lost objects. We validate the effectiveness of the proposed method through extensive experiments on publicly available MOT and our proposed STATION datasets, achieving comparable performance.
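
For illustration only, the two branches of a dual attention mechanism of this kind can be sketched as below. This is a minimal sketch, assuming plain scaled dot-product attention; the function names `local_attention` and `global_attention`, the instance query vector, and the residual combination are hypothetical choices made for the example and are not taken from the proposed method.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def local_attention(region_feats, query):
    """Intra-sample branch (assumed form): weight the spatial regions of one
    instance by their similarity to a query and pool them into one vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(r, query) / scale for r in region_feats])
    pooled = [
        sum(w * r[i] for w, r in zip(weights, region_feats))
        for i in range(len(query))
    ]
    return pooled, weights


def global_attention(curr_embs, prev_embs):
    """Inter-sample branch (assumed form): each current-frame instance attends
    to all instances of another frame; the result is added residually."""
    out = []
    for q in curr_embs:
        scale = math.sqrt(len(q))
        weights = softmax([dot(q, k) / scale for k in prev_embs])
        mixed = [
            sum(w * k[i] for w, k in zip(weights, prev_embs))
            for i in range(len(q))
        ]
        out.append([a + b for a, b in zip(q, mixed)])
    return out


# Toy inputs: three regions of one instance, then one instance per frame pair.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
pooled, region_weights = local_attention(regions, query=[2.0, 0.0])
refined = global_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy setting, the local branch assigns the largest weight to the region most aligned with the query, and the global branch pulls the current embedding toward the most similar instance in the other frame, which is the intuition behind re-identifying a lost object.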