Abstract
Multiple object tracking (MOT) is an active research topic in computer vision and a fundamental technique in application domains such as human-robot interaction, autonomous driving, and surveillance. MOT typically consists of two components: detection, which produces bounding boxes around objects, and association, which links current detections to existing tracks. Two main approaches have been proposed: one-shot and two-shot methods. Although previous work has improved both the speed and accuracy of MOT systems, most of it has focused on association performance and has often overlooked the impact of accelerating detection. We therefore propose a high-speed MOT system that balances real-time performance, tracking accuracy, and robustness across diverse environments. Our system comprises two main components: (1) a hybrid tracking framework that integrates low-frequency deep learning-based detection with classical high-speed tracking, and (2) a detection label-based tracker management strategy. We evaluated our system in six scenarios using a high-speed camera and compared it against seven state-of-the-art (SOTA) two-shot MOT methods. Our system reached up to 470 fps when tracking two objects, 243 fps with three objects, and 178 fps with four objects. In terms of tracking accuracy, it achieved the highest MOTA, IDF1, and HOTA scores when detection accuracy was high. Even when detection accuracy was low, it achieved comparable or better IDF1 scores, demonstrating the potential of long-term association for high-speed tracking. We hope that our multi-processing architecture contributes to the advancement of MOT research and serves as a practical and efficient baseline for systems involving multiple asynchronous modules.
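To make the two components concrete, the following is a minimal, single-process sketch of the general idea, not the system's actual implementation. It assumes a 1-D position, at most one object per label, a constant-velocity step standing in for the classical high-speed tracker, and a simulated detector that simply reads the ground truth every `detect_every` frames. All names (`Track`, `fast_update`, `manage_by_label`, `run_hybrid`, `max_missed`) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Track:
    label: str             # class label reported by the (simulated) detector
    position: float        # 1-D position, for brevity
    anchor: float          # position at the most recent detection
    velocity: float = 0.0  # displacement per frame, estimated between detections
    missed: int = 0        # consecutive detector passes without this label


def fast_update(track: Track) -> None:
    # Stand-in for a classical per-frame tracker: one constant-velocity step.
    track.position += track.velocity


def manage_by_label(tracks, detections, frames_since, max_missed=2):
    # Assumed rule: one track per label. A detection with a known label
    # corrects that track; a detection with an unknown label spawns a track.
    for label, pos in detections.items():
        if label in tracks:
            t = tracks[label]
            t.velocity = (pos - t.anchor) / frames_since
            t.position = pos
            t.anchor = pos
            t.missed = 0
        else:
            tracks[label] = Track(label=label, position=pos, anchor=pos)
    # Tracks whose label is absent for more than max_missed passes are dropped.
    for label in list(tracks):
        if label not in detections:
            tracks[label].missed += 1
            if tracks[label].missed > max_missed:
                del tracks[label]


def run_hybrid(frames, detect_every=4, max_missed=2):
    # frames: list of dicts mapping label -> true position for each frame.
    tracks = {}
    history = []
    for i, truth in enumerate(frames):
        # High-frequency path: every frame, every track.
        for t in tracks.values():
            fast_update(t)
        # Low-frequency path: the detector only sees every detect_every-th frame.
        if i % detect_every == 0:
            manage_by_label(tracks, truth, detect_every, max_missed)
        history.append({label: t.position for label, t in tracks.items()})
    return history


if __name__ == "__main__":
    frames = [{"car": float(i)} for i in range(12)]
    for i, state in enumerate(run_hybrid(frames)):
        print(i, state)
```

In this sketch the per-frame update carries each track between detector passes, and the detector's labels alone decide when a track is corrected, created, or removed. A real system would run detection and tracking as separate asynchronous processes and would need a richer rule when several objects share a label.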