Query-Based Object Visual Tracking with Parallel Sequence Generation

Abstract

Query decoders have been shown to perform well in object detection, but their performance in object tracking has so far been insufficient. Sequence-to-sequence learning has recently been explored in this context, with the idea of describing a target as a sequence of discrete tokens. In this study, we determine experimentally that, with an appropriate representation, a parallel approach to predicting a target coordinate sequence with a query decoder can achieve both good performance and high speed. We propose QPSTrack, a concise query-based tracking framework that predicts the target coordinate sequence in parallel. A set of queries is designed to be responsible for different coordinates of the tracked target. All the queries jointly represent a target, in contrast to the traditional one-to-one matching pattern between query and target. Moreover, we adopt an adaptive decoding scheme that comprises a one-layer adaptive decoder and learnable adaptive inputs for the decoder. This scheme helps the queries decode the template-guided search features more effectively. Furthermore, we explore the plain ViT-Base, ViT-Large, and lightweight hierarchical LeViT architectures as the encoder backbone, providing a family of three variants in total. All the trackers obtain a good trade-off between speed and performance; for instance, our tracker QPSTrack-B256 with the ViT-Base encoder achieves a 69.1% AUC on the LaSOT benchmark at 104.8 FPS.
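The parallel prediction idea can be illustrated with a minimal sketch. It is not the QPSTrack implementation: it assumes four queries (one per box coordinate), a single-head cross-attention layer without the adaptive components, a 100-bin coordinate vocabulary, and random weights in place of trained ones. All names (`cross_attend`, `decode_box`, `vocab_embed`) are hypothetical. The point it shows is that every coordinate token is read out in one pass, with no autoregressive loop over previously generated tokens.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, features):
    """Single-head cross-attention: each query attends to all feature tokens."""
    dim = len(queries[0])
    attended = []
    for q in queries:
        weights = softmax([dot(q, f) / math.sqrt(dim) for f in features])
        attended.append(
            [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
        )
    return attended


def decode_box(queries, features, vocab_embed):
    """Predict all coordinate tokens in one pass, one token per query."""
    n_bins = len(vocab_embed)
    tokens = []
    for hidden in cross_attend(queries, features):
        # Score every coordinate bin and keep the best one for this query.
        logits = [dot(hidden, e) for e in vocab_embed]
        tokens.append(max(range(n_bins), key=logits.__getitem__))
    # Map each discrete token to the centre of its bin in [0, 1].
    coords = [(t + 0.5) / n_bins for t in tokens]
    return tokens, coords


# Assumed sizes, chosen only to keep the example small.
DIM, N_FEATURE_TOKENS, N_BINS = 8, 16, 100
rng = random.Random(0)


def rand_vecs(n):
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(n)]


queries = rand_vecs(4)              # assumed order: x1, y1, x2, y2
features = rand_vecs(N_FEATURE_TOKENS)  # stand-in for template-guided search features
vocab_embed = rand_vecs(N_BINS)     # stand-in for a learned coordinate vocabulary

tokens, box = decode_box(queries, features, vocab_embed)
print(tokens, box)
```

Because the queries in this sketch do not attend to one another, each token depends only on its own query and the shared features; a real decoder may also let queries interact, which would still allow all tokens to be emitted in a single forward pass.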
