HEViTPose: towards high-accuracy and efficient 2D human pose estimation with cascaded group spatial reduction attention


Abstract

Transformer-based human pose estimation methods have made encouraging progress in improving performance. However, the excellent performance of pose networks is often accompanied by heavy computational cost and large network scale. To address this problem, this paper proposes a High-accuracy and Efficient Vision Transformer for Human Pose Estimation (HEViTPose). First, the concept of Patch Embedded Overlap Width (PEOW) is proposed to help understand the relationship between the amount of overlap and local continuity. By explicitly adjusting the PEOW value, the model's capacity to capture local continuity information is enhanced. Second, a Cascaded Group Spatial Reduction Multi-Head Attention (CGSR-MHA) is proposed, which improves memory efficiency through feature grouping, reduces computational cost through spatial reduction, and improves network performance by retaining multiple low-dimensional attention heads. Finally, comprehensive experiments on two benchmark datasets (MPII and COCO) demonstrate that HEViTPose performs on par with state-of-the-art models while being more lightweight and offering higher inference speed. Specifically, compared with HRNet at similar performance and inference speed, the proposed model reduces the number of parameters by 62.1% and the amount of computation by 43.4%. Compared with HRFormer at similar performance and network size, the inference speed is about 2.6 times faster. Code and models are available at https://github.com/T1sweet/HEViTPose.
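The abstract only names the ingredients of CGSR-MHA (channel grouping, spatial reduction of keys/values, several low-dimensional heads, cascading between groups); the exact formulation is in the paper and code release. As a rough illustration of how those pieces fit together, here is a minimal numpy sketch under stated assumptions: spatial reduction is approximated by average pooling over the token axis, the cascade adds each group's output to the next group's input, and projection weights are random placeholders rather than learned parameters. (PEOW, by contrast, concerns the patch embedding layer: in a convolution-based embedding, the overlap width corresponds to kernel size minus stride.)

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cgsr_mha_sketch(x, num_groups=2, heads_per_group=2, reduction=4, seed=0):
    """Illustrative cascaded group spatial-reduction attention (not the
    paper's exact operator).

    x: (N, C) token matrix. Channels are split into `num_groups` groups;
    each group runs multi-head attention whose keys/values are spatially
    reduced (here: mean pooling over the token axis by `reduction`), and
    each group's output is cascaded into the next group's input.
    """
    rng = np.random.default_rng(seed)
    N, C = x.shape
    gc = C // num_groups           # channels per group
    hd = gc // heads_per_group     # per-head dim (kept low on purpose)
    outs = []
    carry = np.zeros((N, gc))
    for g in range(num_groups):
        xg = x[:, g * gc:(g + 1) * gc] + carry   # cascade previous output
        Wq, Wk, Wv = (rng.standard_normal((gc, gc)) / np.sqrt(gc)
                      for _ in range(3))
        q, k, v = xg @ Wq, xg @ Wk, xg @ Wv
        # spatial reduction: pool the token axis of k and v by `reduction`
        m = N - N % reduction
        k = k[:m].reshape(-1, reduction, gc).mean(axis=1)
        v = v[:m].reshape(-1, reduction, gc).mean(axis=1)
        head_outs = []
        for h in range(heads_per_group):
            s = slice(h * hd, (h + 1) * hd)
            attn = softmax(q[:, s] @ k[:, s].T / np.sqrt(hd))
            head_outs.append(attn @ v[:, s])
        carry = np.concatenate(head_outs, axis=1)
        outs.append(carry)
    return np.concatenate(outs, axis=1)          # (N, C)

tokens = np.random.default_rng(1).standard_normal((16, 32))
out = cgsr_mha_sketch(tokens)
print(out.shape)  # (16, 32)
```

The cost saving comes from the reduced key/value length: each attention matrix here is N x (N/reduction) instead of N x N, and the grouping keeps per-head dimensionality low while preserving several heads per group.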
