ScaleFormer architecture for scale-invariant human pose estimation with enhanced mixed features


Abstract

Human pose estimation is a fundamental task in computer vision. However, existing methods suffer from performance fluctuations when processing human targets at different scales, especially in outdoor scenes where target distances and viewing angles change frequently. This paper proposes ScaleFormer, a novel scale-invariant pose estimation framework that addresses multi-scale pose estimation by combining the hierarchical feature extraction of Swin Transformer with the fine-grained feature enhancement of ConvNeXt. We design an adaptive feature representation mechanism that enables the model to maintain consistent performance across scales. Extensive experiments on the MPII human pose dataset demonstrate that ScaleFormer significantly outperforms existing methods on multiple metrics, including PCKh, scale consistency score, and keypoint mean average precision. Notably, under extreme scaling conditions (scaling factor 2.0), ScaleFormer's scale consistency score exceeds the baseline model by 48.8 percentage points, and under 30% random occlusion, keypoint detection accuracy improves by 20.5 percentage points. Ablation experiments further verify the complementary contributions of the two core components. These results indicate that ScaleFormer offers significant advantages in practical application scenarios and suggests new research directions for pose estimation.
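The abstract describes fusing hierarchical (Swin-style) features with fine-grained ConvNeXt-style refinement. The paper's actual architecture is not given here, so the following is only a minimal PyTorch sketch of one plausible fusion design: a ConvNeXt-style block (depthwise conv, LayerNorm, pointwise MLP) refines a feature map from a hierarchical backbone, and a 1x1 convolution fuses the original and refined paths. All module names and dimensions are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of a ScaleFormer-style mixed-feature fusion module.
# Names (ConvNeXtStyleBlock, MixedFeatureFusion) and dimensions are assumed,
# not from the paper.
import torch
import torch.nn as nn

class ConvNeXtStyleBlock(nn.Module):
    """Depthwise conv -> LayerNorm -> pointwise MLP, in the style of ConvNeXt."""
    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):                      # x: (B, C, H, W)
        y = self.dwconv(x)
        y = y.permute(0, 2, 3, 1)              # channels-last for LayerNorm
        y = self.mlp(self.norm(y))
        return x + y.permute(0, 3, 1, 2)       # residual connection

class MixedFeatureFusion(nn.Module):
    """Fuse a hierarchical feature map with its fine-grained refinement."""
    def __init__(self, dim):
        super().__init__()
        self.refine = ConvNeXtStyleBlock(dim)
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, feat):                   # feat: backbone stage output
        refined = self.refine(feat)
        return self.fuse(torch.cat([feat, refined], dim=1))

if __name__ == "__main__":
    fusion = MixedFeatureFusion(dim=96)        # 96 = Swin-T stage-1 width
    feat = torch.randn(1, 96, 64, 48)          # e.g. 256x192 input at stride 4
    out = fusion(feat)
    print(out.shape)                           # same spatial size and channels
```

The residual depthwise path preserves spatial detail at each backbone stage, which is one common way to pursue consistent keypoint localization across input scales; the actual ScaleFormer mechanism may differ.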
