Soft-label guided stacked dual attention network for head pose estimation and its application to classroom gaze analysis



Abstract

Head pose estimation is a fundamental task in computer vision and provides an effective way to approximate a person's gaze direction. However, accurate head pose estimation remains highly challenging under occlusion and low resolution. To address this challenge, this paper proposes a novel framework that combines the classification and regression paradigms for head pose estimation. First, we design a novel soft-label generation strategy for classification: it generates 3D facial models at different angles and then measures the similarity between poses using the displacements of 3D key points across views. Second, we introduce the Stacked Dual Attention Module (SDAM), which consists of the Multi-Receptive Attention Module (MRAM) and the Channel-wise Self-Attention Module (CSAM). MRAM uses convolution kernels of different sizes to explore multiple contextual semantics and perceive key features. CSAM employs a self-attention mechanism to adaptively model inter-channel dependencies, yielding effective channel attention. SDAM is designed around the characteristics of the head pose estimation task, which enables it to extract more representative features, and it can be easily integrated into mainstream network architectures (e.g., ResNet). Extensive experiments on popular datasets demonstrate that our method is competitive. Furthermore, we apply the proposed head pose estimation method to approximate students' gaze points in large classroom scenarios.
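The abstract does not give the exact soft-label formula, so the following is only a minimal sketch of the idea under stated assumptions. A hypothetical six-point 3D face template (`FACE_TEMPLATE`) stands in for the generated 3D facial models; poses use an assumed roll·yaw·pitch (Z·Y·X) Euler convention; each yaw bin is scored by the mean 3D keypoint displacement between the ground-truth pose and the bin-centre pose; and displacements are mapped to probabilities with a temperature softmax, where `tau` is a hypothetical parameter. The regression side is assumed to decode a continuous angle as the expectation over bin centres, a common way to couple the two paradigms that the paper may or may not use.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

# Hypothetical head-centred 3D landmarks (x right, y up, z toward camera).
# Assumption: a stand-in for the 3D facial models described in the abstract.
FACE_TEMPLATE: List[Point] = [
    (0.0, 0.0, 1.0),      # nose tip
    (-0.35, 0.35, 0.7),   # left eye
    (0.35, 0.35, 0.7),    # right eye
    (-0.25, -0.4, 0.75),  # left mouth corner
    (0.25, -0.4, 0.75),   # right mouth corner
    (0.0, -0.8, 0.6),     # chin
]


def _matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation(yaw: float, pitch: float, roll: float) -> List[List[float]]:
    """R = Rz(roll) @ Ry(yaw) @ Rx(pitch), angles in degrees (assumed convention)."""
    y, p, r = (math.radians(a) for a in (yaw, pitch, roll))
    rx = [[1, 0, 0], [0, math.cos(p), -math.sin(p)], [0, math.sin(p), math.cos(p)]]
    ry = [[math.cos(y), 0, math.sin(y)], [0, 1, 0], [-math.sin(y), 0, math.cos(y)]]
    rz = [[math.cos(r), -math.sin(r), 0], [math.sin(r), math.cos(r), 0], [0, 0, 1]]
    return _matmul(rz, _matmul(ry, rx))


def posed_keypoints(pose: Tuple[float, float, float], template=FACE_TEMPLATE) -> List[Point]:
    """Rotate every template point by the (yaw, pitch, roll) pose."""
    m = rotation(*pose)
    return [tuple(sum(m[i][k] * pt[k] for k in range(3)) for i in range(3)) for pt in template]


def mean_displacement(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Average Euclidean distance between corresponding 3D keypoints."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def yaw_soft_label(gt_pose, yaw_centers, tau: float = 0.02) -> List[float]:
    """Soft label over yaw bins (hypothetical scheme): bins whose keypoints move
    less away from the ground-truth pose get more mass, via softmax(-disp / tau).
    Pitch and roll are held at their ground-truth values for every bin."""
    _, pitch, roll = gt_pose
    ref = posed_keypoints(gt_pose)
    disp = [mean_displacement(ref, posed_keypoints((c, pitch, roll))) for c in yaw_centers]
    logits = [-d / tau for d in disp]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_angle(probs: Sequence[float], centers: Sequence[float]) -> float:
    """Assumed regression read-out: expectation of bin centres under the distribution."""
    return sum(p * c for p, c in zip(probs, centers))


if __name__ == "__main__":
    centers = [-6.0, -3.0, 0.0, 3.0, 6.0]
    label = yaw_soft_label((2.0, 0.0, 0.0), centers)
    print([round(p, 3) for p in label])
    print(round(expected_angle(label, centers), 2))
```

For a ground-truth yaw of 2°, the distribution peaks at the 3° bin, assigns the next-largest mass to the 0° bin, and the decoded yaw lands close to 2°. Unlike a plain Gaussian over angle differences, the label here is driven by how far the 3D keypoints actually move, which is the intuition the abstract describes.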
