Abstract
Classroom behavior can effectively reflect learning states, and thus classroom behavior detection is crucial for improving teaching methods and enhancing teaching quality. To address issues such as severe occlusions and large scale variations in student behavior detection, this paper proposes a classroom behavior detection model, named YOLO-CBD (YOLOv10s Classroom Behavior Detection). Firstly, BiFormer attention is introduced to redesign the Efficientv2 network, leading to a novel backbone network for efficient feature extraction of student classroom behaviors. The proposed attention module enables accurate localization in densely populated student settings. Secondly, a novel feature aggregation module is designed for replacing the basic C2f module in the YOLOv10s neck network and enhancing the capability to detect occluded targets effectively. Additionally, a feature pyramid network with efficient feature fusion is constructed to address inconsistencies among features of different scales. Finally, the Wise-IoU loss function is incorporated to handle sample imbalance issues. Experimental results show that, compared to the baseline model, YOLO-CBD improves precision by 5.7%, recall by 3.7%, and mAP50 by 3.5%, achieving effective classroom behavior detection.