Abstract
Driver cognitive load directly affects the safety and practicality of advanced driver assistance systems (ADAS), especially in autonomous driving scenarios where drivers must quickly regain control of the vehicle after performing non-driving-related tasks (NDRTs). However, existing cognitive load detection methods have practical shortcomings: invasive sensing equipment is impractical to deploy inside vehicles, and many approaches are limited to eye-movement features, which restricts their real-world application. To enable more efficient and practical cognitive load detection, this study proposes CogMamba, a multi-task non-contact model that estimates cognitive load and physiological state from RGB video. The model extracts multimodal features from facial video and introduces the Mamba architecture to efficiently capture local and global temporal dependencies, jointly estimating cognitive load, heart rate (HR), and respiratory rate (RR). Experimental results demonstrate that CogMamba achieves superior performance on two public datasets and remains robust in cross-dataset generalization tests. This study offers insights for non-contact driver state monitoring in real-world driving scenarios.