Abstract
Accurate emotion recognition is essential for driver-state assessment and mental-health monitoring. Expression-based methods are often unreliable across individuals, so attention has increasingly turned to physiological signals. However, single-modality sensing, despite its precision, cannot comprehensively represent emotional states. To address this limitation, this work proposes a multimodal approach that combines a polymer optical fiber (POF) cardiorespiratory sensor with a thermal imaging sensor. The POF sensor monitors thoracic expansion to derive cardiorespiratory signals, while the thermal imaging sensor provides high-sensitivity infrared measurements of facial temperature. Under a video-based emotion-induction protocol, features were extracted from the cardiorespiratory signals and the facial thermal time series and fused into a 42-dimensional vector representing the physiological patterns that accompany emotional fluctuations. Feature-level fusion was evaluated with support vector machine (SVM), K-nearest neighbors (KNN), and random forest (RF) classifiers within a nested cross-validation framework to obtain unbiased estimates of generalization. Compared with single-modality baselines, multimodal fusion reduced classification error, reaching a peak accuracy of 93% (SVM) with feature selection. These results indicate that integrating portable POF cardiorespiratory sensing with thermal imaging offers a robust and generalizable approach to emotion recognition.
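The following is a minimal sketch, not the authors' code, of the evaluation protocol the abstract describes: feature-level fusion of the two modalities by concatenation, followed by nested cross-validation of an SVM with feature selection. The arrays `X_cardio`, `X_thermal`, and `y`, the feature dimensions, and the hyperparameter grid are all hypothetical placeholders.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, cross_val_score, StratifiedKFold

# Placeholder data standing in for the extracted features and emotion labels.
rng = np.random.default_rng(0)
X_cardio = rng.normal(size=(120, 30))   # hypothetical cardiorespiratory features
X_thermal = rng.normal(size=(120, 12))  # hypothetical facial-thermal features
y = rng.integers(0, 3, size=120)        # hypothetical emotion labels

# Feature-level fusion: concatenate per-sample vectors (here 30 + 12 = 42 dims).
X = np.hstack([X_cardio, X_thermal])

# Inner loop tunes the number of selected features and the SVM hyperparameters;
# the outer loop yields the generalization estimate, so no fold used for
# scoring ever influences feature selection or tuning.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),
    ("svm", SVC(kernel="rbf")),
])
param_grid = {"select__k": [10, 20, 42], "svm__C": [0.1, 1, 10]}
inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
search = GridSearchCV(pipe, param_grid, cv=inner)
scores = cross_val_score(search, X, y, cv=outer)
print(f"nested-CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Placing feature selection inside the inner cross-validation loop, as sketched here, is what keeps the outer-loop accuracy estimate unbiased; selecting features on the full dataset first would leak label information into the evaluation.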