Abstract
Remote photoplethysmography (rPPG) enables contactless heart rate monitoring but remains vulnerable to motion and lighting changes. We address this by reframing heart rate estimation as a heartbeat detection problem, bypassing the need to reconstruct the full blood volume pulse signal. Our approach, HBP-Net, predicts heartbeat probability directly from facial video using a spatiotemporal attention architecture, improving robustness while reducing computational complexity. Evaluated across multiple datasets, including a new motion-challenged benchmark, HBP-Net achieves competitive accuracy under static conditions and maintains performance as motion increases. This shift from signal reconstruction to probabilistic event detection offers a conceptually simpler and more resilient framework for rPPG, advancing the feasibility of reliable, camera-based vital sign monitoring in real-world settings such as telehealth, fitness tracking, and continuous patient assessment.