Abstract
The COVID-19 pandemic has increased demand in the medical community for reliable remote medical care, and the ubiquity of smartphones has driven interest in estimating patient vital signs from audio or video signals. OBJECTIVE: We estimate and compare respiratory rates obtained from video, from audio, and jointly from video and audio for emergency department patients. METHODS AND PROCEDURES: For video, we use signal processing techniques. For audio, we compare respiratory rate estimates from signal processing methods with those from learning-based methods, which are made feasible by the public availability of a large annotated audio corpus of breathing sounds. RESULTS: On our collected audio-video corpus, the best Mean Absolute Error (MAE) of 2.53 is achieved using video features. On the publicly available respiratory rate corpus, we achieve an MAE of 1.63 using signal processing methods. CONCLUSION: On our clinical data, the video modality yields more accurate respiratory rate estimates than the audio modality. CLINICAL IMPACT: Accurate estimation of vital signs from video or audio is clinically significant because it is contactless, can be performed remotely, and requires no additional measurement equipment.