Abstract
This study evaluates the reliability of using automatically segmented audio events to detect pharyngeal swallowing activity by comparing their onset and offset timings with physiological event markers derived from videofluoroscopic swallowing studies (VFSS). Swallowing sounds were recorded from 45 patients with suspected dysphagia during VFSS using a neck-worn electronic stethoscope (NWES). A segmentation algorithm was used to detect the audio onset and offset of each swallow. These were compared with three VFSS-annotated events: bolus contact with the epiglottis (P-Start), upper esophageal sphincter (UES) opening (E-Start), and UES closure (E-End). Timing offsets and duration measures were analysed, and subgroup comparisons were performed based on the presence or absence of oral containment. The algorithm detected 80 of 84 swallows (95.2%). Audio onset occurred after P-Start in 96% of swallows and after E-Start in 67.5%. Audio offset occurred after E-End in 82.5% of swallows. The mean audio-derived pharyngeal clearance time (PCT) was 706.5 ± 294.8 ms, compared with a VFSS-based pharyngeal duration of 790.0 ± 310.0 ms (a difference in means of 83.5 ms). PCT did not differ significantly between swallows with and without oral containment, suggesting robustness against pre-pharyngeal activities. These findings indicate that audio-based segmentation reliably captures the pharyngeal phase and yields duration estimates that align with VFSS-derived physiological events, supporting its use in non-invasive bedside screening for swallowing efficiency.
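The timing comparison described above can be illustrated with a minimal sketch. It is not the study's analysis code: the `Swallow` fields, the helper names, and the demo values are hypothetical, all times are assumed to be in milliseconds on a shared clock, and the VFSS-based pharyngeal duration is assumed here to span P-Start to E-End.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Swallow:
    # Hypothetical field names; all times in ms on a shared clock.
    audio_onset: float
    audio_offset: float
    p_start: float  # bolus contact with the epiglottis
    e_start: float  # UES opening
    e_end: float    # UES closure


def pct_after(swallows, audio_attr, vfss_attr):
    """Percentage of swallows whose audio event occurs after the VFSS event."""
    later = sum(getattr(s, audio_attr) > getattr(s, vfss_attr) for s in swallows)
    return 100.0 * later / len(swallows)


def summarize(swallows):
    """Order-of-events percentages and mean durations for a set of swallows."""
    return {
        "onset_after_p_start_pct": pct_after(swallows, "audio_onset", "p_start"),
        "onset_after_e_start_pct": pct_after(swallows, "audio_onset", "e_start"),
        "offset_after_e_end_pct": pct_after(swallows, "audio_offset", "e_end"),
        # Audio-derived pharyngeal clearance time: audio offset minus audio onset.
        "mean_audio_pct_ms": mean(s.audio_offset - s.audio_onset for s in swallows),
        # Assumption: VFSS pharyngeal duration measured from P-Start to E-End.
        "mean_vfss_duration_ms": mean(s.e_end - s.p_start for s in swallows),
    }


# Made-up illustrative values, not study data.
demo = [
    Swallow(audio_onset=120, audio_offset=820, p_start=100, e_start=150, e_end=780),
    Swallow(audio_onset=90, audio_offset=900, p_start=100, e_start=80, e_end=950),
]

if __name__ == "__main__":
    for name, value in summarize(demo).items():
        print(f"{name}: {value}")
```

Keeping each swallow's audio and VFSS timestamps in one record makes the per-swallow offsets and the group-level percentages come from the same paired data, which is the kind of comparison the abstract reports.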