Abstract
Modified Barium Swallow (MBS) exams, performed with videofluoroscopy (an X-ray imaging technique), are essential for assessing swallowing function. They visualize the barium bolus (a contrast agent) as it moves through the head and neck region during swallowing, providing crucial insights into swallowing dynamics. These exams typically include diagnostic anteroposterior (AP) and lateral views, as well as non-diagnostic "scout" films. This study introduces a deep learning solution that streamlines the pre-analysis of MBS exams by automatically identifying video orientation and detecting scout video clips. Our models were trained and tested on a comprehensive dataset of 2,315 video clips from 172 MBS exams of 106 patients. In distinguishing AP from lateral views, the model achieved more than 99% accuracy at the frame level and 100% at the video level. In differentiating scout clips from bolus swallowing tasks, it attained a maximum accuracy of 86% at the video level. Combining these two tasks in a multi-task learning approach further raised the scout/bolus accuracy to 91%. These advances allow clinicians to concentrate their effort on the lateral-view videos used for clinically relevant measurements such as the Penetration-Aspiration Scale (PAS) and the Dynamic Imaging Grade of Swallowing Toxicity (DIGEST). This image sorting is also a prerequisite for applying deep learning to full image analysis.
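The abstract reports accuracy at both the frame level and the video level, but it does not say how per-frame predictions are combined into a per-video label. The sketch below assumes simple majority voting purely for illustration. The helper names `video_label_from_frames` and `accuracy` are hypothetical and do not come from the study. The sketch shows why video-level accuracy can exceed frame-level accuracy: a few misclassified frames do not change the majority label.

```python
from collections import Counter


def video_label_from_frames(frame_labels):
    """Combine per-frame predictions into one video-level label.

    Hypothetical helper: majority voting is an assumption made for
    illustration; the abstract does not specify the aggregation rule.
    """
    if not frame_labels:
        raise ValueError("a video must contain at least one frame prediction")
    # most_common(1) returns [(label, count)]; ties go to the label seen first
    return Counter(frame_labels).most_common(1)[0][0]


def accuracy(predicted, truth):
    """Fraction of positions where the prediction matches the ground truth."""
    if not truth or len(predicted) != len(truth):
        raise ValueError("predicted and truth must be non-empty and equal length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Example: a lateral clip in which one of five frames is misclassified as AP
frames = ["lateral", "lateral", "AP", "lateral", "lateral"]
print(accuracy(frames, ["lateral"] * len(frames)))  # frame-level accuracy: 0.8
print(video_label_from_frames(frames))              # video-level label: lateral
```

Under this assumed rule, a clip is labeled correctly as long as most of its frames are, which is consistent with (though not proof of) the gap between the reported frame-level and video-level results.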