Abstract
Dysphagia has a wide spectrum of effects on human health, ranging from mild symptoms such as coughing to severe complications such as airway obstruction or pneumonia. These symptoms are typically caused by the penetration of chewed food or liquid, referred to as the bolus, into the airway. The traditional manual frame-by-frame analysis of videofluoroscopy images for bolus localization is time-consuming, which has motivated earlier work on automated bolus segmentation. That work relied mainly on supervised learning. Our study takes a different path: we construct a self-supervised learning network aimed at improving performance on the bolus segmentation task. The proposed method uses the contrastive random walk model for the pretext task and the U-Net++ model for the downstream task. Specifically, we employ ResNet-18 as the backbone network, which allows the weights learned in the pretext task to initialize the downstream task. Our results show that the constructed self-supervised network outperformed the supervised learning approach. In addition, a new weighted ensemble strategy over self-supervised models increased the F1-score of the U-Net++ model from 79.1% to 81.8% compared to a single ImageNet initialization. In conclusion, the proposed approach is a first step in deploying and exploring self-supervised learning on videofluoroscopy datasets. To the best of our knowledge, this paper also proposes the first automatic method for swallowing efficiency assessment.
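To make the weighted ensemble and the F1-score concrete, the following minimal sketch combines per-model bolus probability maps and scores the resulting mask against a reference mask. It is an illustration under stated assumptions, not the implementation used in this study: the function names, the averaging of probability maps, the example weights, and the 0.5 threshold are all hypothetical choices.

```python
import numpy as np


def weighted_ensemble(prob_maps, weights):
    """Combine per-model foreground probability maps (each H x W).

    Assumption: the ensemble is a weighted average of probability maps,
    with the weights normalized to sum to 1.
    """
    probs = np.stack(prob_maps, axis=0).astype(float)  # (n_models, H, W)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, probs, axes=1)  # (H, W)


def f1_score(pred_mask, true_mask):
    """Pixel-wise F1-score (equivalent to the Dice coefficient) of two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    tp = np.logical_and(pred, true).sum()
    fp = np.logical_and(pred, ~true).sum()
    fn = np.logical_and(~pred, true).sum()
    denom = 2 * tp + fp + fn
    # Convention chosen here: two empty masks count as a perfect match.
    return float(2 * tp / denom) if denom else 1.0


if __name__ == "__main__":
    # Toy 2 x 2 probability maps standing in for two differently initialized models.
    model_a = np.array([[0.9, 0.2], [0.4, 0.1]])
    model_b = np.array([[0.7, 0.9], [0.8, 0.0]])
    reference = np.array([[1, 0], [1, 0]])

    ensemble = weighted_ensemble([model_a, model_b], weights=[1, 3])
    mask = ensemble >= 0.5  # assumed decision threshold
    print("F1:", f1_score(mask, reference))
```

In practice the ensemble weights would be chosen on a validation set; the values above are placeholders.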