Using Masked Image Modelling Transformer Architecture for Laparoscopic Surgical Tool Classification and Localization


Abstract

Artificial intelligence (AI) has shown its potential to advance applications in various medical fields. One such area involves developing integrated AI-based systems to assist in laparoscopic surgery. Surgical tool detection and phase recognition are key components of such systems and have therefore been extensively studied in recent years. Despite significant advancements in this field, previous image-based methods still face many challenges that limit their performance, owing to complex surgical scenes and limited annotated data. This study proposes a novel deep learning approach for classifying and localizing surgical tools in laparoscopic surgeries. The proposed approach uses a self-supervised learning algorithm for surgical tool classification followed by a weakly supervised algorithm for surgical tool localization, eliminating the need for explicit localization annotations. In particular, we leverage the Bidirectional Encoder Representation from Image Transformers (BEiT) model for tool classification and then utilize the heat maps generated from the multi-headed attention layers in the BEiT model to localize these tools. Furthermore, the model incorporates class weights to address the class imbalance resulting from the different usage frequencies of surgical tools in surgeries. Evaluated on the Cholec80 benchmark dataset, the proposed approach demonstrated high performance in surgical tool classification, surpassing previous works that utilize both spatial and temporal information. Additionally, the proposed weakly supervised learning approach achieved state-of-the-art results for the localization task.
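The two mechanisms the abstract names, inverse-frequency class weighting and collapsing multi-head attention into a localization heat map, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the tool counts are hypothetical, the weighting formula is the common "balanced" scheme (total / (n_classes × count)), and the heat map simply averages CLS-to-patch attention over heads for a BEiT-base-style 14×14 patch grid.

```python
import numpy as np

# Hypothetical per-class frame counts for the 7 Cholec80 tools (illustrative
# only; the paper derives weights from the actual dataset frequencies).
tool_counts = np.array([18000, 2500, 1200, 900, 3000, 700, 400], dtype=float)

# "Balanced" inverse-frequency weights: rare tools get larger weights.
class_weights = tool_counts.sum() / (len(tool_counts) * tool_counts)

def weighted_bce(probs, labels, weights):
    """Class-weighted binary cross-entropy for multi-label tool presence."""
    eps = 1e-9
    loss = -weights * (labels * np.log(probs + eps)
                       + (1.0 - labels) * np.log(1.0 - probs + eps))
    return loss.mean()

def attention_heatmap(attn, grid=14):
    """Collapse multi-head attention (heads, 1+N, 1+N) into a 2D heat map.

    Takes the attention from the CLS token to the N patch tokens, averages
    over heads, and normalizes to [0, 1] over the patch grid. This stands in
    for the paper's weakly supervised localization step.
    """
    cls_to_patches = attn[:, 0, 1:]        # (heads, N): CLS -> patch attention
    fused = cls_to_patches.mean(axis=0)    # average the attention heads
    fused = fused / fused.max()            # normalize for visualization
    return fused.reshape(grid, grid)

# Toy usage with random attention: 12 heads, 197 tokens (1 CLS + 14*14 patches),
# matching BEiT-base on 224x224 input with 16x16 patches.
rng = np.random.default_rng(0)
attn = rng.random((12, 197, 197))
heat = attention_heatmap(attn)             # (14, 14) map in [0, 1]
```

A real pipeline would read the attention tensors out of a fine-tuned BEiT model and threshold the heat map to obtain a bounding region; the sketch only shows the shape bookkeeping.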
