The Impact of Cervical Cytology Category Imbalance on Self-Supervised Representation Learning

Abstract

Pre-training on large amounts of unlabeled data and then transferring the learned representations to downstream tasks with limited labeled data has proven effective in many fields. In cervical cytology, however, this paradigm faces extreme class imbalance: positive cells make up only about 1% of whole-slide images. In this paper, we propose a pipeline for investigating how this extreme class imbalance affects self-supervised representation learning (SSRL). The pipeline has two stages: SSRL and downstream tasks. In the SSRL stage, we apply two well-established methods, masked autoencoders (MAE) and the simple framework for contrastive learning (SimCLR), to 9 datasets with varying degrees of imbalance. The pre-trained representations are then transferred to downstream tasks using both linear probing and fine-tuning. We also examine how SSRL affects annotation efficiency by varying the amount of labeled data available downstream (the annotation budget). Our study uses a total of 168,000 image tiles derived from 1,320 whole-slide images collected at multiple centers. Downstream accuracy (Acc) declines noticeably as the class ratio of the pre-training data shifts from balanced (1:1) to 1:100, with a maximum drop of about 4%. This shows that data imbalance substantially affects SSRL, and the effect is most evident when the annotation budget is low, such as 1%. Furthermore, with a limited annotation budget (5%), the downstream tasks can potentially reach accuracy comparable to that obtained with a high annotation budget (50%). The code is available at https://github.com/LGBluesky/ICISSRL.
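The abstract varies two experimental knobs: the class ratio of the unlabeled pre-training data and the annotation budget of the downstream task. The following is a minimal Python sketch of how such subsets could be drawn. It is an illustration under stated assumptions, not the code in the authors' repository. The helper names `make_imbalanced_subset` and `sample_annotation_budget` are hypothetical. The sketch holds the pool size fixed so that only the ratio changes between settings, and it draws the budget per class (stratified) with at least one tile per class. The abstract does not list the intermediate ratios of the nine datasets, so the usage below only tries 1:1, 1:10, and 1:100 for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple


def make_imbalanced_subset(
    positives: Sequence[str],
    negatives: Sequence[str],
    neg_per_pos: int,
    total: int,
    seed: int = 0,
) -> List[str]:
    """Draw `total` tile IDs with a positive:negative ratio of 1:neg_per_pos.

    Hypothetical helper: the pool size is held fixed so that only the
    class ratio differs between pre-training datasets (an assumption).
    """
    if neg_per_pos < 1:
        raise ValueError("neg_per_pos must be >= 1")
    n_pos = round(total / (1 + neg_per_pos))
    n_neg = total - n_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("not enough tiles for the requested ratio")
    rng = random.Random(seed)
    subset = rng.sample(list(positives), n_pos) + rng.sample(list(negatives), n_neg)
    rng.shuffle(subset)  # mix classes so tile order carries no label information
    return subset


def sample_annotation_budget(
    labeled: Sequence[Tuple[str, int]],
    budget: float,
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Keep roughly a `budget` fraction (e.g. 0.01, 0.05, 0.5) of each class.

    Hypothetical helper: stratified sampling with at least one example per
    class is an assumption, not necessarily the authors' protocol.
    """
    if not 0 < budget <= 1:
        raise ValueError("budget must be in (0, 1]")
    by_class: Dict[int, List[Tuple[str, int]]] = {}
    for item in labeled:
        by_class.setdefault(item[1], []).append(item)
    rng = random.Random(seed)
    picked: List[Tuple[str, int]] = []
    for label in sorted(by_class):
        items = by_class[label]
        k = max(1, round(budget * len(items)))  # never drop a class entirely
        picked.extend(rng.sample(items, k))
    return picked


if __name__ == "__main__":
    pos = [f"pos_{i}" for i in range(1000)]
    neg = [f"neg_{i}" for i in range(2000)]
    # Illustrative ratios only; these are not the paper's nine settings.
    for r in (1, 10, 100):
        subset = make_imbalanced_subset(pos, neg, neg_per_pos=r, total=1010)
        n_pos = sum(t.startswith("pos_") for t in subset)
        print(f"1:{r}  positives={n_pos}  negatives={len(subset) - n_pos}")
    labeled = [(t, int(t.startswith("pos_"))) for t in subset]
    few = sample_annotation_budget(labeled, budget=0.05)
    print(f"5% budget -> {len(few)} labeled tiles")
```

Holding the pool size fixed means a 1:100 dataset contains far fewer positive tiles than a 1:1 dataset, which isolates the effect of the ratio. An alternative design would keep the number of positives fixed and add negatives, but then dataset size changes along with the ratio.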
