Reducing annotation effort in agricultural data: simple and fast unsupervised coreset selection with DINOv2 and K-means

Abstract

The need for large amounts of annotated data is a major obstacle to adopting deep learning in agricultural applications, where annotation is typically time-consuming and requires expert knowledge. To address this issue, methods have been developed that select, for manual annotation, data representing the variability present in the dataset, thereby avoiding redundant information. Coreset selection methods aim to choose a small subset of samples that best represents the entire dataset. They can therefore be used to select a reduced set of samples for annotation, so that a deep learning model trained on that set performs as well as possible. In this work, we propose a simple yet effective coreset selection method that combines the recent foundation model DINOv2, used as a powerful feature extractor, with the well-known K-means clustering algorithm. Samples are selected from each computed cluster to form the final coreset. The proposed method is validated by comparing the performance metrics of a multiclass classification model trained on datasets reduced either randomly or with the proposed method. This validation is conducted on two different datasets, and in both cases the proposed method achieves better results, with F1-score improvements of up to 0.15 when the training datasets are substantially reduced. Additionally, we study how important DINOv2 is as the feature extractor for achieving these results.
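The clustering-based selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The code assumes that one embedding per image has already been computed, for instance with a DINOv2 backbone loaded via `torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")`. That choice of model variant is an assumption, and the sketch itself only needs a NumPy array of features. The abstract says only that samples are drawn from each cluster. The specific rule used here is also an assumption: set the number of clusters equal to the annotation budget and keep, from each cluster, the member closest to its centroid. K-means is written out with k-means++ initialisation so that the example needs nothing beyond NumPy. The helper names `kmeans` and `select_coreset` are hypothetical.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, n_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Lloyd's K-means with k-means++ initialisation; returns the (k, d) centroids."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # k-means++: first centroid uniformly at random, then each new one sampled with
    # probability proportional to its squared distance to the nearest chosen centroid.
    centroids = [x[rng.integers(n)]]
    for _ in range(1, k):
        chosen = np.array(centroids)
        d2 = ((x[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(x[rng.choice(n, p=d2 / d2.sum())])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        # Assignment step: every sample goes to its nearest centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = centroids.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids


def select_coreset(features: np.ndarray, budget: int, seed: int = 0) -> np.ndarray:
    """Return sorted indices of at most `budget` samples, one per K-means cluster.

    `features` is an (n_samples, dim) array, e.g. precomputed DINOv2 embeddings
    (assumption). Features are assumed to be distinct rows.
    """
    x = np.asarray(features, dtype=float)
    if not 1 <= budget <= len(x):
        raise ValueError("budget must be between 1 and the number of samples")
    centroids = kmeans(x, budget, seed=seed)
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    selected = []
    for j in range(budget):
        members = np.flatnonzero(labels == j)
        if len(members) > 0:  # an empty cluster contributes no sample
            # Representative of cluster j: the member closest to its centroid.
            selected.append(members[dists[members, j].argmin()])
    return np.sort(np.array(selected))
```

Usage: `idx = select_coreset(embeddings, budget=200)` would return the indices of at most 200 images to send for annotation. Choosing the sample nearest each centroid favours typical examples of each visual mode over outliers. Drawing several samples per cluster, or sampling in proportion to cluster size, are equally plausible variants.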
