A Computer Vision Method for Finding Mislabelled Specimens Within Natural History Collections

Abstract

Natural history collections are essential for biodiversity and evolution research and for studying biotic responses to global change. However, the sheer number of specimens within these collections poses management challenges. Reduced funds, declining taxonomic training and expanding collections can lead to mislabelled or missing specimens, which highlights the need for innovative, non-destructive methods of taxonomic verification for specimens in large collections. While genetic analyses offer precise verification, they are resource-intensive, less effective on degraded DNA from older specimens, and risk damaging smaller specimens. Computer vision can automate tasks such as species-level verification and morphological examination, though natural history collections have yet to adopt these techniques for such management tasks. Digitisation initiatives, such as those at the Natural History Museum (NHM), London, have gained momentum in recent years, converting specimens to digital formats and enhancing global accessibility. Here, we describe a computer vision pipeline applied to the digitised British and Irish Lepidoptera collection at the NHM. Specifically, our pipeline identifies specimens that do not match their labelled species. The pipeline was run 100 times for the Butterfly and Moth datasets, resulting in 99,350 out of 350,208 specimens (28.37%) being flagged at least once. We attribute a portion of these flags to pipeline errors, given the likelihood of some mislabelled specimens within the training datasets. However, specimens flagged consistently across more than 80% of pipeline runs are likely mislabelled within the collections. Taxonomic experts visually examined 210 such specimens, finding 145 to be incorrectly labelled in the collection or the NHM data portal. Additionally, 30 specimens were sent for genetic verification to confirm species-level identification. This synergy of computer vision and genetic-based species identification enhances the accuracy and efficiency of managing natural history collections, preserving their value for future generations.
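
The abstract separates specimens flagged at least once from those flagged in more than 80% of runs. The following is a minimal sketch of how such an aggregation could be done. It is not the authors' implementation: the function names, the representation of each run as a set of specimen IDs, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

def flagged_at_least_once(runs):
    """Return every specimen ID flagged in one or more runs.

    `runs` is a list of sets (hypothetical format); each set holds the
    specimen IDs flagged by one pipeline run.
    """
    return set().union(*runs) if runs else set()

def consistently_flagged(runs, threshold=0.8):
    """Return specimen IDs flagged in strictly more than `threshold` of runs."""
    if not runs:
        return set()
    # Count in how many runs each specimen ID was flagged.
    counts = Counter(spec_id for flagged in runs for spec_id in flagged)
    return {spec_id for spec_id, n in counts.items()
            if n / len(runs) > threshold}

# Toy data: 5 runs over made-up specimen IDs.
toy_runs = [{"A", "B", "D"}, {"A", "D"}, {"A", "C", "D"}, {"A", "B", "D"}, {"A"}]
once = flagged_at_least_once(toy_runs)
consistent = consistently_flagged(toy_runs)
```

In this toy example, specimen "D" is flagged in exactly 4 of 5 runs (80%) and is therefore excluded by the strict "more than 80%" comparison, while "A" is flagged in every run and is retained as a candidate for expert review.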
