Abstract
Natural history collections are essential for biodiversity and evolution research and for studying biotic responses to global change. However, the large number of specimens held in these collections poses management challenges. Reduced funding, declining taxonomic training and expanding collections can lead to mislabelled or missing specimens. This highlights the need for innovative, non-destructive methods of taxonomic verification for specimens in large collections. While genetic analyses offer precise verification, they are resource-intensive, less effective on degraded DNA from older specimens, and risk damaging smaller specimens. Computer vision can automate tasks such as species-level verification and morphological examination, although natural history collections have yet to adopt these techniques for such management tasks. Digitisation initiatives, such as those at the Natural History Museum (NHM), London, have gained momentum in recent years, converting specimens to digital formats and enhancing global accessibility. Here, we describe a computer vision pipeline applied to the digitised British and Irish Lepidoptera collection at the NHM. Specifically, our pipeline identifies specimens that do not match their labelled species status. We ran the pipeline 100 times on the Butterfly and Moth datasets, resulting in 99,350 out of 350,208 specimens (28.37%) being flagged at least once. We attribute a portion of these flags to pipeline error, given the likelihood of some mislabelled specimens within the training datasets. However, specimens flagged consistently across >80% of pipeline runs are likely to be mislabelled within the collections. Taxonomic experts visually examined 210 such specimens, finding 145 to be incorrectly labelled in the collection or the NHM data portal. Additionally, 30 specimens were sent for genetic verification to confirm species-level identification.
This synergy of computer vision and genetic-based species identification enhances the accuracy and efficiency of managing natural history collections, preserving their value for future generations.
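
The consistency criterion described above (flagged at least once versus flagged in more than 80% of runs) can be illustrated with a minimal sketch. This is not the pipeline's actual code: the function names, the representation of each run as a set of specimen IDs, and the example IDs are all assumptions made for illustration.

```python
from collections import Counter


def flagged_at_least_once(runs):
    """Return every specimen ID flagged in one or more runs.

    `runs` is a hypothetical list in which each element holds the
    specimen IDs flagged by one pipeline run.
    """
    return set().union(*runs)


def consistently_flagged(runs, min_percent=80):
    """Return specimen IDs flagged in strictly more than `min_percent` of runs.

    Integer arithmetic is used so the 80% boundary is handled exactly.
    """
    total = len(runs)
    # Count each specimen at most once per run.
    counts = Counter(sid for flagged in runs for sid in set(flagged))
    return {sid for sid, n in counts.items() if n * 100 > min_percent * total}


# Hypothetical example with 10 runs: "A" is flagged in 9 runs,
# "B" in exactly 8 runs, and "C" in a single run.
runs = [{"A", "B"} for _ in range(8)] + [{"A", "C"}, set()]

once = flagged_at_least_once(runs)
consistent = consistently_flagged(runs)
```

In this toy example all three specimens are flagged at least once, but only "A" exceeds the 80% threshold; "B", flagged in exactly 80% of runs, is excluded because the criterion is strict.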