Abstract
The increasing number of Barcode of Life Data System (BOLD) records per species and genus leads to contradictory species assignments within Barcode Index Numbers (BINs), which serve as identifiers for the BOLD ID engine. To examine these issues, we analyzed a dataset of original and curated BOLD records for the genus Tachina (Insecta: Tachinidae), based on a previous publication. The dataset included both published and private records. This allowed us to assess the performance of the BOLD engine's species determination algorithm, Refined Single Linkage (RESL), and to compare it with Assemble Species by Automatic Partitioning (ASAP). We also investigated how the BOLD v4 ID engine uses BINs. Our analysis confirmed that BOLD queries rely primarily on BINs for species identification, although some cases deviated from this pattern and returned species matches inconsistent with the species assigned to the BIN. ASAP proved superior to RESL, which we attribute to RESL's adherence to the concept of the DNA barcoding gap. Taxonomic misassignments, inconsistencies in BIN formation, and missing metadata also contribute significantly to unreliable identifications. These problems appear to stem from both algorithmic limitations and deficiencies in submission and post-submission processes. Furthermore, the default mode of the BOLD v4 ID engine integrates both private and published data, which leads to public records whose identifications are based solely on COI. This issue may now be mitigated, as the default mode of the BOLD v5 ID engine uses published data exclusively. To enhance BOLD's reliability, we propose improvements to the submission and post-submission processes. Without such amendments, contradictory species assignments within BINs will continue to accumulate, and the reliability of specimen identification by BOLD will decrease.