Abstract
Biodiversity loss calls for improved monitoring of small, species-rich taxa such as protists, phyto- and zooplankton, and terrestrial invertebrates. Traditional biomonitoring is often infeasible for these taxa because of their complex morphology and the scarcity of taxonomists. DNA-based approaches offer a promising solution by enabling rapid species identification. However, their effectiveness depends on molecular reference databases, which remain incomplete, particularly for remote and biodiverse regions. To address this gap, we propose the StrataSeq workflow, a systematic approach to optimise the generation of DNA reference databases for hard-to-identify taxa. Reference sequences allow us to connect molecular operational taxonomic units to the wealth of information available for many described taxa. StrataSeq consists of four key steps: (1) Habitat-stratified sample subsetting selects a minimal but ecologically representative sample set by stratifying along key environmental gradients. (2) Prioritising morphospecies involves sorting specimens into morphospecies and ranking them by their occurrence across samples, so that common taxa are identified first. (3) Detailed morphological identification focuses on common morphospecies to maximise taxonomic coverage while minimising effort. (4) Reference DNA sequence generation targets taxa lacking molecular references, with sequenced specimens deposited as museum vouchers. We benchmarked the StrataSeq workflow using two datasets of Collembola from grassland soils in Germany. Compared with a species list generated by a more labour-intensive traditional approach (identification of randomly selected individuals from all samples), the StrataSeq workflow captured 69% of species while requiring only 22% of the effort. StrataSeq is adaptable to various organism groups and environmental settings, including both spatial and temporal gradients.
By making the generation of reference DNA databases more cost-effective, StrataSeq offers a scalable way to accelerate the completion of molecular databases, thereby supporting biodiversity monitoring, ecological research and ecosystem assessments under global change pressures.
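To make steps (1) and (2) of the workflow concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a single numeric environmental gradient, equal-width strata, and a fixed number of samples per stratum. The field names (`lui`, `morphospecies`), the function names, and the toy sample data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def stratified_subset(samples, gradient, n_strata=3, per_stratum=1):
    """Step 1 (sketch): keep `per_stratum` samples from each of `n_strata`
    equal-width bins along one environmental gradient."""
    values = [s[gradient] for s in samples]
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 if all samples share the same value.
    width = (hi - lo) / n_strata or 1.0
    strata = {}
    for s in samples:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((s[gradient] - lo) / width), n_strata - 1)
        strata.setdefault(idx, []).append(s)
    subset = []
    for idx in sorted(strata):
        # Assumption: take the first samples in input order; a real study
        # might choose randomly or by additional criteria.
        subset.extend(strata[idx][:per_stratum])
    return subset


def rank_morphospecies(samples):
    """Step 2 (sketch): rank morphospecies by the number of samples in
    which they occur, most widespread first (ties broken by name)."""
    occurrence = Counter()
    for s in samples:
        occurrence.update(set(s["morphospecies"]))
    return sorted(occurrence.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: "lui" stands in for any environmental gradient.
samples = [
    {"id": "S1", "lui": 0.5, "morphospecies": ["msp_A", "msp_B"]},
    {"id": "S2", "lui": 0.9, "morphospecies": ["msp_A"]},
    {"id": "S3", "lui": 1.6, "morphospecies": ["msp_A", "msp_C"]},
    {"id": "S4", "lui": 2.0, "morphospecies": ["msp_B", "msp_C"]},
    {"id": "S5", "lui": 2.9, "morphospecies": ["msp_A", "msp_D"]},
    {"id": "S6", "lui": 3.5, "morphospecies": ["msp_A", "msp_B"]},
]

subset = stratified_subset(samples, "lui", n_strata=3, per_stratum=1)
ranked = rank_morphospecies(subset)
print([s["id"] for s in subset])
print(ranked)
```

In this sketch, the morphospecies at the top of `ranked` would be the first candidates for detailed identification in step (3), and those lacking a molecular reference would then be passed on to sequencing in step (4).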