Abstract
Accurate and scalable taxonomic classification is essential for biodiversity research, as it supports systematic species identification across multiple hierarchical ranks. However, current image-based classification methods often fail to enforce taxonomic consistency (for example, they may predict a genus that does not belong to the predicted family). This critical limitation undermines the reliability of their outputs for scientific use. In addition, field-based biodiversity studies are constrained by the limited computational resources and network connectivity of edge devices. To address these challenges, this paper proposes TaxonomyNet, an ensemble detection model with six independent heads for taxonomic classification. Trained on a dataset of 50 Australian animal species, TaxonomyNet achieves high detection performance across all ranks (mAP of 90.7% to 99.75%). Furthermore, to resolve the core challenge of prediction inconsistency, we introduce the Weighted Agreement Loss (WAL), a confidence-weighted disagreement metric designed to enforce structural coherence between the predicted outputs and a reference taxonomy. Applying this consistency-enforcing mechanism improves the reliability of hierarchical classification, increasing final species-level accuracy by up to 3.87% over the baseline and recently published domain-specific foundation models. The approach is also more computationally efficient, reducing delay by 22 minutes in total across 1,500 samples, which makes it well suited for deployment on edge devices. This work provides a practical and extensible solution for reliable hierarchical classification in real-world biodiversity monitoring scenarios.
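The abstract does not give the formula for WAL. The following is therefore only a hypothetical sketch of what a confidence-weighted disagreement measure against a reference taxonomy could look like, not the paper's definition. It assumes the following: predictions arrive as one (label, confidence) pair per rank, ordered from coarsest to finest. Each adjacent parent/child pair is weighted by the product of its two confidences. A pair counts as a disagreement when the reference taxonomy does not list the predicted parent as the predicted child's parent. The function name `weighted_disagreement`, the six rank names, and the toy two-species taxonomy are all illustrative.

```python
from typing import Dict, List, Tuple

# A prediction is a (label, confidence) pair for one rank; a hierarchy of
# predictions is ordered from the coarsest rank to the finest.
Prediction = Tuple[str, float]


def weighted_disagreement(preds: List[Prediction], parent_of: Dict[str, str]) -> float:
    """Hypothetical confidence-weighted disagreement score in [0, 1].

    Each adjacent (parent rank, child rank) pair is weighted by the product of
    the two confidences; a pair counts as a disagreement when the reference
    taxonomy does not list the predicted parent as the predicted child's parent.
    """
    total_weight = 0.0
    disagreement = 0.0
    for (parent_label, parent_conf), (child_label, child_conf) in zip(preds, preds[1:]):
        weight = parent_conf * child_conf
        total_weight += weight
        if parent_of.get(child_label) != parent_label:
            disagreement += weight
    # Return 0.0 when no pair carries weight (e.g. a single rank or zero confidences).
    return disagreement / total_weight if total_weight > 0 else 0.0


# Toy reference taxonomy (child -> parent) for two Australian marsupials.
PARENT_OF = {
    "Mammalia": "Chordata",
    "Diprotodontia": "Mammalia",
    "Phascolarctidae": "Diprotodontia",
    "Macropodidae": "Diprotodontia",
    "Phascolarctos": "Phascolarctidae",
    "Macropus": "Macropodidae",
    "Phascolarctos cinereus": "Phascolarctos",
    "Macropus giganteus": "Macropus",
}

# Assumed six ranks: phylum, class, order, family, genus, species.
consistent = [
    ("Chordata", 1.0), ("Mammalia", 1.0), ("Diprotodontia", 0.9),
    ("Phascolarctidae", 0.6), ("Phascolarctos", 0.8), ("Phascolarctos cinereus", 0.9),
]
# The family head predicts Macropodidae, which contradicts the genus Phascolarctos.
inconsistent = [
    ("Chordata", 1.0), ("Mammalia", 1.0), ("Diprotodontia", 0.9),
    ("Macropodidae", 0.6), ("Phascolarctos", 0.8), ("Phascolarctos cinereus", 0.9),
]

print(weighted_disagreement(consistent, PARENT_OF))              # 0.0
print(round(weighted_disagreement(inconsistent, PARENT_OF), 4))  # 0.1319
```

Weighting by the product of confidences means a contradiction between two confident heads contributes more to the score than one involving a low-confidence head. Other weightings (for example, the minimum of the two confidences) would be equally plausible under this sketch.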