Abstract
Human milk glycan (HMG) metabolism, especially by bifidobacteria, is crucial for infant gut colonization and healthy microbiome development. Bifidobacterial species, and even individual strains, vary widely in their capacity for HMG metabolism and in the enzymatic repertoire they use for it. The enzymes involved in HMG metabolism often have many non-HMG-related homologues, so accurately assessing the HMG metabolic capabilities of bifidobacteria requires fine-grained annotation. However, current annotation tools provide only broad glycoside hydrolase (GH) family or subfamily classifications. Here, we present bifidoAnnotator, a tool for fine-grained annotation and visualization of bifidobacterial GH genes involved in HMG utilization. bifidoAnnotator uses MMseqs2 (Many-against-Many sequence searching) to map protein sequences against a manually curated database of over 22,000 bifidobacterial GH proteins, organized into 13 families and 108 functional clusters, each assigned a validation status (experimentally validated, putative or hypothetical). The tool performs hierarchical annotation at the family and cluster levels, identifying consistently annotated protein variants rather than only broad family assignments, and generates publication-ready heatmaps for comparative analysis. In benchmarking on a gold-standard dataset, bifidoAnnotator outperformed six established tools (95.9% precision, 100% recall) and was an order of magnitude faster than the most accurate competitor. This combination of accuracy and computational efficiency represents a meaningful advance for high-throughput genomic annotation workflows, enabling detailed characterization of strain-level functional diversity in bifidobacterial HMG metabolism.