Abstract
BACKGROUND: A comprehensive and representative reference database is crucial for accurate taxonomic and functional profiling of the human gut microbiome in population-level studies. However, because approximately 70% of current microbial reference data originate from European and North American populations, other regions, including East Asia and particularly China, remain substantially underrepresented. METHODS: We constructed the human Gut Microbiome Reference (GMR), comprising 478,588 high-quality microbial genomes from Chinese (247,134) and non-Chinese (231,454) populations. Species-level clustering and protein annotation were performed to characterize microbial diversity and function. We further integrated novel microbial genomes into a taxonomic profiling database and validated the improvements using independent cohort data. RESULTS: The GMR dataset spans 6664 species, 26.4% of which are newly classified, and encodes over 20 million unique proteins, 47% of which lack known functional annotations. Notably, 35.35% and 32.46% of species were unique to the Chinese and non-Chinese populations, respectively. Among the 2145 species shared between populations, 74% of the 304 species with balanced prevalence across populations exhibited population-specific phylogenetic stratification, involving health-relevant functions such as antibiotic resistance. Integrating the novel genomes into the taxonomic profiling database improved population-level species profiling by up to 23% and uncovered replicable associations between novel species and host physiological traits. CONCLUSIONS: Our study substantially expands the compositional and functional landscape of the human gut microbiome, providing a crucial resource for studying the role of the gut microbiome in regional health disparities.