Abstract
BACKGROUND: The rapid expansion of human genomic data calls for foundation models that can manage ultra-long sequences, capture population diversity, and support efficient clinical inference; existing models commonly fall short in these areas and lack human-specific representation. RESULTS: Here, we introduce Genos (Genos-1.2B/Genos-10B), a human-centric genomic foundation model engineered for million-base-pair sequence modeling. Genos employs a large-scale mixture-of-experts architecture optimized for a 1-Mb context and is trained on high-quality human de novo assemblies from datasets such as the Human Pangenome Reference Consortium and the Human Genome Structural Variation Consortium, representing diverse global populations. A suite of optimization strategies ensures training stability and improves computational efficiency, collectively reducing cost and enabling million-base-pair context modeling. Functionally, Genos performs single-nucleotide-resolution analysis and dynamically simulates the cascade effects of noncoding variants on RNA expression profiles. In comprehensive evaluations, Genos consistently surpasses state-of-the-art models on critical human genomics benchmarks and demonstrates robust omics-text cross-modal diagnostic capabilities. We present a systematic technical evaluation and validation of Genos's architecture, training convergence, and performance across standard benchmarks. CONCLUSIONS: This work provides a reliable technical blueprint and performance benchmark for developing the next generation of high-efficiency genomic foundation models. Genos model weights, inference code, and usage documentation are publicly available on GitHub (https://github.com/BGI-HangzhouAI/Genos) and the Hugging Face Hub (https://huggingface.co/BGI-HangzhouAI).