A complete and near-perfect rhesus macaque reference genome: lessons from subtelomeric repeats and sequencing bias

Abstract

A truly complete, telomere-to-telomere (T2T), and error-free reference genome remains a foundational resource, and a long-standing goal, for unbiased comparative and functional genomics. Although recent T2T assemblies of human and other primate genomes have made substantial progress, most still contain thousands of base-level errors, particularly within highly repetitive regions. Here, we present T2T-MMU8v2.0, a near-perfect T2T assembly of the rhesus macaque (Macaca mulatta) with the highest base-level accuracy reported for a primate genome to date. Using an optimized ONT-only assembly strategy, we identify subtelomeric satellite-rich regions as the principal bottleneck to further improving assembly quality, owing to technological biases in long-read sequencing platforms and limitations of current hybrid assembly frameworks. We discover 268 previously unannotated repeat families and resolve ~8 Mbp of SATR satellite arrays, which show more than 99-fold enrichment in historically misassembled subtelomeric regions. These satellites are organized into four distinct genomic architectures, each with its own SATR satellite composition, segmental duplication organization, and epigenetic signature, and all different from the subtelomeric architectures observed in hominid genomes. Notably, in contrast to the largely gene-poor subtelomeric regions of African hominids, the macaque SATR architectures harbor 58 actively transcribed genes, supported by open chromatin and expression data, suggesting gene innovation within these repetitive regions. Functionally, T2T-MMU8v2.0 improves read mappability and accuracy across sequencing platforms, yields a 19% improvement in transcription start site (TSS) enrichment scores and, on average, 5,821 additional chromatin accessibility peaks, and thereby enhances variant detection, regulatory annotation, and transcriptomic resolution in population genetic and single-nucleus studies.
Together, this work establishes a new benchmark for genomics, offers a roadmap for resolving complex repetitive regions, and reveals previously unrecognized features of subtelomeric genome structure and evolution.
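
The 99-fold enrichment of SATR satellites in historically misassembled subtelomeric regions compares how much satellite sequence falls inside those regions with what would be expected if it were spread uniformly across the genome. The abstract does not say how this figure was computed, so the sketch below assumes the common definition: fold enrichment = (fraction of feature base pairs inside the regions) / (fraction of the genome covered by the regions). Interval sets are BED-style `(chrom, start, end)` tuples (0-based, half-open); the function names and toy coordinates are hypothetical illustrations, not the authors' code or data.

```python
from typing import List, Tuple

# BED-style interval: (chromosome, start, end), 0-based half-open.
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_bp(intervals: List[Interval]) -> int:
    """Number of distinct base pairs covered by an interval set."""
    return sum(end - start for _, start, end in merge(intervals))


def overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Base pairs shared by two interval sets, via a two-pointer sweep over merged lists."""
    a, b = merge(a), merge(b)
    i = j = 0
    shared = 0
    while i < len(a) and j < len(b):
        ca, sa, ea = a[i]
        cb, sb, eb = b[j]
        if ca != cb:
            # Lists are sorted by chromosome name first, so advance the lagging one.
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        lo, hi = max(sa, sb), min(ea, eb)
        if lo < hi:
            shared += hi - lo
        # Advance whichever interval ends first; it cannot overlap anything further.
        if ea < eb:
            i += 1
        else:
            j += 1
    return shared


def fold_enrichment(feature: List[Interval], regions: List[Interval], genome_size: int) -> float:
    """(overlap / feature_bp) / (region_bp / genome_size), computed with integers until the final division."""
    feat = total_bp(feature)
    reg = total_bp(regions)
    if feat == 0 or reg == 0 or genome_size <= 0:
        raise ValueError("feature, regions and genome_size must be non-empty/positive")
    return (overlap_bp(feature, regions) * genome_size) / (feat * reg)


if __name__ == "__main__":
    # Hypothetical toy data: a 1,000 bp genome in which the "misassembled" region
    # covers 10% of the sequence but contains half of the satellite base pairs.
    regions = [("chr1", 0, 100)]
    satellites = [("chr1", 0, 50), ("chr1", 500, 550)]
    print(fold_enrichment(satellites, regions, genome_size=1000))  # 5.0
```

For real genome-scale annotations, interval tools such as bedtools would normally handle the merging and intersection; this pure-Python version only makes the arithmetic of the enrichment ratio explicit.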