Precise Identification of Higher-Order Repeats (HORs) in T2T-CHM13 Assembly of Human Chromosome 21-Novel 52mer HOR and Failures of Hg38 Assembly

精确鉴定人类21号染色体T2T-CHM13组装中的高阶重复序列(HOR)——新型52mer HOR及Hg38组装失败

阅读:2

Abstract

BACKGROUND: Centromeric alpha satellite DNA is organized into higher-order repeats (HORs), whose precise structure is often difficult to resolve in standard genome assemblies. The recent telomere-to-telomere (T2T) assembly of the human genome enables complete analysis of centromeric regions, including the full structure of HOR arrays. METHODS: We applied the novel high-precision GRMhor algorithm to the complete T2T-CHM13 assembly of human chromosome 21. GRMhor integrates global repeat map (GRM) and monomer distance (MD) diagrams to accurately identify, classify, and visualize HORs and their subfragments. RESULTS: The analysis revealed a novel Cascading 11mer HOR array, in which each canonical HOR copy comprises 11 monomers belonging to 10 different monomer types. Subfragments with periodicities of 4, 7, 9, and 20 were identified within the array. A second, complex 23/25mer HOR array of mixed Willard's/Cascading type was also detected. In contrast to the hg38 assembly, where a dominant 8mer and 33mer HOR were previously annotated, these structures were absent in the T2T-CHM13 assembly, highlighting the limitations of hg38. Notably, we discovered a novel 52mer HOR-the longest alpha satellite HOR unit reported in the human genome to date. Several subfragment repeats correspond to alphoid subfamilies previously identified using restriction enzyme digestion, but are here resolved with higher structural precision. CONCLUSIONS: Our findings demonstrate the power of GRMhor in resolving complex and previously undetected alpha satellite architectures, including the longest canonical HOR unit identified in the human genome. The precise delineation of superHORs, Cascading structures, and HOR subfragments provides unprecedented insight into the fine-scale organization of the centromeric region of chromosome 21. These results highlight both the inadequacy of earlier assemblies, such as hg38, and the critical importance of complete telomere-to-telomere assemblies for accurately characterizing centromeric DNA.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。