Boundary-associated propagation of a processed pseudogene dissects pre-existing limitations of genome annotation in the T2T era

Abstract

BACKGROUND: Processed pseudogenes and retrogenes are defined by their RNA-mediated origin and, by virtue of this origin-based definition, are often interpreted as discrete genomic insertions. The completion of telomere-to-telomere (T2T) reference assemblies has substantially improved the resolution of segmental duplication architectures and centromeric satellite sequences that were previously inaccessible, allowing genomic structural contexts that were effectively invisible in earlier references to be directly examined.
RESULTS: Using the SEPTIN14P-CICP locus family as a case study, chain-based comparative analyses showed that a genomic window spanning the SEPTIN14 3′ terminal exon and the adjacent processed pseudogene CICP12 is dispersed into multiple segmental duplication-associated units across great apes, rather than being maintained as a single orthologous locus. Genome-wide analyses further indicated that annotated CICP loci preferentially localize within segmental duplication blocks and accumulate near pericentromeric or subtelomeric regions. Despite this duplication-associated dispersion, codon-based selection analyses revealed pervasive purifying selection acting on the full-length SEPTIN14 coding sequence and its 3′ terminal exon, arguing against a model in which the terminal exon was newly formed through segmental duplication. Together, these results show that when highly conserved, strongly constrained coding regions are embedded within segmental duplication-rich regions, co-dispersed processed pseudogene copies can be interpreted as distinct from independently generated LINE-1-mediated insertions and as reflecting secondary structural propagation.
CONCLUSIONS: When considered in light of origin-based definitions of processed pseudogenes and retrogenes, and specifically within duplication-rich and structurally unstable genomic regions resolved by T2T-level assemblies, these results suggest that multiple annotated loci can arise through secondary propagation of a single RNA-derived insertion. In such contexts, incorporating selective constraint and cross-species conservation enables more reliable distinction between source insertions and their secondarily propagated copies. This case study highlights a limitation of current annotation frameworks and demonstrates the need for more precise annotation that incorporates evolutionary and structural context in the T2T era.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13100-026-00394-z.
