Abstract
BACKGROUND: Processed pseudogenes and retrogenes are defined by their RNA-mediated origin and, as a consequence of this origin-based definition, are often interpreted as discrete genomic insertions. The completion of telomere-to-telomere (T2T) reference assemblies has substantially improved the resolution of segmental duplication architectures and centromeric satellite sequences, allowing structural contexts that were effectively invisible in earlier references to be examined directly.
RESULTS: Using the SEPTIN14P-CICP locus family as a case study, chain-based comparative analyses showed that a genomic window spanning the SEPTIN14 3′ terminal exon and the adjacent processed pseudogene CICP12 is dispersed into multiple segmental duplication-associated units across great apes, rather than being maintained as a single orthologous locus. Genome-wide analyses further indicated that annotated CICP loci preferentially localize within segmental duplication blocks and accumulate near pericentromeric or subtelomeric regions. Despite this duplication-associated dispersion, codon-based selection analyses revealed pervasive purifying selection acting on the full-length SEPTIN14 coding sequence and its 3′ terminal exon, arguing against a model in which the terminal exon was newly formed through segmental duplication. Together, these results show that when highly conserved, strongly constrained coding regions are embedded within segmental duplication-rich regions, co-dispersed processed pseudogene copies can be interpreted as products of secondary structural propagation rather than as independently generated LINE-1-mediated insertions.
CONCLUSIONS: Viewed against origin-based definitions of processed pseudogenes and retrogenes, these results suggest that, within the duplication-rich and structurally unstable genomic regions resolved by T2T-level assemblies, multiple annotated loci can arise through secondary propagation of a single RNA-derived insertion. In such contexts, incorporating evidence of selective constraint and cross-species conservation enables more reliable distinction between source insertions and their secondarily propagated copies. This case study highlights a limitation of current annotation frameworks and demonstrates the need for more precise annotation that incorporates evolutionary and structural context in the T2T era.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13100-026-00394-z.