Abstract
Recent analyses have revealed that orangutan alpha satellite higher-order repeat (HOR) arrays in complete centromeres are composed of three to four distinct HOR blocks that share only 80-90% sequence identity with one another, forming a patchwork-quilt pattern of independent HOR expansions. In contrast, using our novel HOR-detection algorithm GRMhor, we analyzed the complete Y chromosome centromere in orangutan and identified a highly ordered and complex alpha satellite 58mer superHOR array comprising 67 HOR copies, including 46 highly identical canonical copies with a divergence of only 0.25%. Because the largest known human alpha satellite HOR is the 34mer on the Y chromosome, this novel 58mer structure qualifies as a superHOR. The canonical 58mer HOR contains only 44 distinct monomer types, 14 of which are repeated within the unit, resulting in a unique five-row cascading organization. Such complexity is not detectable with the standard HOR-searching tools employed in previous studies. We also identified a second, less pronounced 45mer cascading superHOR array with 0.81% divergence. For comparison, we detected a cascading 18mer HOR in the gorilla Y centromere and a Willard-type 28mer HOR in the chimpanzee Y centromere. Preliminary genome-wide analysis of orangutan further reveals additional superHORs, including 84mer and 53mer arrays in chromosome 5, a 54mer in chromosome 10, a 51mer in chromosome 14, a 53mer in chromosome 15, and a 45mer in chromosome 22. These findings demonstrate the ability of GRMhor to reveal highly structured, species-specific HOR architectures, with potential implications for centromere evolution and primate comparative genomics.
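To illustrate what it means for a HOR unit to contain fewer distinct monomer types than monomers (for example, 44 distinct types in a 58-monomer unit), here is a minimal Python sketch. It is not the GRMhor algorithm. It assumes that monomers have already been classified into type labels and that all HOR copies are perfectly identical, so divergence is ignored. The function names `hor_period` and `unit_composition` are hypothetical.

```python
from collections import Counter


def hor_period(labels):
    """Return the smallest period p with labels[i] == labels[i + p] for all valid i.

    Assumes an idealized tandem array of identical HOR copies (no divergence).
    """
    n = len(labels)
    for p in range(1, n + 1):
        if all(labels[i] == labels[i + p] for i in range(n - p)):
            return p
    return n


def unit_composition(unit):
    """Summarize one HOR unit: (distinct types, types occurring more than once,
    extra monomer copies beyond one per distinct type)."""
    counts = Counter(unit)
    distinct = len(counts)
    repeated_types = sum(1 for c in counts.values() if c > 1)
    extra_copies = len(unit) - distinct
    return distinct, repeated_types, extra_copies


if __name__ == "__main__":
    # Hypothetical toy 5mer unit in which monomer type "m2" occurs twice.
    unit = ["m1", "m2", "m3", "m2", "m4"]
    array = unit * 3
    p = hor_period(array)
    print(p, unit_composition(array[:p]))
```

Running the toy example detects a period of 5 and reports 4 distinct types, 1 repeated type, and 1 extra copy. When each repeated type occurs exactly twice, the unit length equals the number of distinct types plus the number of repeated types, which matches the 44 + 14 = 58 count given for the canonical orangutan unit.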