Abstract
Given the rapid mutation and high transmissibility of coronaviruses, especially SARS-CoV-2, comparative genomic studies are crucial for understanding viral evolution, transmission dynamics, and therapeutic development. In prior work, we analyzed and compared the spectral distribution patterns of various k-mer subsets across 920 genome sequences, spanning from primates to prokaryotes. That analysis revealed an evolutionary mechanism in genome sequences involving both CG-specific and TA-specific selection modes. In the present study, we investigate which of these selection modes operate in coronavirus genomic sequences by examining the intrinsic distribution rules of the spectra of 32 XYi 6-mer subsets. Our results show that coronavirus genomes exhibit only the CG-specific selection mode, with no evidence of TA-specific selection. Based on the CG-specific selection mode, we identified the CG1 6-mers as the fundamental subset underlying coronavirus genome evolution. To validate the CG1 subset, we used it to construct phylogenetic relationships for a set of coronavirus genomes and SARS-CoV-2 variant genomes. Comparative analysis showed that the resulting phylogenetic relationships agree more closely with established knowledge than the alternatives examined. This study thus provides a theoretical framework for inferring phylogenetic relationships at the whole-genome level.