Abstract
Crataegus spp. are valuable horticultural crops because of their extensive use in Chinese herbal medicine, cosmetics, food production, and other industries. However, the large number of species, their similar morphological characteristics, and frequent hybridization, apomixis, and polyploidy have confused their taxonomic status. Herein, 18 complete chloroplast genomes, comprising 17 Crataegus species and 1 Mespilus species, were newly sequenced and comprehensively analyzed for comparative genomics and phylogenetic relationships. All 18 chloroplast genomes possessed the typical quadripartite structure and ranged from 159,638 to 159,973 bp in length. These chloroplast genomes encode 119-131 genes, including 37 transfer RNA (tRNA) genes, 8 ribosomal RNA (rRNA) genes, and 74-85 protein-coding genes (PCGs). In addition, 23-54 long repeat sequences and 74-87 simple sequence repeats (SSRs) were detected. Analysis of Ka/Ks ratios across the 18 chloroplast genomes revealed that the rpoC2 gene was under significant positive selection. We also identified nine hotspot regions (infA, ndhC, psaI, rps19, ndhC~trnV-UAC, psbZ~trnG-UCC, rpl33~rps18, trnH-GUG~psbA, and trnR-UCU~atpA) and verified that ndhC~trnV-UAC could serve as a basis for subsequent molecular marker studies aimed at identifying Crataegus species. Maximum likelihood and Bayesian phylogenetic trees based on chloroplast genome sequences consistently resolved the genetic relationships among Crataegus and Mespilus species and confirmed the taxonomic status of the Crataegus accessions GSSZ, JRY, RR2H, RR3H, and ZWSZ. Divergence-time estimates showed that the crown age of C. subg. Crataegus was about 33.487 Ma, and that divergence into C. subg. Americanae and C. subg. Sanguineae began around 27.059 Ma. Based on this molecular evidence, we speculate that the genus Crataegus first arose from European-derived species within C. subg. Crataegus.
Biogeographic and molecular dating analyses suggested that China is a putative maternal origin of Crataegus species. The complete chloroplast genomes of Crataegus not only enable the resolution of phylogenetic relationships within the genus but also offer novel insights into chloroplast genome structural variation and evolution. Additionally, the identified divergent DNA regions hold significant utility for species identification and phylogenetic reconstruction in Crataegus.
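The abstract reports 74-87 SSRs per genome but does not state the detection criteria. As an illustration only, the sketch below scans a sequence for perfect tandem repeats of primitive motifs 1-6 bp long. The function name `find_ssrs` and the minimum copy numbers in `MIN_REPEATS` are hypothetical placeholders chosen for illustration; they are not the software or thresholds used in this study.

```python
# Hypothetical minimum number of tandem copies per motif length (bp).
# Placeholder values for illustration, not the thresholds used in this study.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for perfect SSRs in seq (0-based starts).

    At each position, every motif length is tried; the longest qualifying
    repeat wins (ties go to the shorter motif), and scanning resumes after it.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for unit, min_copies in sorted(min_repeats.items()):
            motif = seq[i:i + unit]
            if len(motif) < unit:
                break  # too close to the end of the sequence for longer motifs
            if unit > 1 and not is_primitive(motif):
                continue  # e.g. 'AA' is already covered by the mononucleotide 'A'
            copies = 1
            while seq[i + copies * unit:i + (copies + 1) * unit] == motif:
                copies += 1
            span = copies * unit
            if copies >= min_copies and (best is None or span > best[2] * len(best[1])):
                best = (i, motif, copies)
        if best:
            hits.append(best)
            i += best[2] * len(best[1])  # skip past the reported repeat
        else:
            i += 1
    return hits


if __name__ == "__main__":
    example = "GC" + "A" * 12 + "GC" + "AT" * 6 + "G"
    for start, motif, copies in find_ssrs(example):
        print(f"({motif}){copies} at position {start}")
```

Only perfect repeats are reported here; compound or imperfect SSRs, which dedicated tools may also classify, would need additional merging rules.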
