Abstract
The carob tree (Ceratonia siliqua L.), an evergreen legume native to West Asia and long cultivated throughout the Mediterranean basin, is valued for its drought tolerance, nutritious pods, and ecological role. Despite its economic and environmental importance, genomic resources for this species have been limited. Here, we present a high-quality, chromosome-scale genome assembly of C. siliqua, generated using PacBio HiFi long-read and Hi-C sequencing technologies. The final assembly spans 501.39 Mb, organized into 12 pseudomolecules, with a scaffold N50 of 39.58 Mb. Genome annotation identified 30,295 protein-coding gene models, with 99.5% completeness as assessed against conserved single-copy orthologs. Repetitive elements account for 52.2% of the genome, primarily long terminal repeat (LTR) retrotransposons of the Gypsy and Copia families. Comparative orthology analysis with 24 other plant genomes revealed conserved gene content and a substantial number of species-specific genes in C. siliqua. Demographic inference using the PSMC model indicated historical fluctuations in population size, with the effective population sizes of the Cretan and Moroccan populations converging approximately 50,000 years ago. We also investigated the potential for symbiotic nitrogen fixation, a trait ancestral to legumes. Genomic evidence suggests pseudogenization of key nodulation genes (NIN and RPG), consistent with ecological observations that root nodules are absent. These results support the hypothesis of a secondary loss of nodulation in C. siliqua. This genome provides a valuable resource for evolutionary, ecological, and agricultural studies, particularly for understanding legume adaptation to Mediterranean climates and the molecular basis of symbiotic regression.