Abstract
Camellias constitute one of the major woody oil sources worldwide. Most oil camellias are polyploid, which poses technical challenges for genome assembly, especially at haplotype-resolved resolution. Here, a haplotype-resolved, chromosome-level genome assembly was generated for Camellia osmantha 'Yidan', an important autohexaploid oil camellia cultivar, using PacBio high-fidelity (HiFi) long reads and Hi-C data. The assembly totaled 14.38 Gb, of which 11.08 Gb (77.05%) was anchored onto 90 chromosome-level pseudomolecules representing six allele-aware haplotypes. The largest haplotype spanned 2.73 Gb, and its high quality was supported by a contig N50 of 1.69 Mb, a scaffold N50 of 166.34 Mb, a Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness score of 95.79%, and a long terminal repeat assembly index (LAI) of 14.19. In total, 60,212 protein-coding genes were predicted, including 3,269 transcription factors, 2,655 resistance gene analogues, 80 oil biosynthesis-related genes, and 497 flowering-related genes. Our data will serve as a valuable resource for uncovering the genetic basis of economic trait variation and for facilitating breeding in oil camellias.