Abstract
Copy number variations (CNVs), including duplications and deletions of the genome ranging up to 1 Mb, are an important source of genomic variation and may influence phenotypic variation. They are relatively understudied compared with single nucleotide polymorphisms despite affecting a higher proportion of the genome. Using whole-genome sequencing data and RNA-sequencing data, we identified and characterized the natural diversity of CNVs across the native range of Populus trichocarpa and the effects of CNVs on gene expression. We analyzed whole-genome sequencing data from 751 P. trichocarpa individuals to identify CNVs and to characterize their size, distribution, and population structure. We also examined gene expression using RNA-sequencing data from leaf and xylem tissues of 390 individuals. We found 11,501 duplications and 22,839 deletions covering a major percentage of the genome. Genes overlapping CNVs were enriched in important biological processes such as reproduction, cellulose production, and defense. Joint analysis of CNV genotypes and expression data showed that, for a minority of genes overlapping CNVs, expression level was strongly correlated with copy number. These genes were significantly enriched for stress-related responses. The CNVs identified here provide insight into the extent, characteristics, and diversity of CNVs in wild populations of P. trichocarpa and into the effects of CNVs on gene expression.