Bridging GWAS to genes: an integrative multi-omics approach using cattle data

Abstract

BACKGROUND: Genome-wide association studies (GWASs) have identified thousands of loci for complex traits, but pinpointing causal variants and linking them to target genes remains challenging. Several strategies have been proposed to address these challenges, including comparisons across the genome, larger and multi-breed datasets, multi-trait analyses, and the use of multi-omics data.

RESULTS: We used a multi-breed dataset of over 81,000 cows from Australia, including Holstein, Jersey, and Australian Red, with phenotypes for milk lactose percentage (LP) and imputed sequence genotypes. LD pruning excluded SNPs with r² > 0.95, leaving ~1.1 million SNPs. We used BayesR to estimate the effects of these SNPs on LP, and the SNP effects were then used to predict local genomic breeding values (GEBVs) for ~400 cows from New Zealand with mammary RNA-sequencing data. Next, genetic score omics regression (GSOR) was applied to test associations between observed gene expression and local GEBVs, identifying 711 significant genes (FDR ≤ 0.1) out of 12,000 genes expressed in the mammary gland. We developed a window-based test to assess the significance of colocalization between GSOR results and GWAS summary statistics obtained from an independent study. We found 30 windows containing both GWAS signals and GSOR-significant genes (34 genes in total); this overlap was significantly higher than expected by chance (Fisher's exact test, P = 2.96 × 10⁻⁹). Of these 34 genes, 20 contributed to the significantly enriched gene ontology term 'transmembrane transport' and its child terms (FDR < 0.05). These terms are relevant to the physiology of lactose production in the mammary gland.

CONCLUSIONS: We hypothesize that these 20 genes are the most likely causal genes for the trait because their mammary expression was associated with GEBV for the trait, they were significantly colocalized with GWAS signals, and they were enriched in gene ontology terms relevant to the physiology of the trait. Our approach provides strong support for causal genes through multiple lines of evidence (GWAS, GSOR, and functional enrichment) and demonstrates the power of multi-omics data integration.
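
The abstract describes the window-based colocalization test only briefly, so the following is a minimal sketch of how such a test could work, not the authors' implementation. It assumes fixed, non-overlapping windows (the 1 Mb size is an arbitrary placeholder), 1-based positions, and a one-sided Fisher's exact test on the 2 × 2 table of windows classified by whether they hold a GWAS signal and whether they hold a GSOR-significant gene. The function names `fisher_right_tail` and `window_overlap_test` are hypothetical.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact P-value, P(X >= a), for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1 = a + b          # windows with a GWAS signal
    col1 = a + c          # windows with a GSOR-significant gene
    total = comb(n, col1)
    upper = min(row1, col1)
    # Hypergeometric tail: sum the probability of every table at least as extreme.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / total


def window_overlap_test(gwas_hits, gene_hits, chrom_lengths, window_size=1_000_000):
    """Test whether GWAS signals and significant genes share windows more than by chance.

    gwas_hits, gene_hits: iterables of (chromosome, 1-based position).
    chrom_lengths: dict mapping chromosome -> length in bp.
    window_size: ASSUMED window width; the abstract does not state one.
    """
    # Number of windows per chromosome is ceil(length / window_size).
    n_windows = sum((length + window_size - 1) // window_size
                    for length in chrom_lengths.values())
    gwas_w = {(chrom, (pos - 1) // window_size) for chrom, pos in gwas_hits}
    gene_w = {(chrom, (pos - 1) // window_size) for chrom, pos in gene_hits}

    a = len(gwas_w & gene_w)       # both
    b = len(gwas_w - gene_w)       # GWAS signal only
    c = len(gene_w - gwas_w)       # significant gene only
    d = n_windows - a - b - c      # neither
    return (a, b, c, d), fisher_right_tail(a, b, c, d)


if __name__ == "__main__":
    # Toy data for illustration only; these are not values from the study.
    gwas = [("1", 500_000), ("1", 1_500_000), ("1", 2_500_000)]
    genes = [("1", 600_000), ("1", 1_200_000), ("1", 2_900_000)]
    counts, p_value = window_overlap_test(gwas, genes, {"1": 10_000_000})
    print(counts, p_value)
```

Counting windows rather than individual SNPs or genes keeps the units of the 2 × 2 table comparable, although the result depends on the chosen window size and on window boundaries.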
