PSIV-22 Genome-wide association studies for binary traits: A comparison of methods


Abstract

Genome-wide association studies (GWAS) are widely used in animal breeding and genetics research to identify candidate genes and genomic regions associated with traits of interest. Many methodologies and software tools have been developed to perform GWAS. Binary traits, however, pose particular challenges because of their categorical nature and require specialized approaches for accurate analysis. The POSTGSF90 program from the BLUPF90 family can estimate SNP effects within the single-step GBLUP framework. For binary traits analyzed under threshold models, the window-based variance approach is an option, but it lacks the power to detect statistical significance, and the choice of threshold remains subjective. An alternative is to use BLUPF90+ to obtain p-values before running POSTGSF90. Another method suited to binary traits is fastGWA-GLMM, a computationally efficient tool for GWAS based on generalized linear mixed models (GLMM), implemented in the GCTA software. This study aimed to compare GWAS results from POSTGSF90 and GCTA using the same dataset. Phenotypic records comprised 182,964 observations of a binary morphological defect (pigmentation; affected/unaffected) in Nellore cattle, with an incidence of 6.8%. Quality control was performed with preGSF90, starting from 24,729 genotyped animals and 588,846 SNPs. Markers with minor allele frequency (MAF) ≤ 0.05, call rate ≤ 0.90, extreme deviation from Hardy-Weinberg equilibrium (p ≤ 10⁻⁵), unknown or duplicated positions, or Mendelian conflicts were removed, leaving a final dataset of 24,562 genotyped animals and 583,769 SNP markers. Contemporary groups were included as fixed effects in both analyses. The POSTGSF90 analysis used the full dataset, including 340,991 animals with pedigree information, and took 4 days, 8 hours, 46 minutes, and 53 seconds to complete. In contrast, GCTA used only animals with both genotypic and phenotypic data (8,920 records) and completed the analysis in 26 minutes and 52 seconds. Both programs flagged the same five chromosomes as significant for the binary trait. With POSTGSF90, 1,365 markers were significant (p-value < 0.05 after Bonferroni correction), and 466 genes were found, of which 318 were protein-coding. With GCTA, 1,165 markers were significant (p-value < 0.05 after Bonferroni correction), and 363 genes were found, of which 243 were protein-coding. A total of 342 genes (~74%) overlapped between the two analyses. These results indicate that GCTA achieved comparable detection power while using fewer records and substantially less computational time. Additional analyses will be performed to better understand the importance of the genomic regions identified by only one of the two programs.
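As a rough illustration of the figures above, the following Python sketch works out the per-marker cutoff implied by a Bonferroni correction and the ratio of the two reported run times. It assumes that the number of tests equals the 583,769 markers retained after quality control; the abstract does not state this. The function names are illustrative and belong to neither POSTGSF90 nor GCTA.

```python
import math

# Figures quoted in the abstract.
N_MARKERS = 583_769   # SNPs retained after quality control
ALPHA = 0.05          # family-wise significance level


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def to_seconds(days: int = 0, hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert a days/hours/minutes/seconds duration to whole seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Assumption: one test per retained marker.
threshold = bonferroni_threshold(ALPHA, N_MARKERS)

# Reported wall-clock times for the two analyses.
postgsf90_s = to_seconds(days=4, hours=8, minutes=46, seconds=53)
gcta_s = to_seconds(minutes=26, seconds=52)
ratio = postgsf90_s / gcta_s

print(f"per-marker p-value cutoff: {threshold:.2e}")
print(f"-log10 of the cutoff:      {-math.log10(threshold):.2f}")
print(f"run-time ratio:            {ratio:.0f}x")
```

The run-time ratio is not a like-for-like benchmark, because the two analyses used different numbers of animals and records.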
