Ancestry inference using principal component analysis and spatial analysis: a distance-based analysis to account for population substructure


Abstract

BACKGROUND: Accurate inference of genetic ancestry is of fundamental interest to many biomedical, forensic, and anthropological research areas. Genetic ancestry membership may be related to disease risk. In a genetic association study, failing to account for differences in genetic ancestry between cases and controls can also lead to false-positive results. Although a number of strategies exist for inferring genetic ancestry and accounting for its confounding effects, applying them to large studies (tens of thousands of samples) is challenging. The goal of this study was to develop an approach for inferring the genetic ancestry of samples of unknown ancestry among closely related populations, and to provide accurate ancestry estimates for large-scale studies.

METHODS: We developed a novel distance-based approach, Ancestry Inference using Principal component analysis and Spatial analysis (AIPS), which incorporates an Inverse Distance Weighted (IDW) interpolation method from spatial analysis to assign population memberships to individuals.

RESULTS: We demonstrate the benefits of AIPS for analyzing population substructure relative to four widely used tools, EIGENSTRAT, STRUCTURE, fastSTRUCTURE, and ADMIXTURE, using genotype data from various intra-European panels and from European Americans. While these tools performed poorly when inferring ancestry across a large number of subpopulations, AIPS accurately distinguished variation both between and within subpopulations.

CONCLUSIONS: Our results show that AIPS can be applied to large-scale data sets to discriminate the modest variability among intra-continental populations, as well as to characterize inter-continental variation. The method can help protect against spurious associations when mapping the genetic basis of a disease. Compared with the tools evaluated here, our approach is a more accurate and computationally efficient method for inferring genetic ancestry in large-scale genetic studies.
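The abstract describes the AIPS assignment step only at a high level: individuals are placed in principal-component (PC) space, and an IDW interpolation assigns population memberships. The following is a minimal Python sketch of a generic IDW membership step, not the AIPS implementation. It assumes PC coordinates have already been computed for reference individuals with known population labels and for the target sample. Each reference i receives a weight w_i = 1 / d_i^p, where d_i is its distance to the target. A population's membership is the sum of its references' weights divided by the sum of all weights. The exponent p = 2, the optional k-nearest cutoff, and the function name idw_membership are illustrative assumptions.

```python
import math
from collections import defaultdict


def idw_membership(target, references, power=2.0, k=None, eps=1e-12):
    """Estimate population-membership proportions for one sample by IDW.

    Hypothetical helper for illustration; not taken from the AIPS software.
    target:     PC coordinates of the sample with unknown ancestry.
    references: list of (pc_coordinates, population_label) pairs for
                reference samples with known ancestry.
    power:      IDW exponent p; weights are 1 / d**p (p = 2 is an assumption).
    k:          optional cutoff to the k nearest references (hypothetical).
    """
    # Euclidean distance from the target to every reference sample in PC space,
    # sorted nearest first (sort on distance only, never on the label)
    dists = sorted(
        ((math.dist(target, coords), label) for coords, label in references),
        key=lambda pair: pair[0],
    )
    if k is not None:
        dists = dists[:k]

    # A sample sitting exactly on reference points would give 1/0, so split
    # membership evenly among the coincident references instead
    exact = [label for d, label in dists if d < eps]
    if exact:
        counts = defaultdict(float)
        for label in exact:
            counts[label] += 1.0
        return {label: c / len(exact) for label, c in counts.items()}

    # Sum inverse-distance weights per population, then normalize to proportions
    weights = defaultdict(float)
    for d, label in dists:
        weights[label] += 1.0 / d ** power
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Toy example with two hypothetical populations "A" and "B" in 2-D PC space
references = [((0.0, 0.0), "A"), ((0.0, 1.0), "A"), ((4.0, 0.0), "B")]
membership = idw_membership((1.0, 0.0), references)
print(membership)  # A ≈ 0.931, B ≈ 0.069
print(max(membership, key=membership.get))  # A
```

In this toy case, the two nearby "A" references dominate because weights fall off with the square of distance. A hard assignment would simply take the population with the largest proportion. The PCA step itself is assumed to happen upstream, for example with any standard PCA routine.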