Identifying Single-Origin Rare Variants in Population Genomic Data

Abstract

Genomic analyses have shown that some mutations in large population genomic datasets may be the result of repeated, independent events at the same locus. However, the possibility of recurrent mutation is often ignored, even when it has the potential to introduce errors, such as when assuming co-ancestry for demographic analysis. Even rare variants such as doubletons, which should be particularly informative about recent demography, may have multiple origins despite arising relatively recently in the population. Here, we develop methods to (i) estimate the frequency of recurrent doubletons in a population genomic dataset from the occurrence of tri-allelic sites with two different singleton mutations and (ii) identify a subset of high-confidence single-origin doubletons based on the presence of a linked rare variant on the surrounding shared haplotype. Applying these methods to data for the malaria mosquito Anopheles gambiae sampled from across Africa, we estimate that ∼16% of doubletons had independent origins. We then identify a subset of doubletons highly likely (∼99%) to have a single origin, which consists of ∼68% of all the expected single-origin doubletons (and ∼57% of all observed doubletons). The effectiveness of our methods is demonstrated by both further data analyses and coalescent simulations, and these doubletons are then used to test population genetic hypotheses about recombination, selection, and isolation by distance. The methods developed here should be useful for demographic inference when populations or sample sizes are large enough that recurrent mutation cannot be ignored.
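The three percentages quoted above are related by simple arithmetic, and a short check makes that relationship explicit. The sketch below is only a consistency check on the rounded figures from the abstract. It is not the authors' estimation method, and the variable and function names are hypothetical, chosen here for illustration.

```python
# Rounded figures quoted in the abstract (all approximate).
recurrent_share = 0.16          # doubletons estimated to have independent origins
subset_of_single_origin = 0.68  # high-confidence subset / expected single-origin doubletons
reported_subset_share = 0.57    # high-confidence subset / all observed doubletons


def subset_share_of_all(recurrent: float, subset_of_single: float) -> float:
    """Share of all doubletons in the subset, implied by the other two figures.

    Hypothetical helper: assumes the expected single-origin share is simply
    one minus the recurrent share.
    """
    single_origin_share = 1.0 - recurrent
    return single_origin_share * subset_of_single


implied = subset_share_of_all(recurrent_share, subset_of_single_origin)
print(f"expected single-origin share of doubletons: {1.0 - recurrent_share:.2f}")
print(f"implied subset share of all doubletons:     {implied:.3f}")
print(f"reported subset share of all doubletons:    {reported_subset_share:.2f}")
```

With these rounded inputs, the implied share (0.84 × 0.68) comes out close to the reported ∼57%, so the quoted figures are mutually consistent to within rounding.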
