Tracing the evolutionary histories of ultra-rare variants using variational dating of large ancestral recombination graphs

Abstract

Ultra-rare variants dominate whole-genome sequencing datasets, yet their interpretation is limited by allele frequency, which provides little information at very low counts and is highly sensitive to uneven ancestry representation. Allele age offers an ancestry-agnostic alternative, but existing methods do not scale to biobank-sized cohorts. Here we present a scalable variational algorithm for dating Ancestral Recombination Graphs (ARGs), implemented in tsdate, together with new distributed methods enabling practical biobank-scale ARG inference using tsinfer. Applied to 47,535 genomes from the Genomics England 100,000 Genomes Project, we infer contiguous ARGs spanning 206 Mb and estimate ages for 23.2 million variants, including 11.8 million singletons. ARG-based allele ages remain accurate under extreme sampling imbalance and, in real data, reveal signatures of purifying selection and clinically relevant heterogeneity among variants with identical observed frequencies. Estimates for recent mutations are precise only at large sample sizes, highlighting the information accessible in the haplotype structure of large datasets. Biobank-scale ARGs therefore enable robust, ancestry-agnostic age estimation for ultra-rare variation with broad utility for statistical and clinical genomics.
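As a rough illustration of why allele age can separate variants that share the same observed frequency, the toy sketch below assigns each mutation an age from the times of the two ARG nodes that bound the branch it sits on. This is not the variational algorithm described in the abstract and it does not use the tsdate or tsinfer APIs; the `Branch` class, the `midpoint_age` helper, the midpoint rule, and all node times are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    """A branch of a dated genealogy, with node times in generations ago (hypothetical values)."""
    child_time: float   # age of the node below the mutation
    parent_time: float  # age of the node above the mutation


def midpoint_age(branch: Branch) -> float:
    """Toy point estimate: the mutation arose somewhere on the branch, so take its midpoint."""
    if branch.parent_time < branch.child_time:
        raise ValueError("parent node must be at least as old as the child node")
    return 0.5 * (branch.child_time + branch.parent_time)


# Two hypothetical singletons: each is observed once in the sample, so their
# allele frequencies are identical, but they sit on terminal branches of very
# different length and therefore receive very different age estimates.
singletons = {
    "variant_a": Branch(child_time=0.0, parent_time=20.0),
    "variant_b": Branch(child_time=0.0, parent_time=4000.0),
}

ages = {name: midpoint_age(branch) for name, branch in singletons.items()}

for name, age in ages.items():
    print(f"{name}: about {age:.0f} generations old")
```

In this toy setting both variants have a sample count of one, yet their estimated ages differ by two orders of magnitude. That difference is the kind of information that frequency alone cannot provide.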
