Fast and Accurate Shared Segment Detection and Relatedness Estimation in Un-phased Genetic Data via TRUFFLE



Abstract

Relationship estimation and segment detection between individuals are important aspects of disease gene mapping. Existing methods are either tailored for computational efficiency or require phasing to improve accuracy. We developed TRUFFLE, a method that integrates computational techniques and statistical principles for the identification and visualization of identity-by-descent (IBD) segments using un-phased data. By skipping the haplotype phasing step and relying instead on a simpler region-based approach, our method is computationally efficient while maintaining inferential accuracy. In addition, an error model corrects for segment break-ups that occur as a consequence of genotyping errors. TRUFFLE can estimate relatedness for 3.1 million pairs from the 1000 Genomes Project data in a few minutes on a typical laptop computer. Consistent with expectation, we identified only three pairs related as second cousins or closer across different populations, while commonly used methods identified a large number of such pairs. Similarly, within populations, we identified far fewer related pairs. Compared to methods relying on phased data, TRUFFLE has comparable accuracy but is drastically faster and has fewer broken segments. We also identified specific local genomic regions that are commonly shared within populations, suggesting selection. When applied to pedigree data, we observed 99.6% accuracy in detecting 1st to 5th degree relationships. As genomic datasets become much larger, TRUFFLE can enable disease gene mapping through implicit shared haplotypes by accurate IBD segment detection.
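The abstract does not spell out the algorithm, so the following is only an illustrative sketch of how segment detection can work on un-phased genotypes, not TRUFFLE's actual implementation. It assumes genotypes coded as alternate-allele counts (0, 1, 2) and uses a standard observation: two individuals who share at least one haplotype IBD cannot be opposite homozygotes (0 versus 2) at any marker in the shared region, barring genotyping error. The function name `candidate_ibd1_segments` and the parameters `min_markers` and `max_gap_errors` are hypothetical, and the gap-merging step is a crude stand-in for an error model.

```python
from math import comb


def opposite_homozygous(a, b):
    """True when one genotype is 0 and the other is 2 (no allele shared)."""
    return (a == 0 and b == 2) or (a == 2 and b == 0)


def candidate_ibd1_segments(g1, g2, min_markers=4, max_gap_errors=1):
    """Return (start, end) marker index ranges, end exclusive.

    g1, g2: un-phased genotypes of two individuals as 0/1/2 allele counts
    at the same ordered markers (an assumed encoding).
    """
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must have the same length")

    # Step 1: maximal runs of markers with no opposite homozygotes.
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(g1, g2)):
        if opposite_homozygous(a, b):
            if start is not None:
                runs.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        runs.append((start, len(g1)))

    # Step 2: merge runs separated by at most `max_gap_errors` markers,
    # treating the short gap as a possible genotyping error rather than
    # a true segment boundary.
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] <= max_gap_errors:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)

    # Step 3: keep only segments spanning enough markers.
    return [(s, e) for s, e in merged if e - s >= min_markers]


# Toy data: opposite homozygotes at indices 0, 7, 11 and 12.
g1 = [0, 2, 1, 1, 0, 2, 1, 0, 2, 2, 0, 0, 2]
g2 = [2, 2, 1, 0, 1, 2, 0, 2, 2, 1, 1, 2, 0]

with_merging = candidate_ibd1_segments(g1, g2)
without_merging = candidate_ibd1_segments(g1, g2, max_gap_errors=0)

# Pair count, assuming the 2,504-sample release of the 1000 Genomes Project.
n_pairs = comb(2504, 2)
```

On the toy data, allowing one error bridges the single mismatch at index 7 and reports one segment, while disallowing errors splits it and drops the shorter piece, which is the kind of break-up the abstract says the error model corrects. The pair count under the stated sample-size assumption is 3,133,756, consistent with the 3.1 million pairs quoted above. A practical tool would work in genetic or physical distance rather than marker counts and would use a probabilistic error model.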
