Pastrami: a fast and efficient algorithm for fine-scale genetic ancestry inference



Abstract

Genomics research increasingly relies on large population biobanks that include many thousands of participants. However, current genetic ancestry inference methods are computationally inefficient and prohibitively slow when applied to such large cohorts. The aim of this work was to develop a fast and efficient algorithm for fine-scale genetic ancestry inference on biobank-size cohorts. The Pastrami algorithm that we developed performs supervised genetic ancestry inference by comparing haplotypes between query and global reference samples, creating query and reference haplotype copying vectors, and relating them via non-negative least squares regression to estimate ancestry fractions. We used Pastrami for ancestry inference on genomic data sets from Africa, the Americas, and the United Kingdom, comparing its accuracy and runtime performance to the most widely used haplotype-based ancestry inference methods. Pastrami ancestry estimates are highly similar to estimates from the ChromoPainter and RFMix programs. The total CPU time required by Pastrami increases linearly with the number of samples, and it achieves ∼45× faster runtime than ChromoPainter. When run on 488 377 UK Biobank and 3433 reference samples, Pastrami used 2340 CPU hours compared to ∼105 000 CPU hours for ChromoPainter. The Pastrami program and documentation are made freely available on GitHub: https://github.com/healthdisparities/pastrami.
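The regression step described in the abstract can be illustrated with a minimal sketch. It assumes each reference population is summarised by one copying vector and that the query copying vector is modelled as a non-negative mixture of them. The toy numbers, the function names, the projected-gradient solver, and the final rescaling to fractions that sum to one are all illustrative assumptions, not taken from the Pastrami source code.

```python
# Toy illustration of non-negative least squares (NNLS) for ancestry fractions.
# All names and values here are hypothetical; this is not Pastrami's code.

def nnls_projected_gradient(A, b, iters=2000):
    """Minimise ||A x - b||^2 subject to x >= 0 using projected gradient descent.

    A is a list of m rows with n columns; b is a list of length m.
    """
    m, n = len(A), len(A[0])
    # Gram matrix G = A^T A and right-hand side c = A^T b.
    G = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]
    c = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    # trace(G) bounds the largest eigenvalue of G, so 1/trace(G) is a safe step.
    step = 1.0 / sum(G[i][i] for i in range(n))
    x = [0.0] * n
    for _ in range(iters):
        grad = [sum(G[i][j] * x[j] for j in range(n)) - c[i] for i in range(n)]
        # Gradient step, then projection onto the non-negative orthant.
        x = [max(0.0, x[i] - step * grad[i]) for i in range(n)]
    return x


def ancestry_fractions(reference_vectors, query_vector):
    """Regress the query copying vector on the reference copying vectors.

    reference_vectors: one copying vector per reference population.
    Returns coefficients rescaled to sum to one (an assumption of this sketch).
    """
    # Columns of A are the reference populations, rows are donor groups.
    A = [list(row) for row in zip(*reference_vectors)]
    coeffs = nnls_projected_gradient(A, query_vector)
    total = sum(coeffs)
    return [v / total for v in coeffs]


# Hypothetical copying vectors over four donor groups.
ref_a = [0.7, 0.1, 0.1, 0.1]
ref_b = [0.1, 0.7, 0.1, 0.1]
ref_c = [0.1, 0.1, 0.7, 0.1]
# A query built as 70% ref_a and 30% ref_b.
query = [0.7 * a + 0.3 * b for a, b in zip(ref_a, ref_b)]

fractions = ancestry_fractions([ref_a, ref_b, ref_c], query)
print([round(f, 3) for f in fractions])
```

On this toy input the estimated fractions should come out close to 0.7, 0.3, and 0 for the three reference populations. A production implementation would more likely call an established NNLS solver; the hand-written projected-gradient loop is used here only to keep the sketch free of external dependencies.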
