High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios

Authors: Marta Byrska-Bishop, Uday S Evani, Xuefang Zhao, Anna O Basile, Haley J Abel, Allison A Regier, André Corvelo, Wayne E Clarke, Rajeeva Musunuri, Kshithija Nagulapalli, Susan Fairley, Alexi Runnels, Lara Winterkorn, Ernesto Lowy, Soren Germer, Harrison Brand, Ira M Hall, Michael E Talkowski, Giuseppe Narzisi, Michael C Zody

Abstract

The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.

Keywords: 1000 Genomes Project; INDEL; SNV; population genetics; reference imputation panel; structural variation; trio sequencing; whole-genome sequencing.
