Extend the benchmarking indel set by manual review using the individual cell line sequencing data from the Sequencing Quality Control 2 (SEQC2) project


Authors: Binsheng Gong, Dan Li, Yifan Zhang, Rebecca Kusko, Samir Lababidi, Zehui Cao, Mingyang Chen, Ning Chen, Qiaochu Chen, Qingwang Chen, Jiacheng Dai, Qiang Gan, Yuechen Gao, Mingkun Guo, Gunjan Hariani, Yujie He, Wanwan Hou, He Jiang, Garima Kushwaha, Jian-Liang Li, Jianying Li, Yulan Li, Liang-Chun Liu, Ruimei Liu, Shiming Liu, Edwin Meriaux, Mengqing Mo, Mathew Moore, Tyler J Moss, Quanne Niu, Ananddeep Patel, Luyao Ren, Nedda F Saremi, Erfei Shang, Jun Shang, Ping Song, Siqi Sun, Brent J Urban, Danke Wang, Shangzi Wang, Zhining Wen, Xiangyi Xiong, Jingcheng Yang, Lihui Yin, Chao Zhang, Ruolan Zhang, Ambica Bhandari, Wanshi Cai, Agda Karina Eterovic, Dalila B Megherbi, Tieliu Shi, Chen Suo, Ying Yu, Yuanting Zheng, Natalia Novoradovskaya, Renee L Sears, Leming Shi, Wendell Jones, Weida Tong, Joshua Xu

Abstract

Accurate indel calling plays an important role in precision medicine, and a benchmarking indel set is essential for thoroughly evaluating the indel calling performance of bioinformatics pipelines. A reference sample with a set of known-positive variants was developed in the FDA-led Sequencing Quality Control Phase 2 (SEQC2) project, but the number of known indels in that set was limited. This project sought to provide an enriched set of known indels that would be more translationally relevant by focusing on additional cancer-related regions. A thorough manual review by 42 reviewers, two advisors, and a judging panel of three researchers extended the known indel set by 516 indels. The extended benchmarking indel set spans a wide range of variant allele frequencies (VAFs), with 87% of the indels having a VAF below 20% in reference Sample A. Reference Sample A and the indel set can therefore be used to benchmark indel calling comprehensively across a wider range of VAFs, particularly at the low end. Indel length also varied, but the majority were shorter than 10 base pairs (bp). Most of the indels were within coding regions, with the remainder in gene regulatory regions. Although the robust study design and meticulous human review support high confidence in the set, the extended indel set has not undergone orthogonal validation. The extended benchmarking indel set, together with the indels in the previously published known-positive set, served as the truth set for benchmarking indel calling pipelines in a community challenge hosted on the precisionFDA platform. This benchmarking indel set and the reference samples can be used for a comprehensive evaluation of indel calling pipelines. Additionally, the insights and solutions obtained during the manual review process can aid in improving the performance of these pipelines.

Keywords: Benchmarking; Bioinformatics; Indel; Precision medicine; Quality control.
