A robust benchmark for detecting low-frequency variants in the HG002 Genome In A Bottle NIST reference material

用于检测 HG002 Genome In A Bottle NIST 参考物质中低频变异的可靠基准

阅读:1

Abstract

Somatic mosaicism is an important cause of disease, but mosaic and somatic variants are often challenging to detect because they exist in only a fraction of cells. To address the need for benchmarking subclonal variants in normal cell populations, we developed a benchmark containing mosaic variants in the Genome in a Bottle Consortium (GIAB) HG002 reference material DNA from a large batch of a normal lymphoblastoid cell line. First, we used a somatic variant caller with high coverage (300x) Illumina whole genome sequencing data from the Ashkenazi Jewish trio to detect variants in HG002 not detected in at least 5% of cells from the combined parental data. These candidate mosaic variants were subsequently evaluated using >100x BGI, Element, and PacBio HiFi data. High confidence candidate SNVs with variant allele fractions above 5% were included in the HG002 draft mosaic variant benchmark, with 13/85 occurring in medically relevant gene regions. We also delineated a 2.45 Gbp subset of the previously defined germline autosomal benchmark regions for HG002 in which no additional mosaic variants >2% exist, enabling robust assessment of false positives. The variant allele fraction of some mosaic variants is different between batches of cells, so using data from the homogeneous batch of reference material DNA is critical for benchmarking these variants. External validation of this mosaic benchmark showed it can be used to reliably identify both false negatives and false positives for a variety of technologies and detection algorithms, demonstrating its utility for optimization and validation. By adding our characterization of mosaic variants in this widely-used cell line, we support extensive benchmarking efforts using it in simulation, spike-in, and mixture studies.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。