Expanding the Genome in a Bottle Truth Set: Detection and Validation of Novel Low-frequency Variants Using High-accuracy NanoSeq

Abstract

Somatic mutations record the molecular history of a tissue and inform risk, prognosis, and therapy, yet their variant allele fractions often fall below the reliable detection limit of conventional short-read sequencing. In contrast, duplex sequencing, as exemplified by NanoSeq, applies the principle of single-molecule detection and thereby overcomes this limitation. However, the original NanoSeq protocol relies on restriction enzyme-based genome fragmentation, which limits its genome coverage to 30-40%. To enable whole-genome discovery with duplex-level fidelity, we pursued two complementary approaches to optimize the NanoSeq protocol: (i) a restriction-enzyme strategy that densifies accessible sites using orthogonal 4-bp cutters; and (ii) a workflow (NanoSeq-MBN) that uses sonication followed by mung bean nuclease, together with T4 polynucleotide kinase, Klenow fragment, and a dATP/ddBTP mixture, to blunt, repair, and A-tail DNA while minimizing repair artifacts. We systematically benchmarked the performance of both approaches using Genome in a Bottle (GIAB) gold-standard sample mixtures. NanoSeq-MBN achieved near genome-wide, Poisson-like coverage with minimal trinucleotide-context bias and ultra-high accuracy. Beyond variants already present in the GIAB truth set, NanoSeq-MBN identified approximately 120,000-160,000 de novo mutations per sample that were absent from the truth set. Notably, over 98% of these had orthogonal support in reanalyzed GIAB bulk Illumina HiSeq libraries. These novel variants extend GIAB from germline benchmarking to rare-variant discovery and calibration of subclonal detection. Functional annotation revealed an enrichment of mutations with high Combined Annotation Dependent Depletion (CADD) scores in exonic and splice-related regions. Variants intersecting ClinVar entries and OMIM genes highlight the potential for surveillance and clinical triage.
Collectively, these results add a somatic layer to GIAB, enabling calibration of mutation burdens and mutational signatures in lymphoblastoid cell lines and providing reference material for rare-variant assays. The NanoSeq-MBN workflow offers a path to whole-genome, high-fidelity discovery of ultra-rare somatic variation, with relevance to clinical assay validation.
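
The duplex principle that the abstract relies on can be illustrated with a minimal Python sketch. This is not the NanoSeq pipeline: the function names, the `min_reads` and `min_frac` thresholds, and the assumption that bottom-strand bases have already been converted to the reference orientation are all hypothetical and chosen only for illustration. The idea shown is that a base at a given position is accepted only when the reads from both strands of the same original molecule independently agree, so damage or polymerase errors confined to one strand are rejected.

```python
from collections import Counter


def strand_consensus(bases, min_reads=2, min_frac=0.9):
    """Return the consensus base of one strand family, or None if unsupported.

    `min_reads` and `min_frac` are illustrative thresholds, not NanoSeq defaults.
    """
    if len(bases) < min_reads:
        return None
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_frac else None


def duplex_call(top_bases, bottom_bases):
    """Call a base only if both strand consensuses exist and agree.

    Assumes bottom-strand bases are already expressed in reference orientation.
    """
    top = strand_consensus(top_bases)
    bottom = strand_consensus(bottom_bases)
    if top is None or bottom is None or top != bottom:
        return None
    return top


# A change seen on both strands is kept; a single-strand change is discarded.
both_strands = duplex_call(list("TTT"), list("TTT"))
one_strand_only = duplex_call(list("TTT"), list("CCC"))
```

In this sketch `both_strands` is a called base while `one_strand_only` is `None`, which is the behaviour that lets duplex methods separate true low-frequency variants from single-strand artifacts.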
