Abstract
Somatic mutations record tissue molecular history and inform risk, prognosis, and therapy, yet their variant allele fractions often fall below the reliable detection limit of conventional short-read sequencing. In contrast, duplex sequencing, as exemplified by NanoSeq, detects mutations at single-molecule resolution and thereby overcomes this limitation. However, the original NanoSeq protocol relies on restriction-enzyme-based genome fragmentation, which limits its genome coverage to 30-40%. To enable whole-genome discovery with duplex-level fidelity, we pursued two complementary approaches to optimize the NanoSeq protocol: (i) a restriction-enzyme strategy that densifies accessible sites using orthogonal 4-bp cutters; and (ii) a workflow (NanoSeq-MBN) that uses sonication followed by mung bean nuclease, together with T4 polynucleotide kinase, Klenow fragment, and a dATP/ddBTP mixture, to blunt, repair, and A-tail DNA while minimizing repair artifacts. We systematically benchmarked both approaches using Genome in a Bottle (GIAB) gold-standard sample mixtures. NanoSeq-MBN achieved near genome-wide, Poisson-like coverage with minimal trinucleotide-context bias and ultra-high accuracy. Beyond variants already present in the GIAB truth set, NanoSeq-MBN identified approximately 120,000-160,000 additional de novo mutations per sample. Notably, over 98% of these had orthogonal support in reanalyzed GIAB bulk Illumina HiSeq libraries. These novel variants extend GIAB from germline benchmarking to rare-variant discovery and calibration of subclonal detection. Functional annotation revealed enrichment of mutations with high Combined Annotation Dependent Depletion (CADD) scores in exonic and splice-related regions. Variants intersecting ClinVar entries and OMIM genes highlight the potential for surveillance and clinical triage.
Collectively, these results add a somatic layer to GIAB, enabling calibration of mutation burdens and mutational signatures in lymphoblastoid lines and providing reference material for rare-variant assays. The NanoSeq-MBN workflow offers a path to whole-genome, high-fidelity discovery of ultra-rare somatic variation, with relevance to clinical assay validation.