Abstract
Accurate detection of low-frequency DNA variants (below 1%) is essential in diverse biological and clinical contexts, yet it remains fundamentally constrained by the high intrinsic error rates of next-generation sequencing technologies. Although unique molecular identifiers (UMIs) substantially mitigate these errors by tagging each original template molecule, their efficacy is compromised by UMI collisions and by artifacts introduced during polymerase chain reaction (PCR) amplification and sequencing, which together produce false-positive variant calls. Here, we present AFUMIC, an alignment-free UMI clustering framework that addresses these limitations through collision-resilient UMI grouping and a consensus quality score (CQS)-guided strategy for high-fidelity consensus sequence generation. AFUMIC reduces singleton families, enhances clustering precision, and maximizes data retention, yielding 7.27-fold and 3.84-fold increases in single-strand consensus sequence and duplex consensus sequence output, respectively, compared with Du Novo. It further decreases the per-base error rate from $3.01 \times 10^{-4}$ to $2.10 \times 10^{-5}$ and raises the proportion of error-free positions from 45.27% to 99.85%, enabling confident detection of variants at variant allele frequencies as low as $1.0 \times 10^{-5}$. AFUMIC also exhibits superior computational efficiency, making it well suited to high-throughput analysis of UMI-tagged libraries. Collectively, these results establish AFUMIC as a broadly applicable framework for ultrasensitive variant detection and error-corrected sequencing that can be readily deployed in both clinical diagnostics and large-scale genomic research.