Abstract
As DNA data storage gains popularity, efficient trace reconstruction algorithms are crucial for fast decoding of data from noisy sequenced reads (or "traces"). Existing approaches, often adaptations of multiple sequence alignment or read correction methods, rely on strict assumptions of fixed error rates, showing limited generalizability to more complex datasets and with slower running times. We introduce a probabilistic formulation of the trace reconstruction problem by modeling traces as observations from a k-th order Markov chain. Instead of doing alignment, we identify the sequence most likely generated by the Markov chain as the consensus. This inspires bidirectional beam search (BBS), an algorithm that reconstructs the consensus in linear time with respect to its length. Experiments on multiple public Nanopore sequencing datasets demonstrate that BBS achieves top-tier accuracy while being approximately 20× faster than existing methods, showing its potential to enhance the efficiency and reliability of DNA data storage systems.