Abstract
The functions of biomolecules such as DNA are determined not only by their sequences but also by their three-dimensional structures and the geometric configurations of their constituent units. However, most DNA sequencing techniques developed to date rely heavily on fluorescent labeling and extensive replication. During these treatments the DNA chains may be unwound or even modified, so their structural information is lost. Recent advances in single-molecule tip-enhanced Raman spectroscopy (TERS) offer a label-free approach to identifying individual nucleobases with both high spatial resolution and high chemical sensitivity. So far, however, this work has mainly consisted of proof-of-principle demonstrations on very short single-stranded DNA (ssDNA) molecules. Here, we propose an algorithm-assisted strategy for determining the structures of long-chain DNA molecules from real-space TERS imaging. We first develop a matching algorithm that rapidly simulates TERS spectra and mapping images of long-chain DNA containing tens of thousands of atoms. This circumvents the prohibitive computational cost of quantum-chemical simulations of such large molecules. After validating the accuracy and efficiency of this algorithm on various DNA molecular systems, we combine it with a Bayesian algorithm. Using TERS measurements on two surface-adsorbed long-chain ssDNA model systems, we determine their experimental molecular structures with single-nucleobase precision and find general agreement between theory and experiment. Our approach will help establish machine-learning-assisted TERS as a generic tool for identifying the sequences and configurations of complex biomolecular systems, including long DNA/RNA chains, proteins, and even glycopeptide complexes.