Abstract
Development of new pharmaceuticals such as penicillin and numerous anticancer compounds often begins with the discovery and characterization of natural products (NPs). Nuclear magnetic resonance (NMR) spectroscopy is an essential tool for elucidating the chemical structure of NPs. However, interpreting NMR spectra is a time-consuming process that requires considerable domain expertise. This has led to the development of computational tools that directly annotate structures from NMR spectra, accelerating this elucidation and discovery process. Here we introduce SPECTRE, a state-of-the-art transformer-based model for structure dereplication and annotation. Key contributions of this tool include (1) a novel, transformer-based structure annotation method that accepts several types of NMR data for flexible annotation; (2) a novel, entropy-optimized, and collision-free molecular binary fingerprint that enhances the accuracy when retrieving molecular candidates. SPECTRE achieves a new state of the art 80% top-1 annotation accuracy using a challenging search space of 526,163 molecules. SPECTRE is also the first tool to provide fine-grained similarity maps between predicted and retrieved structures. These maps enable substructure-level interpretation, offering interpretable visual cues that highlight matched chemical fragments. Even in cases where overall molecular similarity is low, these fragment-level hints offer valuable insights to chemists, guiding hypothesis generation and accelerating the structure elucidation process.