Abstract
High-resolution mass spectrometry (HRMS) is a cornerstone technology for dereplicating small molecules by comparing their mass spectral data with reference entries in large chemical databases. However, most existing chemical databases lack robust support for processing spectral data or for direct m/z-based searches, which limits their usefulness for rapid compound identification. To address this, we developed OctoChemDB, a centralized database that aggregates and harmonizes chemical, biological, and spectral data from multiple open-access resources such as PubChem, MassBank, and GNPS. To make these data programmatically accessible, we implemented a representational state transfer (REST) application programming interface (API) that allows external tools and software to query the database using customizable parameters. This API is the core access point through which developers and researchers can integrate OctoChemDB data into their own workflows and applications. As a practical demonstration of the API, we built a web application, available at https://octochemdb.cheminfo.org/, that enables users to perform m/z-based searches, predict molecular formulas, assess isotopic similarity, analyze fragmentation patterns, and retrieve associated literature and patents. This web interface provides a user-friendly example of how the underlying database and API can be leveraged to accelerate small-molecule identification. We illustrate the utility of the platform through case studies, including the identification of 3,4-methylenedioxymethamphetamine (MDMA) and caffeine, which demonstrate its effectiveness in proposing structural hypotheses, matching experimental spectra with database entries, and streamlining dereplication workflows. The entire project, including source code, is available at https://github.com/cheminfo/octochemdb.