Abstract
The inherent complexity of mass spectra and the lack of direct correlation between spectral and structural similarities retards structure elucidation and accurate peak annotation. Our methodology employs modified atomic environments from topological radii zero to represent collections of annotated spectral peaks. Rather than aiming for de novo structure prediction from spectra, our objective is to refine and re-rank candidate structures retrieved by existing library search methods using predicted atom environments. We conducted a multi-step complexity reduction to mass-to-fragment mappings and trained the Transformer model to predict the atomic environments of compounds directly from mass and intensity data, achieving a peak precision of 86.1% and a recall rate of 78.4% on the test set. This novel framework not only aids in interpreting EI-MS data by providing insights into structural contents but also refines cosine similarity rankings by suggesting the inclusion or exclusion of certain atomic environments. Our findings over the NIST database suggest that our approach complements conventional methods by improving spectra matching through an in-depth atomic-level analysis.