Characterizing Families of Spectral Similarity Scores and Their Use Cases for Gas Chromatography-Mass Spectrometry Small Molecule Identification



Abstract

Metabolomics provides a unique snapshot of the world of small molecules and the complex biological processes that govern the human, animal, plant, and environmental ecosystems encapsulated by the One Health modeling framework. However, this "molecular snapshot" is only as informative as the number of metabolites confidently identified within it. In mass spectrometry approaches to metabolomics, compounds are traditionally identified with a spectral similarity (SS) score: each measured spectrum is matched against reference libraries of candidate spectra. Unfortunately, there is little consensus on which of the dozens of available SS metrics should be used. The lack of a standard SS score creates analytic uncertainty and can lead to reproducibility problems, especially as these data are integrated with data from other domains. In this work, we use metabolomic spectral similarity as a case study to showcase the consistency challenges within just one piece of the One Health framework, challenges that must be addressed to enable data science approaches to One Health problems. Using a large collection of standard and complex datasets with expert-verified truth annotations, we evaluated how effectively 66 similarity metrics distinguish correct matches (true positives) from incorrect matches (true negatives). We additionally characterize the families of these metrics to make informed recommendations for their use. Our results indicate that specific families of metrics (the Inner Product, Correlative, and Intersection families of scores) tend to perform better than others, with no single similarity metric performing optimally for all queried spectra. This work and its findings provide an empirically based resource that researchers can use when selecting similarity metrics for GC-MS identification, and it improves scientific reproducibility by taking steps toward standardized identification workflows.
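To make the idea of library matching concrete, the following is a minimal sketch of one member of the Inner Product family, the cosine score, applied to a query spectrum and a small reference library. It is an illustration only and not the evaluation code used in this work: the function names, the representation of a spectrum as a dictionary from integer m/z to intensity, and the toy spectra are all assumptions made for the example.

```python
import math


def cosine_similarity(query, reference):
    """Cosine (normalized inner product) score between two spectra.

    Each spectrum is assumed to be a dict mapping integer m/z -> intensity.
    Returns a value in [0, 1] for non-negative intensities.
    """
    # Peaks absent from one spectrum contribute zero intensity.
    mz_values = set(query) | set(reference)
    dot = sum(query.get(mz, 0.0) * reference.get(mz, 0.0) for mz in mz_values)
    norm_q = math.sqrt(sum(v * v for v in query.values()))
    norm_r = math.sqrt(sum(v * v for v in reference.values()))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0  # an empty spectrum cannot match anything
    return dot / (norm_q * norm_r)


def rank_candidates(query, library):
    """Score the query against every library entry, best match first."""
    scores = {name: cosine_similarity(query, spec) for name, spec in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy spectra, invented purely for illustration.
query = {41: 30.0, 57: 100.0, 71: 45.0}
library = {
    "candidate_A": {41: 28.0, 57: 100.0, 71: 50.0},
    "candidate_B": {43: 100.0, 58: 20.0, 71: 5.0},
}

ranking = rank_candidates(query, library)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Swapping `cosine_similarity` for a different metric changes the scores and can change the ranking, which is why the choice of metric matters for reproducible identification.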
