Abstract
Advances in liquid chromatography-tandem mass spectrometry (LC-MS/MS) and chemometrics have driven the field of untargeted metabolomics forward. However, interpreting these studies can be challenging, since the data sets often return thousands of features of which only a small fraction can be identified. Most of the unidentified features arise from degeneracy, where a single analyte produces multiple features through adduct formation, in-source fragmentation, and isotopes. This work improves the detection and clustering of these degenerate features with a new peak shape consistency metric, termed lack-of-fit (LOF). This metric quantifies the residual error between two features within a time window, where LOF < 20% suggests degeneracy. To evaluate metric performance, 21 analytes were spiked into brain dialysate and features were discovered using tile-based Fisher ratio (F-ratio) analysis. Incorporating the proposed LOF metric reduced the feature list while retaining all the spiked analytes in the top 25 hits, and it outperformed other data-driven degeneracy methods by ∼18-48%. LOF clustering was then applied to reduce the number of features detected in brain dialysate at different preconcentration levels and mobile phase gradient lengths. LOF clustering classified degenerate features with 90% accuracy, surpassing existing LC-MS/MS data processing methods. Despite the chromatographic complexity, this metric could resolve features from coeluting compounds at lower resolutions (Rs ≤ 0.3) than the standard correlation-based method. LOF clustering reduced this untargeted data set by ∼79-86%, ultimately revealing ∼583 unique metabolites in brain dialysate. These results collectively demonstrate that the LOF metric can strengthen the interpretation of untargeted metabolomics data.
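To make the idea of a residual-based peak shape comparison concrete, the following is a minimal sketch, not the definition used in this work. It assumes a common chemometric form of lack-of-fit: one feature's intensity profile is scaled onto the other by least squares, and LOF is reported as 100 × sqrt(residual sum of squares / reference sum of squares). The function names `lack_of_fit` and `gaussian`, the synthetic Gaussian peaks, and all peak parameters are hypothetical and chosen only for illustration.

```python
import math


def lack_of_fit(ref, other):
    """Percent lack-of-fit between two profiles sampled over the same time window.

    Assumed definition (not taken from the paper): scale `other` onto `ref`
    by least squares, then return 100 * sqrt(SS_residual / SS_ref).
    """
    if len(ref) != len(other) or not ref:
        raise ValueError("profiles must be non-empty and the same length")
    ss_other = sum(y * y for y in other)
    ss_ref = sum(x * x for x in ref)
    if ss_other == 0 or ss_ref == 0:
        raise ValueError("profiles must contain signal")
    # Least-squares scale factor that best maps `other` onto `ref`.
    scale = sum(x * y for x, y in zip(ref, other)) / ss_other
    ss_res = sum((x - scale * y) ** 2 for x, y in zip(ref, other))
    return 100.0 * math.sqrt(ss_res / ss_ref)


def gaussian(times, center, width, height):
    """Synthetic chromatographic peak, used here only as illustrative data."""
    return [height * math.exp(-0.5 * ((t - center) / width) ** 2) for t in times]


# Hypothetical time window (minutes) and peaks.
times = [i * 0.01 for i in range(201)]
parent = gaussian(times, 1.00, 0.05, 1000.0)
adduct = gaussian(times, 1.00, 0.05, 150.0)    # same shape, lower intensity
neighbor = gaussian(times, 1.03, 0.05, 800.0)  # slightly shifted, coeluting compound

lof_adduct = lack_of_fit(parent, adduct)
lof_neighbor = lack_of_fit(parent, neighbor)

THRESHOLD = 20.0  # percent; the cutoff quoted in the abstract
print("adduct flagged as degenerate:", lof_adduct < THRESHOLD)
print("neighbor flagged as degenerate:", lof_neighbor < THRESHOLD)
```

Under this assumed definition, a feature that is an exact scaled copy of the reference profile gives an LOF of essentially zero and falls below the 20% cutoff, while the shifted peak leaves a residual large enough to exceed it. Real data would also need baseline handling, noise tolerance, and a rule for choosing the time window, none of which this sketch addresses.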