Abstract
INTRODUCTION: Untargeted metabolomics is a powerful tool for detecting perturbations in biological systems and offers significant potential for screening for rare inherited metabolic disorders (IMDs). However, the rarity and vast diversity of these diseases result in limited sample availability and incomplete knowledge of the metabolic pathways involved in each condition. Current diagnostic procedures rely heavily on manual interpretation, which is time-consuming, and data-driven approaches are insufficient when sample sizes are small. OBJECTIVES: To develop a diagnostic algorithm for IMDs that addresses the challenges posed by small sample sizes and continuously evolving datasets. METHODS: Samples from 77 IMD patients (35 different IMDs) and 136 control samples were collected from Copenhagen University Hospital, Rigshospitalet. The metabolome was analyzed using liquid chromatography-mass spectrometry. An algorithm partially based on sparse hierarchical clustering was designed to generate IMD-specific metabolic signatures from metabolomics data, enabling comparison with undiagnosed patient samples to provide diagnostic predictions. An iterative improvement strategy was employed, in which new data are continuously integrated to refine the IMD-specific signatures. The algorithm's performance was evaluated on the current study cohort and in a case study using literature-derived data. RESULTS: The algorithm improved with each training round, ranking the correct diagnosis among the top 3 candidate IMDs in 60% of samples (top 1 in 42%). In the case study, the method was applied to literature-based data comprising 95 IMD samples (11 different IMDs) and 68 controls, yielding a correct diagnosis in 73.5% of cases. CONCLUSION: These results demonstrate that the algorithm provides a flexible, data-driven framework for continuous improvement in IMD diagnosis, even with a limited number of samples.
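The top 1 and top 3 figures reported above are ranking metrics: each sample receives an ordered list of candidate IMDs, and a prediction counts as correct if the true diagnosis appears among the first k candidates. The following is a minimal sketch of that evaluation only, not of the algorithm described here. It assumes, purely for illustration, that each signature is a vector of metabolite z-scores and that candidates are ranked by cosine similarity; the function names, disorder labels, and toy values are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_disorders(profile, signatures):
    """Return disorder names ordered from most to least similar to the profile."""
    scores = {name: cosine(profile, sig) for name, sig in signatures.items()}
    return sorted(scores, key=scores.get, reverse=True)

def top_k_accuracy(samples, signatures, k):
    """Fraction of samples whose true diagnosis is among the k best-ranked candidates."""
    hits = sum(
        1 for profile, truth in samples
        if truth in rank_disorders(profile, signatures)[:k]
    )
    return hits / len(samples)

# Toy signatures over three hypothetical metabolite features (assumed z-scores).
signatures = {
    "IMD_A": [3.0, 0.0, 0.0],
    "IMD_B": [0.0, 3.0, 0.0],
    "IMD_C": [0.0, 0.0, 3.0],
    "IMD_D": [2.0, 2.0, 0.0],
}

# Toy patient profiles paired with their (hypothetical) confirmed diagnosis.
samples = [
    ([2.5, 0.2, 0.1], "IMD_A"),
    ([0.5, 2.0, 0.0], "IMD_D"),
    ([0.1, 0.0, 2.0], "IMD_C"),
    ([2.0, 0.1, 0.0], "IMD_C"),
]

print("top-1 accuracy:", top_k_accuracy(samples, signatures, 1))
print("top-3 accuracy:", top_k_accuracy(samples, signatures, 3))
```

Reporting both values is informative because a top 3 list can still narrow the confirmatory testing a clinician needs to order, even when the highest-ranked candidate is wrong.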