Abstract
BACKGROUND AND OBJECTIVE: Extracellular vesicles (EVs), considered a form of liquid biopsy, have gained significant attention in recent years because they are stable and preserve disease markers. Research studies underscore the clinical significance of molecules found in EVs, highlighting their role as communicative mediators between cells. However, analyzing EV omics data is challenging: measurements are noisy, variables far outnumber samples, and some groups (e.g., disease subtypes or experimental conditions) have much less data than others. We therefore develop an algorithm that addresses these challenges for the classification of imbalanced EV omics data.
METHODS AND RESULTS: We propose the EV Meta-Weight Elastic Net Algorithm (MWENA), which uses logistic regression with elastic net regularization to classify samples and identify EV signatures, addressing the challenges posed by high-dimensional data with small sample sizes. To mitigate class imbalance and high noise levels, MWENA incorporates an automatic sample re-weighting function, which uses a meta-net to adaptively learn generalizable patterns directly from the data. We validate MWENA on both simulated data and EV omics data, covering six classification tasks that involve four diseases (pancreatic ductal adenocarcinoma, interstitial lung diseases, colorectal cancer, and ovarian cancer) and three clinical scenarios (disease diagnosis, disease-stage screening, and disease-subtype classification). Compared to other machine learning methods, MWENA is better at identifying minority-class samples and achieves the highest scores in both sensitivity and G-mean. Biological analysis is also performed to further explore the significance of the selected signatures as biological markers and their roles in disease mechanisms.
CONCLUSIONS: We anticipate that the proposed approach is a modest step toward harnessing EV omics data for biomarker discovery, helping researchers gain a more comprehensive understanding of biological processes.
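
ILLUSTRATIVE SKETCH: To make the two reported metrics and the general form of the objective concrete, the following minimal Python sketch computes sensitivity and the G-mean for a binary task and evaluates a sample-weighted logistic loss with an elastic net penalty. It is not the MWENA implementation. The function names, the penalty parameterization (`lam`, `alpha`), and the fixed `weights` argument are assumptions made for illustration; in MWENA the sample weights are learned by a meta-net rather than supplied by hand.

```python
import math


def sensitivity_and_gmean(y_true, y_pred, positive=1):
    """Return (sensitivity, G-mean) for binary labels.

    G-mean is the geometric mean of sensitivity and specificity, so it
    drops to zero whenever one class is never predicted correctly.
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, math.sqrt(sens * spec)


def weighted_elastic_net_loss(X, y, beta, intercept, weights, lam=0.1, alpha=0.5):
    """Sample-weighted logistic loss plus an elastic net penalty.

    Assumed (hypothetical) parameterization:
        sum_i w_i * logloss_i / sum_i w_i
        + lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)
    `weights` is a fixed list here; a meta-net would produce it in MWENA.
    """
    total = 0.0
    for xi, yi, wi in zip(X, y, weights):
        z = intercept + sum(b * v for b, v in zip(beta, xi))
        # Numerically stable log(1 + exp(z)) - y * z, with y in {0, 1}.
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += wi * (softplus - yi * z)
    data_term = total / sum(weights)
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return data_term + lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2)


# Imbalanced toy labels: 2 minority (positive) samples, 6 majority samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
majority_only = [0] * 8  # 75% accuracy, yet no minority sample is found
sens_major, gmean_major = sensitivity_and_gmean(y_true, majority_only)
```

On this toy example the majority-only predictor is correct on 6 of 8 samples, but both its sensitivity and its G-mean are zero, which is why these two metrics are more informative than accuracy when classes are imbalanced.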