A Stability-Oriented Biomarker Selection Framework Synergistically Driven by Robust Rank Aggregation and L1-Sparse Modeling

Abstract

Background: In high-dimensional, small-sample omics studies such as metabolomics, feature selection determines not only the discriminative performance of classification models but also the reproducibility and translational value of candidate biomarkers. However, most existing methods optimize primarily for classification accuracy and treat stability as a post hoc diagnostic, so the selected feature sets fluctuate considerably across data splits or under mild perturbations.
Methods: To address this issue, this study proposes FRL-TSFS, a feature selection framework synergistically driven by filter-based Robust Rank Aggregation (RRA) and L1-sparse modeling. Five complementary filter methods (variance thresholding, chi-square test, mutual information, ANOVA F-test, and ReliefF) are first applied in parallel to score features, and RRA is then used to obtain a consensus feature ranking that is less sensitive to the bias of any single scoring criterion. An L1-regularized logistic regression model is subsequently fitted on the candidate feature subset defined by the RRA ranking to achieve task-coupled sparse selection, thereby linking feature selection stability, feature compression, and classification performance.
Results: FRL-TSFS was evaluated on six representative metabolomics and gene expression datasets under the mild perturbation induced by 10-fold cross-validation, and was compared with multiple baselines using the Extended Kuncheva Index (EKI), Accuracy, and F1-score. RRA substantially improves ranking stability over conventional aggregation strategies without degrading classification performance, while the full FRL-TSFS framework consistently attains higher EKI values than the other feature selection schemes, reduces the selected features to several tens of metabolites or genes, and maintains competitive classification performance.
Conclusions: These findings indicate that FRL-TSFS can generate compact, reproducible, and interpretable biomarker panels, providing a practical analysis framework for stability-oriented feature selection and biomarker discovery in untargeted metabolomics.
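
The rank aggregation step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (normalized_ranks, rho_score, aggregate) and the toy scores are hypothetical, and only three scorers and four features are used for brevity. It assumes the standard RRA idea of scoring each feature by the smallest order-statistic p-value of its normalized ranks under a uniform null, and it returns the uncorrected minimum (no multiple-testing correction).

```python
from math import comb


def normalized_ranks(scores):
    """Map feature -> rank/N for one scorer; the highest score gets rank 1."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    return {feat: (pos + 1) / n for pos, feat in enumerate(ordered)}


def rho_score(rank_vector):
    """Smallest order-statistic p-value of a feature's normalized ranks.

    Under the null, the m ranks are independent uniform draws, so the
    probability that the k-th smallest is <= r is a binomial upper tail.
    This is the uncorrected minimum (an assumption of this sketch).
    """
    m = len(rank_vector)
    best = 1.0
    for k, r in enumerate(sorted(rank_vector), start=1):
        p = sum(comb(m, j) * r**j * (1 - r) ** (m - j) for j in range(k, m + 1))
        best = min(best, p)
    return best


def aggregate(score_tables):
    """Return features ordered from most to least consistently top-ranked."""
    ranks = [normalized_ranks(table) for table in score_tables]
    rho = {f: rho_score([r[f] for r in ranks]) for f in ranks[0]}
    return sorted(rho, key=rho.get)


# Toy scores from three hypothetical filter methods (illustrative values only).
toy_scores = [
    {"m1": 0.9, "m2": 0.5, "m3": 0.2, "m4": 0.1},
    {"m1": 0.8, "m2": 0.1, "m3": 0.6, "m4": 0.3},
    {"m1": 0.7, "m2": 0.6, "m3": 0.4, "m4": 0.2},
]
consensus = aggregate(toy_scores)
print(consensus)
```

In a pipeline like the one described, the top of such a consensus list would define the candidate subset passed to the L1-regularized logistic regression. The cutoff and the regularization strength are not stated in the abstract, so they are left out here.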
