Multivariate Optimization of k for k-Nearest-Neighbor Feature Selection With Dichotomous Outcomes: Complex Associations, Class Imbalance, and Application to RNA-Seq in Major Depressive Disorder



Abstract

Optimization of nearest-neighbor feature selection depends on the number of samples and features, the type of statistical effect, the feature scoring algorithm, and class imbalance. We recently reported a fixed-k for Nearest-neighbor Projected-Distance Regression (NPDR) that addresses each of these parameters except class imbalance. To remedy this, we parameterize our NPDR fixed-k by the minority class size (minority-class-k). We also introduce a class-adaptive fixed-k (hit-miss-k) to improve the performance of Relief-based algorithms on imbalanced data. In addition, we present two optimization methods, a constrained variable-wise optimized k (VWOK) and a fixed-k derived with principal component analysis (kPCA), both of which adapt to class imbalance. Using simulated data, we show that our methods significantly improve feature detection across a variety of nearest-neighbor feature scoring metrics, and we demonstrate superior performance compared with random forest and ridge regression using consensus nested cross-validation (cnCV) for feature selection. We applied cnCV to RNA-Seq expression data from a study of Major Depressive Disorder (MDD), using NPDR with minority-class-k, random forest, and ridge regression for gene importance. Pathway analysis showed that NPDR with minority-class-k alone detected genes with clear relevance to MDD, suggesting that our new fixed-k formula is an effective rule of thumb.
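To make the idea of imbalance-aware nearest-neighbor feature scoring concrete, the sketch below implements a minimal Relief-style scorer for a dichotomous outcome, with k tied to the minority class size. This is an illustration only: the function `minority_class_k` uses a simple half-of-minority-class heuristic that is an assumption for this example, not the paper's actual minority-class-k formula, and `relief_scores` is a generic ReliefF-style scorer rather than NPDR itself.

```python
import numpy as np

def minority_class_k(y, frac=0.5):
    """Imbalance-aware fixed-k: a fraction of the minority class size.
    (Hypothetical heuristic for illustration; not the published formula.)"""
    counts = np.bincount(y)
    m_min = counts[counts > 0].min()
    return max(1, int(frac * (m_min - 1)))

def relief_scores(X, y, k):
    """Minimal ReliefF-style scoring for a binary outcome: features that
    separate each sample's nearest misses from its nearest hits score high."""
    n, p = X.shape
    scores = np.zeros(p)
    for i in range(n):
        d = np.abs(X - X[i]).sum(axis=1)   # Manhattan distance to all samples
        d[i] = np.inf                      # exclude self from neighbor search
        hits = np.where(y == y[i])[0]      # same-class candidates
        misses = np.where(y != y[i])[0]    # other-class candidates
        nh = hits[np.argsort(d[hits])[:k]]      # k nearest hits
        nm = misses[np.argsort(d[misses])[:k]]  # k nearest misses
        # reward per-feature separation from misses, penalize spread among hits
        scores += (np.abs(X[nm] - X[i]).mean(axis=0)
                   - np.abs(X[nh] - X[i]).mean(axis=0))
    return scores / n

# Example: 40 controls vs. 10 cases, with a main effect in feature 0 only
rng = np.random.default_rng(0)
y = np.array([0] * 40 + [1] * 10)
X = rng.normal(size=(50, 5))
X[:, 0] += 2.0 * y                         # shift feature 0 in the minority class
k = minority_class_k(y)                    # k bounded by the minority class
scores = relief_scores(X, y, k)
```

Note the design point the abstract emphasizes: with a k chosen relative to the whole sample (e.g., half of n), the 10-sample minority class could not supply enough within-class neighbors, so bounding k by the minority class size keeps the hit neighborhoods well defined.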
