Pruning tree forest and re-sampling for class imbalanced problem

Abstract

Class imbalance remains a critical challenge in machine learning: algorithms trained on imbalanced data tend to favor the majority class, misclassifying minority class instances and reducing overall model performance. This study explores an approach to addressing class imbalance in Random Forests that combines pruning with resampling. Pruning typically improves performance and reduces computational cost, but its effectiveness can be limited in complex ensembles trained on imbalanced data. To address this, the proposed method incorporates three resampling strategies: under-sampling the majority class, over-sampling the minority class, and a hybrid of both. After the training data are balanced, multiple trees are grown from bootstrap samples, and only those with low out-of-bag error rates are selected for the final ensemble. The classification performance of the proposed method is evaluated against standard algorithms including k-Nearest Neighbors (k-NN), Tree, Random Forest (RF), Balanced Random Forest (BRF), and Support Vector Machine (SVM). The results show that the proposed method outperforms its competitors in most cases. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1038/s41598-026-38320-1.
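The procedure outlined in the abstract (balance the training data, grow trees on bootstrap samples, keep only the trees with low out-of-bag error) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes binary labels 0/1, shows only the under-sampling strategy, and substitutes depth-1 trees (stumps) on a randomly chosen feature for full trees to stay short. The function names, the `keep_frac` selection rule, and the toy data are all hypothetical; the abstract does not state how "low" out-of-bag error is defined.

```python
import random

def undersample(X, y, rng):
    """Randomly drop majority-class rows until both classes have equal size."""
    idx0 = [i for i, label in enumerate(y) if label == 0]
    idx1 = [i for i, label in enumerate(y) if label == 1]
    minority, majority = sorted([idx0, idx1], key=len)
    keep = minority + rng.sample(majority, len(minority))
    return [X[i] for i in keep], [y[i] for i in keep]

def fit_stump(X, y, rng):
    """Fit a depth-1 tree on one random feature (a stand-in for a full tree)."""
    f = rng.randrange(len(X[0]))
    best = None
    for t in sorted({row[f] for row in X}):
        for left_label in (0, 1):
            errors = sum(
                (left_label if row[f] <= t else 1 - left_label) != label
                for row, label in zip(X, y)
            )
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label)
    return best[1:]  # (feature index, threshold, label on the left side)

def predict_stump(stump, row):
    f, t, left_label = stump
    return left_label if row[f] <= t else 1 - left_label

def pruned_forest(X, y, n_trees=50, keep_frac=0.3, seed=0):
    """Balance the data, grow stumps on bootstrap samples, keep the best by OOB error."""
    rng = random.Random(seed)
    Xb, yb = undersample(X, y, rng)
    n = len(Xb)
    scored = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]  # out-of-bag rows
        if not oob:
            continue
        stump = fit_stump([Xb[i] for i in boot], [yb[i] for i in boot], rng)
        oob_err = sum(predict_stump(stump, Xb[i]) != yb[i] for i in oob) / len(oob)
        scored.append((oob_err, stump))
    scored.sort(key=lambda pair: pair[0])
    # Assumed pruning rule: keep the keep_frac share with the lowest OOB error.
    n_keep = max(1, int(keep_frac * len(scored)))
    return [stump for _, stump in scored[:n_keep]]

def predict(forest, row):
    """Majority vote over the retained stumps."""
    votes = sum(predict_stump(s, row) for s in forest)
    return 1 if 2 * votes > len(forest) else 0

# Toy data (hypothetical): 200 majority rows, 20 minority rows.
# Feature 0 separates the classes; feature 1 is uninformative.
X = [[i * 0.01, (i * 7) % 10] for i in range(200)]
X += [[5 + j * 0.05, (j * 7) % 10] for j in range(20)]
y = [0] * 200 + [1] * 20

forest = pruned_forest(X, y)
print(len(forest), predict(forest, [0.0, 3]), predict(forest, [5.5, 3]))
```

Over-sampling or the hybrid strategy would replace only `undersample`; the bootstrap, out-of-bag scoring, and selection steps stay the same. Stumps trained on the uninformative feature score poorly out of bag, so the selection step tends to discard them.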
