Algorithmic and mathematical modeling for synthetically controlled overlapping

Abstract

For most classifiers, overlapping regions, where classes are difficult to distinguish, degrade overall performance on multi-class imbalanced data more than the imbalance itself. In the problem-data space, overlapped samples share similar characteristics, which produces a complex decision boundary, makes the classes hard to separate, and lowers performance. The research community agrees that class overlap affects classifier performance, but how strongly the classifier is affected remains unanswered. The literature also lacks a demonstration of the different levels of class overlap in multi-class problems. Accordingly, this paper implements four algorithms that synthetically generate controlled overlapping samples for multi-class datasets, using different schemes to show the worst effect of class overlapping. In the experiments, three state-of-the-art non-parametric classifiers (support vector machines, k-nearest neighbor, and random forest) classify these multi-class datasets to validate the effect of class overlap on their learning. The models are used to test the suitability, stability, and versatility of the proposed algorithms for the schemes, and to highlight the effect of a growing number of overlapping samples in complex multi-class problems that combine an imbalanced data distribution with class overlap. Experimental results on 20 real-world datasets show the different levels of overlapping data and the effect of each level on the underlying classifiers.
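The abstract does not spell out the four algorithms, so the following is only a minimal sketch of the general idea of controlled overlap, not the paper's method. It assumes two 2-D Gaussian classes with a 4:1 imbalance, relocates a chosen fraction of minority samples next to randomly picked majority samples, and measures leave-one-out 1-nearest-neighbor accuracy at each overlap level. All names (`make_classes`, `inject_overlap`, `loo_1nn_accuracy`, `sweep`) and parameters (`overlap_ratio`, `jitter`) are hypothetical and chosen for illustration.

```python
import random


def make_classes(n_major=200, n_minor=50, seed=0):
    """Two well-separated 2-D Gaussian classes with a 4:1 imbalance (assumed toy data)."""
    rng = random.Random(seed)
    major = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_major)]
    minor = [(rng.gauss(6.0, 1.0), rng.gauss(6.0, 1.0)) for _ in range(n_minor)]
    return major, minor


def inject_overlap(major, minor, overlap_ratio, jitter=0.1, seed=0):
    """Relocate a fraction of minority points next to random majority points.

    Returns the modified minority set and the number of relocated points.
    The class sizes (and so the imbalance ratio) are left unchanged.
    """
    rng = random.Random(seed)
    n_move = round(overlap_ratio * len(minor))
    moved = set(rng.sample(range(len(minor)), n_move))
    result = []
    for i, point in enumerate(minor):
        if i in moved:
            ax, ay = rng.choice(major)  # anchor inside the majority region
            point = (ax + rng.gauss(0.0, jitter), ay + rng.gauss(0.0, jitter))
        result.append(point)
    return result, n_move


def loo_1nn_accuracy(major, minor):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier (squared Euclidean distance)."""
    data = [(p, 0) for p in major] + [(p, 1) for p in minor]
    correct = 0
    for i, ((x, y), label) in enumerate(data):
        best_dist, best_label = float("inf"), None
        for j, ((u, v), other) in enumerate(data):
            if i == j:
                continue
            dist = (x - u) ** 2 + (y - v) ** 2
            if dist < best_dist:
                best_dist, best_label = dist, other
        correct += best_label == label
    return correct / len(data)


def sweep(levels=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Evaluate the classifier at several controlled overlap levels."""
    major, minor = make_classes()
    scores = {}
    for level in levels:
        overlapped, _ = inject_overlap(major, minor, level)
        scores[level] = loo_1nn_accuracy(major, overlapped)
    return scores


for level, acc in sweep().items():
    print(f"overlap={level:.2f}  LOO 1-NN accuracy={acc:.3f}")
```

Because only the location of minority samples changes while the class sizes stay fixed, any drop in accuracy across levels in this toy setup can be attributed to overlap rather than to a change in imbalance. A fuller experiment in the spirit of the abstract would swap in support vector machine, k-nearest neighbor, and random forest implementations and real multi-class datasets.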
