Utilizing Nearest-Neighbor Clustering for Addressing Imbalanced Datasets in Bioengineering

利用最近邻聚类解决生物工程中不平衡数据集的问题

阅读:1

Abstract

Imbalance classification is common in scenarios like fault diagnosis, intrusion detection, and medical diagnosis, where obtaining abnormal data is difficult. This article addresses a one-class problem, implementing and refining the One-Class Nearest-Neighbor (OCNN) algorithm. The original inter-quartile range mechanism is replaced with the K-means with outlier removal (KMOR) algorithm for efficient outlier identification in the target class. Parameters are optimized by treating these outliers as non-target-class samples. A new algorithm, the Location-based Nearest-Neighbor (LBNN) algorithm, clusters one-class training data using KMOR and calculates the farthest distance and percentile for each test data point to determine if it belongs to the target class. Experiments cover parameter studies, validation on eight standard imbalanced datasets from KEEL, and three applications on real medical imbalanced datasets. Results show superior performance in precision, recall, and G-means compared to traditional classification models, making it effective for handling imbalanced data challenges.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。