KM-DBSCAN: an enhanced density and centroid based border detection framework for data reduction towards green AI

Abstract

Green AI aims to design and train machine learning models with sustainable resource usage in mind, without sacrificing model efficiency. The exponential growth of training data has led to increasing computational cost and energy consumption. Techniques such as pruning, quantization, and knowledge distillation are used to shrink AI models. Data reduction is another such technique, one that improves both the training speedup factor and the green AI score. To address these challenges, we introduce KM-DBSCAN, a new data clustering algorithm for intelligent data reduction. It combines the geometric simplicity of K-Means with the density-awareness and noise resilience of DBSCAN to improve the performance and efficiency of data clustering, yielding better border detection even in overlapping scenarios. The effect of data reduction was examined by training and testing different machine learning models, including SVM, MLP, and CNN, on six benchmark datasets: Banana, USPS, Adult9a, Collision, Dry Bean, and Melanoma. KM-DBSCAN achieved up to 90% data reduction, training speedups ranging from 3.6× to 6900×, and carbon emissions ranging from 0.0219 g to 5.374 g, while preserving competitive accuracy (e.g., 90.39% accuracy in melanoma classification using only 28.7% of the training data, with just 0.0061% accuracy loss and a 71.65% reduction in carbon emissions compared to training on the full dataset). These results demonstrate that KM-DBSCAN enables efficient and environmentally conscious learning without compromising predictive performance.
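
The abstract reports data reduction, training speedup, and emission reduction figures but does not define them. The sketch below assumes the usual definitions: the fraction of samples removed, the ratio of full to reduced training time, and the fraction of emissions avoided. The function name `reduction_metrics` and all input numbers are hypothetical and chosen only to exercise the formulas; they are not measurements from the paper. The one link to the abstract is the retained fraction: keeping 28.7% of the training data corresponds to a 71.3% data reduction.

```python
def reduction_metrics(n_full, n_kept, t_full, t_reduced, co2_full, co2_reduced):
    """Return (data reduction %, speedup factor, emission reduction %).

    Assumed definitions (not taken from the paper):
      data reduction     = 100 * (1 - n_kept / n_full)
      speedup factor     = t_full / t_reduced
      emission reduction = 100 * (1 - co2_reduced / co2_full)
    """
    data_reduction = 100.0 * (1.0 - n_kept / n_full)
    speedup = t_full / t_reduced
    co2_reduction = 100.0 * (1.0 - co2_reduced / co2_full)
    return data_reduction, speedup, co2_reduction


# Hypothetical inputs: 2,870 of 10,000 samples kept (28.7% retained),
# training time in seconds, emissions in grams of CO2.
dr, sp, cr = reduction_metrics(
    n_full=10_000, n_kept=2_870,
    t_full=120.0, t_reduced=30.0,
    co2_full=2.0, co2_reduced=0.5,
)
print(f"data reduction: {dr:.1f}%")
print(f"speedup factor: {sp:.1f}x")
print(f"emission reduction: {cr:.1f}%")
```

Under these definitions the speedup factor and the emission reduction are measured independently of the data reduction rate, so a given fraction of removed samples does not by itself determine either of them.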
