Abstract
Green AI aims to design and train machine learning models with sustainable resource usage in mind, without sacrificing model efficiency. The exponential growth of training data has increased computational cost and energy consumption. Techniques such as pruning, quantization, and knowledge distillation are used to shrink AI models. Data reduction is another such technique, one that improves both the training speedup factor and the green AI score. To address these costs, we introduce KM-DBSCAN, a new data clustering algorithm for intelligent data reduction. It combines the geometric simplicity of K-Means with the density awareness and noise resilience of DBSCAN to improve the performance and efficiency of data clustering and to detect borders more reliably, even in overlapping scenarios. We examine the effect of data reduction on training and testing several machine learning models, including SVM, MLP, and CNN, on six benchmark datasets: Banana, USPS, Adult9a, Collision, Dry Bean, and Melanoma. KM-DBSCAN achieved up to 90% data reduction, training speedups ranging from 3.6× to 6900×, and carbon emissions ranging from 0.0219 g to 5.374 g, while preserving competitive accuracy (e.g., 90.39% accuracy in melanoma classification using only 28.7% of the training data, with just 0.0061% accuracy loss and a 71.65% reduction in carbon emissions compared to training on the full dataset). These results demonstrate that KM-DBSCAN enables efficient and environmentally conscious learning without compromising predictive performance.