Data Augmentation and Machine Learning algorithms for multi-class imbalanced morphometrics data of stingless bees

Abstract

This study addresses multi-class imbalanced data in the classification of stingless bee samples using two data-balancing techniques: the Synthetic Minority Oversampling Technique (SMOTE) and the Adaptive Synthetic (ADASYN) sampling approach. These techniques were combined with two machine learning (ML) algorithms, Random Forest (RF) and Support Vector Machine (SVM), and the resulting models were assessed on how well they predict the identity of stingless bee samples. We compared six classifier models (RF, RF + SMOTE, RF + ADASYN, SVM, SVM + SMOTE and SVM + ADASYN) on a six-class imbalanced dataset of stingless bee morphometrics. Model performance was assessed with the multi-class area under the curve (AUC), F1-score, G-mean, balanced accuracy, sensitivity and the no-information rate. SMOTE and ADASYN marginally improved the performance of both the RF and SVM classifiers. SVM outperformed RF, and SVM performed better with SMOTE than with ADASYN. SVM with ADASYN had a lower multi-class AUC (0.9898) and sensitivity (0.956) but a higher F1-score (0.939) than SVM with SMOTE (AUC = 0.9918, sensitivity = 0.959, F1-score = 0.934). Overall, SVM with SMOTE was superior to RF with SMOTE. All models except SVM with ADASYN correctly classified four of the six species, M. (Meliponula) bocandei, M. (Meliplebeia) lendliana, D. schmidti and P. armata, but not the two morphs Meliponula (Axestotrigona) togoensis and Meliponula (Axestotrigona) ferruginea. This study therefore confirms that imbalanced-learning techniques have minimal impact when the classes are separable. Random forest recursive feature elimination (RF-RFE) was used to assess variable importance, identifying the key morphometric measurements on which future studies can focus to save time and cost while maintaining high classification performance. Our results pave the way for smart, automated machine learning applications that complement existing methods for identifying stingless bee species.
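The abstract names SMOTE and ADASYN without describing how they create data. The sketch below is a minimal, stdlib-only Python illustration of the two ideas, not the authors' code; the abstract does not say which software or settings (such as the number of neighbours `k`) were used. SMOTE places each synthetic sample on the segment between a minority-class sample and one of its `k` nearest minority neighbours. ADASYN uses the same interpolation but allocates more synthetic samples to minority points whose neighbourhoods are dominated by other classes. For the multi-class case the sketch assumes a one-vs-rest view, passing all non-minority samples as `others`; the function names `smote`, `adasyn` and their helpers are hypothetical.

```python
import math
import random


def _k_nearest(i, points, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def _interpolate(a, b, gap):
    """Point on the segment from a to b; gap = 0 gives a, gap -> 1 approaches b."""
    return tuple(x + gap * (y - x) for x, y in zip(a, b))


def smote(minority, n_new, k=5, rng=None):
    """SMOTE sketch: each synthetic point lies between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    Requires at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(_k_nearest(i, minority, k))
        synthetic.append(_interpolate(minority[i], minority[j], rng.random()))
    return synthetic


def adasyn(minority, others, n_new, k=5, rng=None):
    """ADASYN sketch: same interpolation as SMOTE, but minority samples whose
    k nearest neighbours (in the combined data) mostly come from other classes
    receive more synthetic points. `others` holds samples of all other classes
    (one-vs-rest assumption)."""
    rng = rng or random.Random(0)
    m = len(minority)
    pool = list(minority) + list(others)  # indices < m are minority samples
    # r_i: fraction of each minority sample's k nearest neighbours from other classes
    ratios = [sum(j >= m for j in _k_nearest(i, pool, k)) / k for i in range(m)]
    total = sum(ratios)
    if total == 0:
        # No minority sample has other-class neighbours; this sketch falls back
        # to a uniform allocation (an assumption, not part of the original method).
        ratios, total = [1.0] * m, float(m)
    # g_i: synthetic points for sample i (rounding means the sum is only ~n_new)
    counts = [round(n_new * r / total) for r in ratios]
    synthetic = []
    for i, g in enumerate(counts):
        neighbours = _k_nearest(i, minority, k)  # interpolate within the minority class
        for _ in range(g):
            j = rng.choice(neighbours)
            synthetic.append(_interpolate(minority[i], minority[j], rng.random()))
    return synthetic
```

When part of a minority class is well separated and part overlaps other classes, this version of ADASYN gives no synthetic samples to the well-separated points and concentrates them in the overlapping region, whereas SMOTE spreads them uniformly across the class.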

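Several of the evaluation metrics listed in the abstract are simple functions of the predicted and true labels. The sketch below computes them under common multi-class definitions, which are assumptions because the abstract gives no formulas: sensitivity as per-class recall, balanced accuracy as mean per-class recall (other definitions exist, e.g. averaging sensitivity and specificity per class), G-mean as the geometric mean of per-class recalls, F1 as the macro average of one-vs-rest F1 scores, and the no-information rate as the share of the most frequent class. Multi-class AUC needs predicted class probabilities and is left out. The function names are hypothetical.

```python
import math
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Sensitivity (recall) per class: correctly predicted members / all true members."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in totals.items()}


def macro_f1(y_true, y_pred):
    """Unweighted mean of one-vs-rest F1 scores over all observed classes."""
    scores = []
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def imbalance_metrics(y_true, y_pred):
    """Accuracy next to imbalance-aware summaries of the same predictions."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    n = len(y_true)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / n,
        # Mean per-class recall: every class counts equally, whatever its size.
        "balanced_accuracy": sum(recalls) / len(recalls),
        # Geometric mean of recalls: drops to 0 if any class is never recognised.
        "g_mean": math.prod(recalls) ** (1 / len(recalls)),
        "macro_f1": macro_f1(y_true, y_pred),
        # Accuracy of always predicting the most frequent class.
        "no_information_rate": max(Counter(y_true).values()) / n,
    }
```

With six samples of class a, three of b and one of c, a classifier that labels the lone c sample and one b sample as a reaches an accuracy of 0.8, above the no-information rate of 0.6, yet its G-mean is 0 because class c is never recognised. This is why imbalance-aware metrics are reported alongside accuracy.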