Hybrid Synthetic Minority Over-sampling Technique (HSMOTE) and Ensemble Deep Dynamic Classifier Model (EDDCM) for big data analytics



Authors: M, Priyadharsini; Tyagi, Bhawana; R, Naga Priyadarsini; B, Mohankumar

Journal:	Scientific Reports	Impact factor:	3.900
Year:	2025	Citation:	2025 Nov 11;15(1):39495
DOI:	10.1038/s41598-025-23062-3

Abstract

Big Data Classification (BDC) has become increasingly important across domains such as healthcare, e-commerce, and banking. However, high dimensionality and class imbalance often degrade the performance of conventional machine learning (ML) models. This study proposes a hybrid framework that integrates meta-heuristic optimization with class imbalance handling to enhance BDC effectiveness. To address class imbalance in both binary and multi-class datasets, a Hybrid Synthetic Minority Over-sampling Technique (HSMOTE) is introduced. HSMOTE generates synthetic minority samples by interpolating between closely located minority instances, improving the representation of rare classes. For robust feature selection, the Optimization Ensemble Feature Selection Model (OEFSM) is developed by combining the outputs of three algorithms: Fuzzy Weight Dragonfly Algorithm (FWDFA), Adaptive Elephant Herding Optimization (AEHO), and Fuzzy Weight Grey Wolf Optimization (FWGWO). These algorithms contribute diverse search strategies to improve feature relevance and reduce redundancy. For classification, the Ensemble Deep Dynamic Classifier Model (EDDCM) is proposed. EDDCM incorporates three deep learning (DL) architectures: Density Weighted Convolutional Neural Network (DWCNN), Density Weighted Bi-Directional Long Short-Term Memory (DWBi-LSTM), and Weighted Autoencoder (WAE). Their outputs are aggregated using a dynamic ensemble strategy that considers both accuracy and diversity to improve the reliability of the final prediction. All models are implemented in MATLAB (2014a), and performance is evaluated using precision, recall, F-measure, and accuracy. The proposed framework demonstrates improved classification results across various datasets, particularly under conditions of imbalance and high dimensionality.
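The interpolation step that the abstract attributes to HSMOTE can be illustrated with a minimal sketch. The abstract does not specify what makes HSMOTE "hybrid", so the code below shows only plain SMOTE-style interpolation between a minority sample and one of its nearest minority neighbours; it should not be read as the authors' method. The function name `smote_like_oversample`, the parameters `k` and `seed`, and the toy data are all hypothetical, and Python is used here for illustration even though the paper reports a MATLAB implementation.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours.

    Illustrative sketch of SMOTE-style interpolation only; the hybrid
    components of HSMOTE are not described in the abstract.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy minority class with two features per sample.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_like_oversample(minority, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies, which is the sense in which interpolation "improves the representation of rare classes" without copying samples verbatim.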
