Abstract
As modern medical technology advances, the utilization of gene expression data has proliferated across diverse domains, particularly in cancer diagnosis and prognosis monitoring. However, gene expression data is often characterized by high dimensionality and a prevalence of redundant and noisy information, prompting the need for effective strategies to mitigate issues like the curse of dimensionality and overfitting. This study introduces a novel hybrid ensemble equilibrium optimizer gene selection algorithm in response. In the first stage, a hybrid approach, combining multiple filters and gene correlation-based methods, is used to select an optimal subset of genes, which is achieved by evaluating the redundancy and complementary relationships among genes to obtain a subset with maximal information content. In the second stage, an equilibrium optimizer algorithm incorporating Gaussian Barebone and a novel gene pruning strategy is employed to further search for the optimal gene subset within the candidate gene space selected in the first stage. To demonstrate the superiority of the proposed method, it was compared with nine feature selection techniques on 15 datasets. The results indicate that the ensemble filtering method in the first stage exhibits strong stability and effectively reduces the search space of the gene selection algorithms. The improved equilibrium optimizer algorithm enhances the prediction accuracy while significantly reducing the number of selected features. These findings highlight the effectiveness of the proposed method as a valuable approach for gene selection.