Abstract
Selecting informative genes is essential for building accurate and efficient diagnostic models, particularly for high-dimensional gene expression data such as Alzheimer's disease datasets. These datasets typically contain thousands of genes but relatively few samples, which increases both the risk of overfitting and the computational cost. To address this, we propose a LASSO-HHO Gene Selection (LHGS) framework that combines the Least Absolute Shrinkage and Selection Operator (LASSO) with Harris Hawks Optimization (HHO). LASSO is first applied to reduce dimensionality by filtering out irrelevant genes. A second, HHO-based optimization stage is applied only when the LASSO-selected subset does not achieve sufficient accuracy or remains relatively large; otherwise, the LASSO-selected genes are used directly without further optimization. Experimental results show that the proposed method reduces the number of genes by up to 99.9% while maintaining high classification performance, reaching 100% accuracy on several datasets. In particular, GSE48350 and GSE36980 reached 100% accuracy with LASSO alone, whereas GSE118553 and GSE132903 required the full LHGS framework to achieve the same result. Because the HHO stage searches only the LASSO-filtered subset, the framework also reduces computational cost under a consistent experimental setup. Overall, LHGS offers a practical and efficient solution for gene selection in high-dimensional biomedical data.
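The conditional two-stage logic described above can be illustrated with a minimal sketch. Several assumptions apply here. The LASSO stage is written as a plain squared-error LASSO solved by coordinate descent on centred data; the abstract does not say which LASSO variant is used, and a logistic L1 penalty is equally common for binary diagnosis. The HHO stage is not implemented. It is represented by a hypothetical `refine` callable, and `evaluate`, `acc_target`, and `max_genes` are likewise hypothetical stand-ins for the unspecified "sufficient accuracy" and "relatively large" criteria.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def soft_threshold(z: float, gamma: float) -> float:
    """Proximal operator of the L1 penalty: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X: Matrix, y: Sequence[float], alpha: float, n_iter: int = 200) -> List[float]:
    """Cyclic coordinate descent for (1/2n)||y - Xw||^2 + alpha * ||w||_1.

    Assumes the columns of X and the vector y are already centred,
    so no intercept term is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # r = y - Xw, which equals y while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero centred column carries no information
            # Partial-residual correlation: (1/n) * x_j^T (r + x_j * w_j)
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta  # keep r = y - Xw up to date
                w[j] = new_wj
    return w


def lhgs_select(
    X: Matrix,
    y: Sequence[float],
    alpha: float,
    evaluate: Callable[[List[int]], float],   # hypothetical: accuracy of a gene subset
    refine: Callable[[List[int]], List[int]],  # hypothetical placeholder for the HHO search
    acc_target: float = 1.0,                   # hypothetical "sufficient accuracy" threshold
    max_genes: int = 50,                       # hypothetical "relatively large" threshold
) -> Tuple[List[int], str]:
    """Stage 1: LASSO filter. Stage 2 (conditional): refine the filtered subset."""
    w = lasso_cd(X, y, alpha)
    selected = [j for j, wj in enumerate(w) if wj != 0.0]
    accuracy = evaluate(selected)
    if accuracy >= acc_target and len(selected) <= max_genes:
        return selected, "lasso"  # LASSO subset is good enough; skip the second stage
    # Second stage runs only on the LASSO-filtered genes, not the full gene set
    return refine(selected), "lasso+hho"
```

Passing the second stage in as a callable keeps the gating rule independent of the metaheuristic, so a real HHO search (or any other wrapper method) could be substituted without changing the LASSO filter or the decision logic.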