Abstract
Early identification of lung cancer from questionnaire-based data offers a low-cost, non-invasive way to support clinical decision-making. However, such datasets often contain redundant, noisy, and imbalanced attributes that limit the performance of traditional classifiers. This study introduces a hybrid LSTM-GRU framework that uses a hybrid Grey Wolf-Whale Optimization (GWO-WOA) algorithm for hyperparameter tuning and Binary Particle Swarm Optimization (BPSO) for feature selection. Two public lung cancer datasets from the Kaggle repository were used, one with 309 samples and one with 3000 samples. For both datasets, the preprocessing pipeline comprised missing-value imputation, categorical encoding, outlier removal, and z-score normalization to ensure feature consistency. Each dataset was then split into training (70%), validation (20%), and test (10%) subsets. BPSO selected the most informative features for diagnosis, while GWO-WOA tuned key hyperparameters of the hybrid architecture, including the learning rate, number of hidden units, and layer depth. The proposed GWO-WOA-LSTM-GRU model achieved 100.00% accuracy, precision, recall, and F1-score on the 309-sample dataset, and 99.33% accuracy and F1-score (precision: 99.34%, recall: 99.33%) on the 3000-sample dataset. In comparison, tuned single models (LSTM, GRU, CNN, and SVM) achieved accuracies ranging from 77.42% to 98.33%. These findings indicate that integrating metaheuristic optimization with hybrid recurrent networks enhances the robustness and generalization of lung cancer classification across datasets of different sizes, offering a reliable tool for early detection and clinical risk stratification.
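
To make the 70%/20%/10% partition and the z-score step concrete, a minimal NumPy sketch follows. It is illustrative only and is not the authors' code. The function names (`split_indices`, `zscore_fit`, `zscore_apply`), the rounding rule, and the random seed are assumptions. The sketch also assumes that normalization statistics are estimated on the training subset alone, a common convention that may differ from the order of operations used in the study.

```python
import numpy as np

def split_indices(n_samples, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts.

    Hypothetical helper: the rounding rule and seed are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    # Whatever remains after train and validation goes to the test subset.
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]
    return train, val, test

def zscore_fit(x_train):
    """Estimate per-feature mean and standard deviation on the training data."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    return mean, std

def zscore_apply(x, mean, std):
    """Apply z-score normalization with previously fitted statistics."""
    return (x - mean) / std

if __name__ == "__main__":
    # Synthetic stand-in for a 309-row questionnaire table with 15 features.
    rng = np.random.default_rng(1)
    x = rng.normal(loc=5.0, scale=2.0, size=(309, 15))
    train, val, test = split_indices(len(x))
    mean, std = zscore_fit(x[train])
    x_train = zscore_apply(x[train], mean, std)
    x_val = zscore_apply(x[val], mean, std)
    x_test = zscore_apply(x[test], mean, std)
    print(len(train), len(val), len(test))
```

Under this rounding rule, a 309-sample dataset yields 216 training, 62 validation, and 31 test samples, and a 3000-sample dataset yields 2100, 600, and 300. The exact counts in the study may differ if another rounding or stratification scheme was used.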