Abstract
Lung cancer remains one of the most lethal cancers worldwide, and early detection is vital to improving survival rates. In this work, we propose a hybrid computer-aided diagnosis (CAD) pipeline for lung cancer classification from Computed Tomography (CT) scan images. The pipeline combines three stages: ten image preprocessing techniques, ten pretrained deep learning models for feature extraction (both convolutional neural networks and transformer-based architectures), and four classical machine learning classifiers. Unlike traditional end-to-end deep learning systems, our approach decouples feature extraction from classification, which enhances interpretability and reduces the risk of overfitting. The resulting 400 configurations (10 × 10 × 4) were evaluated to identify the optimal combination. The approach was evaluated on the publicly available Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset, which comprises 1018 thoracic CT scans annotated by four thoracic radiologists. For the classification task, the dataset contained 6568 images labeled as malignant and 4849 images labeled as benign. Experimental results show that the best-performing pipeline, which combines Contrast Limited Adaptive Histogram Equalization (CLAHE), Swin Transformer feature extraction, and eXtreme Gradient Boosting (XGBoost), achieved an accuracy of 95.8%.
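
The configuration search described above can be illustrated with a minimal sketch, which assumes that the 400 configurations are the full Cartesian product of the three stages. Only CLAHE, Swin Transformer, and XGBoost are named in the abstract. Every other entry, the helper names `enumerate_configurations` and `select_best`, and the caller-supplied `evaluate` function are hypothetical placeholders and not the actual implementation.

```python
from itertools import product

# Only the first entry of each list is named in the abstract.
# All remaining entries are hypothetical placeholders.
PREPROCESSORS = ["CLAHE"] + [f"preproc_{i}" for i in range(2, 11)]            # 10 techniques
EXTRACTORS = ["SwinTransformer"] + [f"extractor_{i}" for i in range(2, 11)]   # 10 pretrained models
CLASSIFIERS = ["XGBoost"] + [f"classifier_{i}" for i in range(2, 5)]          # 4 classical classifiers


def enumerate_configurations():
    """Return every (preprocessor, extractor, classifier) triple."""
    return list(product(PREPROCESSORS, EXTRACTORS, CLASSIFIERS))


def select_best(configs, evaluate):
    """Return (best_config, best_score) under a caller-supplied metric.

    `evaluate` is a hypothetical callable that maps one configuration
    to a score such as held-out accuracy. Because feature extraction is
    decoupled from classification, features for a given
    (preprocessor, extractor) pair could be computed once and reused
    across all four classifiers.
    """
    scored = [(evaluate(cfg), cfg) for cfg in configs]
    best_score, best_config = max(scored, key=lambda pair: pair[0])
    return best_config, best_score


configs = enumerate_configurations()
print(len(configs))  # 400 = 10 * 10 * 4
```

In practice, `evaluate` would wrap the actual preprocessing, feature extraction, and classifier training steps. Here it is left abstract so that the sketch stays independent of any particular library.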