Abstract
OBJECTIVES: Non-small cell lung cancer (NSCLC) carries a poor prognosis, and 30% of patients are diagnosed at an advanced stage. Mutations in the EGFR and KRAS genes are important prognostic factors in NSCLC, and targeted therapies can significantly improve survival in patients harboring them. Although tissue biopsy remains the gold standard for detecting gene mutations, it has limitations, including invasiveness, sampling error due to tumor heterogeneity, and poor reproducibility. This study aimed to develop machine learning models based on radiomic features to predict EGFR and KRAS mutation status in NSCLC patients, thereby informing precision oncology. METHODS: Imaging and mutation data for eligible NSCLC patients were obtained from the publicly available Lung-PET-CT-Dx dataset in The Cancer Imaging Archive (TCIA). A three-dimensional convolutional neural network (3D-CNN) was used to extract imaging features from the regions of interest (ROIs). The LightGBM algorithm was then used to build classification models predicting EGFR and KRAS mutation status. Model performance was assessed with 5-fold cross-validation using receiver operating characteristic (ROC) curves, the area under the curve (AUC), accuracy, sensitivity, and specificity. RESULTS: The models predicted EGFR and KRAS mutations in NSCLC patients effectively, achieving an AUC of 0.95 for EGFR and 0.90 for KRAS. They also showed high accuracy (EGFR 89.66%; KRAS 87.10%), sensitivity (EGFR 93.33%; KRAS 87.50%), and specificity (EGFR 85.71%; KRAS 86.67%). CONCLUSIONS: A radiogenomic machine learning model can serve as a non-invasive tool for predicting EGFR and KRAS mutation status in NSCLC patients.
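The evaluation metrics named in the abstract (AUC, accuracy, sensitivity, specificity) can be made concrete with a short sketch. It assumes binary labels (1 = mutant, 0 = wild-type), continuous model scores, and a hypothetical decision threshold of 0.5. The abstract does not say which threshold was used, or whether metrics were averaged across the five folds or computed on pooled out-of-fold predictions. The data in the usage lines are invented for illustration and are not the study's results. AUC is computed here through its rank interpretation (the probability that a randomly chosen positive case is scored above a randomly chosen negative case), which equals the area under the empirical ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = mutant, 0 = wild-type."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true-positive rate), specificity (true-negative rate)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking; tied scores count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 3 mutant and 3 wild-type cases.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.6]
THRESHOLD = 0.5  # assumed cut-off; not stated in the abstract
y_pred = [1 if s >= THRESHOLD else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Note that AUC uses the raw scores and is threshold-independent, while accuracy, sensitivity, and specificity depend on the chosen cut-off. This is why a model can report a high AUC alongside more modest threshold-based figures.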