Abstract
BACKGROUND: Epidermal growth factor receptor (EGFR) status critically guides tyrosine kinase inhibitor therapy in locally advanced non-small cell lung cancer (LA-NSCLC). This study aimed to develop and validate an 18F-FDG PET/CT-based stacking ensemble model that leverages metabolic habitat radiomics to noninvasively predict EGFR mutations in LA-NSCLC and to stratify patient prognosis. METHODS: This multicenter study analyzed 313 LA-NSCLC patients divided into training (n = 142), internal testing (n = 61), and external testing (n = 110) cohorts. Tumors were segmented into four spatially distinct, biologically similar metabolic habitat subregions (PET(high)-CT(high), PET(low)-CT(low), PET(low)-CT(high), and PET(high)-CT(low)) using the Otsu algorithm. A three-step feature selection procedure identified robust and effective radiomic features from the metabolic habitats. An ensemble learning algorithm was applied to develop habitat radiomics and combined clinical-habitat models for predicting EGFR mutation. Patients were divided into low-risk and high-risk groups at the median combined-model score. Kaplan-Meier survival analysis compared progression-free survival (PFS) and overall survival (OS) between the two groups. RESULTS: The stacking ensemble model achieved the highest performance among the compared models, with AUCs of 0.920, 0.900, and 0.884 in the training, internal testing, and external testing cohorts, respectively. Multimodal integration with clinical features further enhanced performance (AUCs: 0.935/0.921/0.905). Prognostic stratification revealed significant PFS and OS differences between the risk groups (all log-rank P < 0.05). Habitat-level analysis identified the PET(high)-CT(high) habitat volume fraction and the PET(low)-CT(high) habitat voxel count as correlates of EGFR mutation. CONCLUSION: The stacking ensemble model based on 18F-FDG PET/CT metabolic habitat radiomics demonstrates potential for predicting EGFR mutations in LA-NSCLC.
Integrating clinical features into the combined model further improved performance and enabled prognostic stratification of LA-NSCLC patients. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12880-026-02163-z.
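
ILLUSTRATIVE SKETCH: The habitat partition described in the Methods can be sketched as follows. This is a minimal illustration and not the study's implementation. It assumes that one Otsu threshold is computed separately for the PET and the CT intensities of the tumor voxels, and that each voxel is labeled by which side of each threshold it falls on. The abstract does not state the binning, preprocessing, or tie-handling actually used. The function names (otsu_threshold, label_habitats, habitat_summary) and the sample intensities are hypothetical.

```python
from collections import Counter


def otsu_threshold(values):
    """Return the cut that maximizes between-class variance (Otsu's criterion).

    Assumption: the search runs over the raw sorted values rather than a
    binned histogram. Values <= threshold are "low", values > threshold "high".
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    best_t, best_var = xs[0], -1.0
    cum = 0.0
    for i in range(n - 1):
        cum += xs[i]
        if xs[i] == xs[i + 1]:
            continue  # no valid cut between two equal values
        w0 = (i + 1) / n                    # weight of the low class
        w1 = 1.0 - w0                       # weight of the high class
        mu0 = cum / (i + 1)                 # mean of the low class
        mu1 = (total - cum) / (n - i - 1)   # mean of the high class
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, xs[i]
    return best_t


def label_habitats(pet, ct):
    """Assign each tumor voxel to one of the four PET/CT habitats."""
    t_pet = otsu_threshold(pet)
    t_ct = otsu_threshold(ct)
    labels = []
    for p, c in zip(pet, ct):
        pet_level = "high" if p > t_pet else "low"
        ct_level = "high" if c > t_ct else "low"
        labels.append(f"PET({pet_level})-CT({ct_level})")
    return labels


def habitat_summary(labels):
    """Map each habitat to (voxel count, volume fraction of the tumor)."""
    counts = Counter(labels)
    n = len(labels)
    return {name: (k, k / n) for name, k in counts.items()}


# Hypothetical per-voxel intensities for one tumor (PET uptake, CT density).
pet = [1.0, 1.2, 0.9, 8.0, 8.5, 9.0, 1.1, 8.2]
ct = [-50, 40, -45, 45, -55, 50, 42, -48]
summary = habitat_summary(label_habitats(pet, ct))
```

The per-habitat voxel counts and volume fractions returned by habitat_summary correspond to the kind of habitat-level descriptors reported in the Results. Radiomic feature extraction and the stacking ensemble itself are outside the scope of this sketch.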