Abstract
OBJECTIVE: This study evaluates whether Positron Emission Tomography-Computed Tomography (PET-CT) imaging features of primary tumors and lymph nodes, combined with clinical and pathological data, can accurately predict mediastinal lymph node metastasis (MLNM) in resectable non-small cell lung cancer (NSCLC) using machine learning models. METHODS: A retrospective study was conducted on 390 NSCLC patients who underwent tumor resection and lymph node dissection between January 2017 and December 2023. All patients received 18F-fluorodeoxyglucose (18F-FDG) PET-CT scans within two weeks before surgery. Data from 390 primary tumors and 1,026 lymph node stations were analyzed. Clinical and PET-CT imaging features were extracted, and feature selection was performed using a random forest algorithm. Eight machine learning algorithms were evaluated: logistic regression, classification and regression tree (CART), support vector machine (SVM), gradient boosting decision tree (GBDT), random forest, multi-layer perceptron (MLP), extreme gradient boosting (XGBoost), and k-nearest neighbors (KNN). Three models were developed: Tumor-Pathology-Clinical (TPC), Lymph-Pathology-Clinical (LPC), and Tumor-Lymph-Pathology-Clinical (TLPC). Model performance was assessed using Receiver Operating Characteristic (ROC) curves, Decision Curve Analysis (DCA), and confusion matrices. RESULTS: The TLPC model, based on the XGBoost algorithm, showed the best performance, with an Area Under the Curve (AUC) of 0.90 (95% CI [0.883-0.957]), specificity of 0.84, and sensitivity of 0.96 (P = 0.0069; significant at P < 0.05). In comparison, the TPC model achieved an AUC of 0.67 (95% CI [0.647-0.703]), specificity of 0.46, and sensitivity of 0.56 (P = 0.7037; not significant). The LPC model showed intermediate performance, with an AUC of 0.78 (95% CI [0.713-0.751]), specificity of 0.73, and sensitivity of 0.84 (P = 0.0269; significant at P < 0.05).
All P-values were derived from DeLong's test comparing AUCs between models, with statistical significance defined as P < 0.05. Of the 1,026 lymph node stations analyzed, 204 showed metastasis and 822 did not. XGBoost consistently outperformed the other algorithms in predicting MLNM. CONCLUSION: Combining PET-CT imaging features of primary tumors and lymph nodes with clinical and pathological data shows promise for accurately predicting MLNM in NSCLC. The TLPC model offers a non-invasive method for identifying lymph node metastasis, supporting personalized treatment strategies. However, because PET-CT was performed selectively rather than routinely, external validation across diverse clinical settings is warranted to confirm model generalizability.
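For readers less familiar with the metrics reported above, the following is a minimal sketch of how sensitivity, specificity, and AUC can be computed from per-station labels and predicted probabilities. It is not the authors' pipeline: the function names, the 0.5 decision threshold, and the eight toy labels and scores are all illustrative assumptions, and only the counts 204 and 1,026 come from the abstract. The AUC is computed with the rank-based (Mann-Whitney) formulation, which is equivalent to the area under the empirical ROC curve.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outranks a negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = metastatic station, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

# Assumed decision threshold of 0.5; the abstract does not state the one used.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)

# Class balance reported in the abstract: 204 metastatic of 1,026 stations.
prevalence = 204 / 1026

print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.2f} prevalence={prevalence:.3f}")
```

Because only about one in five stations was metastatic, sensitivity and specificity are more informative here than raw accuracy, which a model could inflate by predicting "no metastasis" for most stations.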