Abstract
Background With the shift toward perioperative programmed cell death protein-1 and programmed cell death ligand-1 (PD-L1) immunotherapy in non-small cell lung cancer (NSCLC), there is a need to assess PD-L1 status preoperatively. Identifying patients who may benefit from immunotherapy using CT-based features has been hampered by the lack of independent testing.

Purpose To evaluate the performance of published CT models in predicting PD-L1 status in a multi-institutional external test set of patients with NSCLC undergoing surgery.

Materials and Methods In this retrospective study, published CT radiomic models predicting PD-L1 expression were identified by literature review spanning January 2017 to July 2023. Models with sufficient reporting quality were recreated for testing, using the originally published features and coefficients. Feature standardization parameters were estimated on a training set, the publicly available NSCLC Cancer Imaging Archive dataset (images collected between April 2008 and September 2012), without observing labels and for the sole purpose of preprocessing the test set. For comparison, one previously published model was also retrained on the training set to predict CD274 expression. CT model discrimination of PD-L1 tumor proportion score (TPS) at clinical thresholds of at least 1% (TPS(≥1%)) and at least 50% (TPS(≥50%)) was tested using area under the receiver operating characteristic curve (AUC) analysis in an external test set of patients with stage IIB-IIIB NSCLC from 35 institutions studied between February 2009 and October 2018.

Results A total of 319 patients with NSCLC were included in this study (mean age, 69 years ± 8.9 [SD]; 195 male patients). Of the 17 CT radiomic models identified by literature review, only three (18%) could be reconstructed from published information (models 1-3).
In the external test set (n = 225), model 3 demonstrated TPS(≥50%) discrimination (AUC, 0.61 [95% CI: 0.49, 0.72]; P = .03 [vs null]) comparable to previously reported performance (AUC, 0.66 [95% CI: 0.58, 0.74]; P < .001 [test vs published]). For model 1, test TPS(≥50%) discrimination (AUC, 0.52 [95% CI: 0.39, 0.65]; P = .37 [vs null]) was lower than published performance (AUC, 0.79 [95% CI: 0.58, 1.00]; P < .001 [test vs published]). Model 2 test TPS(≥1%) discrimination (AUC, 0.57 [95% CI: 0.49, 0.64]; P = .04 [vs null]) was also lower than the published result (AUC, 0.85; P < .001 [test vs published]). The predictions of the CD274 messenger RNA-fitted CT model (model 3a) correlated with PD-L1 TPS (Spearman ρ, 0.20; P = .001) and discriminated PD-L1 status at both the TPS(≥1%) (AUC, 0.61 [95% CI: 0.55, 0.69]; P = .001 [vs null]) and TPS(≥50%) thresholds (AUC, 0.66 [95% CI: 0.55, 0.76]; P = .001 [vs null]).

Conclusion In independent testing, CT predictive models discriminated PD-L1 expression in patients with resectable NSCLC at clinically relevant thresholds, but predictive performance was lower than initially published.

© RSNA, 2026
Supplemental material is available for this article.
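
The external-testing workflow described in Materials and Methods can be illustrated with a minimal Python sketch. It is not the study's code: the logistic model form, the function names, the coefficients, the toy data, and the percentile bootstrap used for the confidence interval are all assumptions made for illustration, since the abstract does not state how intervals or P values were computed. The sketch shows the three steps in order: estimating standardization parameters from unlabeled training features, applying fixed coefficients to standardized test features, and scoring discrimination with AUC.

```python
import math
import random
from statistics import mean, pstdev


def fit_standardizer(train_rows):
    """Estimate per-feature mean and SD from training features only (no labels used)."""
    columns = list(zip(*train_rows))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]  # fall back to 1.0 for constant features
    return means, sds


def predict_probability(row, means, sds, intercept, coefs):
    """Apply fixed logistic coefficients (hypothetical values) to one standardized row."""
    z = [(x - m) / s for x, m, s in zip(row, means, sds)]
    logit = intercept + sum(b * v for b, v in zip(coefs, z))
    return 1.0 / (1.0 + math.exp(-logit))


def auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative one (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC (an assumed method, chosen for simplicity)."""
    if len(set(labels)) < 2:
        raise ValueError("bootstrap needs both classes present")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Toy data only; feature values, coefficients, and labels are invented.
    train = [[1.0, 10.0], [3.0, 14.0], [2.0, 12.0], [4.0, 16.0]]
    test = [[1.5, 11.0], [3.5, 15.0], [2.5, 12.5], [3.8, 15.5]]
    tps_ge_50 = [0, 1, 0, 1]  # binary label: TPS at or above the 50% threshold
    means, sds = fit_standardizer(train)
    probs = [predict_probability(r, means, sds, -0.2, [0.8, 0.4]) for r in test]
    print("AUC:", auc(tps_ge_50, probs))
    print("95% CI:", bootstrap_ci(tps_ge_50, probs, n_boot=500))
```

Fitting the standardizer on the training features alone, rather than on the test set, keeps test preprocessing free of any information derived from test labels or test feature distributions, which matches the stated intent of using the training set only for preprocessing.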