Abstract
BACKGROUND: Manual segmentation of gross tumour volumes (GTVs) on [18F]FDG PET/CT is time-consuming and subject to interobserver variability, limiting its scalability for prognostic modelling in head and neck cancer. We investigated whether deep learning-based PET tumour volumes (AI-PET-GTV) could replace manually defined GTVs in risk prediction models for loco-regional failure (LRF) and distant metastasis (DM).

RESULTS: Using competing risk regression, we tested whether AI-PET-GTV was non-inferior to manual GTV in predicting LRF. The primary outcome was the area under the receiver operating characteristic curve (AUC) at 3 years, with a non-inferiority margin of 5 percentage points. AI-PET-GTV achieved a 3-year AUC of 72.9% (95% CI: 67.9–77.9%) compared with 72.8% (95% CI: 67.8–77.9%) for manual GTV (p = 0.02 for non-inferiority). At 1 year, AUCs were 77.3% (95% CI: 72.2–82.4%) and 76.9% (95% CI: 71.9–82.0%) for AI-PET-GTV and manual GTV, respectively (p = 0.02). Similar patterns were observed for DM prediction at 1 and 3 years (all p < 0.01), and Brier scores also favoured AI-PET-GTV at both timepoints (p < 0.02). Stratification by predicted risk yielded nearly identical cumulative incidence estimates. For example, the 3-year cumulative incidence of LRF in the high-risk group was 38.4% (95% CI: 32.6–44.2%) for both models.

CONCLUSIONS: Automated deep learning-based PET tumour volumes are non-inferior to manual GTVs for prognostic modelling of LRF and DM in head and neck cancer. These findings support clinical implementation of AI-derived volumes for reproducible, scalable, and earlier risk stratification in oncology workflows.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13550-026-01377-0.
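
As a rough illustration of the non-inferiority comparison reported in the Results, the sketch below applies a one-sided Wald-type z-test to the difference in AUC with a margin of 5 percentage points. It is not necessarily the procedure used in the study: the abstract does not state how the p-values were computed. The function names and the standard error of the AUC difference (`se_diff`) are hypothetical, chosen only for illustration. Only the two 3-year AUC point estimates are taken from the text.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def noninferiority_p(auc_new: float, auc_ref: float,
                     se_diff: float, margin: float = 0.05) -> float:
    """One-sided p-value for H0: auc_new - auc_ref <= -margin.

    A small p-value rejects H0, i.e. supports the claim that the new
    model is worse than the reference by less than `margin`.
    This is a generic Wald-type test, assumed here for illustration.
    """
    if se_diff <= 0:
        raise ValueError("se_diff must be positive")
    z = (auc_new - auc_ref + margin) / se_diff
    return 1.0 - normal_cdf(z)


# 3-year LRF AUCs from the abstract (as proportions).
auc_ai, auc_manual = 0.729, 0.728

# HYPOTHETICAL standard error of the paired AUC difference; the abstract
# does not report it, so this value is an assumption for the example.
p_value = noninferiority_p(auc_ai, auc_manual, se_diff=0.025)
print(f"one-sided non-inferiority p-value: {p_value:.3f}")
```

In practice the standard error of a paired AUC difference would be estimated from the data, for example by bootstrapping patients, because both models are evaluated on the same cohort and their AUC estimates are correlated.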