Abstract
Radiomic data analysis frequently requires identifying major sub-groups within high-dimensional datasets. However, many datasets are limited in size, which hinders the effective application of machine learning methods. We adopted a regularized network model coupled with the extended Bayesian information criterion to identify sub-networks, and developed a graph network-based k-means clustering algorithm that uses unbalanced optimal transport to group samples. The identified sub-groups were then compared using survival analysis and CIBERSORT analysis of tumor immune cell abundance. This approach was applied to two cancer types, head and neck squamous cell carcinoma (HNSCC) and non-small cell lung cancer (NSCLC), using radiomic features extracted from computed tomography (CT) scans and RNA-Seq gene expression profiles. For HNSCC, high- and low-risk groups were identified from the largest sub-network using the proposed method, and Kaplan-Meier analysis showed a statistically significant difference in progression-free survival between them (p = 0.0202). For NSCLC, Kaplan-Meier analysis showed a statistically significant difference in overall survival between the high- and low-risk groups identified from the second largest sub-network (p = 0.0007). The NSCLC radiomic data were also assessed on the HNSCC network, which likewise yielded a statistically significant result (p = 0.0007). In the CIBERSORT analysis for HNSCC, neutrophil abundance differed significantly between the high- and low-risk groups (p = 0.0221). For NSCLC, resting dendritic cells and activated mast cells differed significantly (p = 0.0126 and p = 0.0046, respectively). We demonstrated that closely related tumor radiomic characteristics can effectively identify radiophenotypes with distinct prognoses. Our findings suggest that these image characteristics may be associated with varying tumor-immune interactions.
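To make the survival comparison between risk groups concrete, the following is a minimal sketch of a Kaplan-Meier estimator and a two-group log-rank test in plain Python. It is an illustration under stated assumptions, not the pipeline used in this work: the abstract does not say which test produced the reported p-values, so the log-rank test is assumed here, and the function names and toy follow-up data are hypothetical.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time for each subject
    events : 1 if the event was observed, 0 if the subject was censored
    Returns a list of (time, survival probability) at each event time.
    """
    n_at_risk = len(times)
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # subjects leaving the risk set at each time
    surv, curve = 1.0, []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving[t]
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test (assumed here; not confirmed by the source).

    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    times_a, times_b = list(times_a), list(times_b)
    events_a, events_b = list(events_a), list(events_b)
    all_times, all_events = times_a + times_b, events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance is undefined for n == 1
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if var == 0.0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / var
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical toy data: months of follow-up and event indicators.
high_risk_t, high_risk_e = [3, 5, 8, 12, 14], [1, 1, 1, 0, 1]
low_risk_t, low_risk_e = [10, 18, 24, 30, 36], [1, 0, 1, 0, 0]

print(kaplan_meier(high_risk_t, high_risk_e))
print(logrank_test(high_risk_t, high_risk_e, low_risk_t, low_risk_e))
```

In practice the two input groups would be the high- and low-risk clusters, with progression-free or overall survival as the follow-up time. An established survival analysis package would normally be preferred over hand-written code for a real analysis.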