Abstract
PURPOSE: To guide the preselection of highly repeatable radiomic features (RFs) for downstream analysis without requiring further analysis of their repeatability, we established a detailed radiomic feature robustness databank (RF-RobustDB) via image perturbation.

METHODS: Data on 1,274 oropharyngeal carcinoma (OPC) patients who had undergone pretreatment computed tomography (CT) imaging were collected from a public dataset. The original images and corresponding masks underwent systematic perturbations to simulate variations that may arise when a CT image is rescanned: translational shifts, rotations, random noise addition, and contour modifications. For each RF, including unfiltered, wavelet-filtered, and Laplacian-of-Gaussian (LoG)-filtered features, we quantified robustness against these perturbations using intraclass correlation coefficients (ICCs).

RESULTS: Of 1,395 first- and higher-order RFs, 470 demonstrated excellent repeatability, defined as a mean ICC greater than 0.9. Using these preselected highly repeatable RFs in model development improved the mean concordance (C) index in two external validation cohorts and narrowed the mean C-index gap between the training and external validation cohorts. These results demonstrate that highly repeatable RFs preselected from RF-RobustDB can effectively enhance the generalizability of radiomic models.

CONCLUSIONS: The methodology used to establish RF-RobustDB is highly transferable to other tumor sites and imaging modalities, which will facilitate the creation of further RF-RobustDBs to guide the development of universally applicable radiomic models.
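The repeatability filter described above (ICC per feature, keep features with mean ICC > 0.9) can be sketched as follows. The abstract does not state which ICC form was used or how the per-perturbation ICCs were averaged, so this is a minimal sketch assuming the one-way random-effects, single-measurement ICC(1,1), computed separately for each perturbation type and then averaged. The function names `icc_1_1` and `select_repeatable` and the nested input layout are hypothetical, chosen for illustration only.

```python
from statistics import fmean


def icc_1_1(values):
    """One-way random-effects, single-measurement ICC(1,1).

    values[i][j] is a feature's value for patient i under perturbation j.
    ASSUMPTION: ICC(1,1) is used for illustration; the abstract does not
    state which ICC form RF-RobustDB uses.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 patients")
    k = len(values[0])
    if k < 2 or any(len(row) != k for row in values):
        raise ValueError("every patient needs the same number (>= 2) of measurements")

    grand_mean = fmean(x for row in values for x in row)
    row_means = [fmean(row) for row in values]

    # Between-patient mean square: spread of patient means around the grand mean.
    msb = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-patient mean square: spread introduced by the perturbations.
    msw = sum(
        (x - m) ** 2 for row, m in zip(values, row_means) for x in row
    ) / (n * (k - 1))

    denom = msb + (k - 1) * msw
    if denom == 0:
        # All values identical: the ICC is undefined.
        return float("nan")
    return (msb - msw) / denom


def select_repeatable(feature_data, threshold=0.9):
    """Keep features whose mean ICC across perturbation types exceeds threshold.

    feature_data maps feature name -> {perturbation type -> values matrix}.
    ASSUMPTION: averaging per-perturbation ICCs is one plausible reading of
    "mean ICC"; the abstract does not specify the averaging scheme.
    """
    selected = {}
    for name, per_perturbation in feature_data.items():
        mean_icc = fmean(icc_1_1(v) for v in per_perturbation.values())
        # NaN compares False, so features with undefined ICCs are never kept.
        if mean_icc > threshold:
            selected[name] = mean_icc
    return selected
```

The function returns only the feature names (with their mean ICCs) that pass the cut-off. Swapping in a different ICC form, such as a two-way model, would change only `icc_1_1`; the selection step stays the same.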