Abstract
PURPOSE: Robustness evaluation is routinely used in clinics to ensure the intended dose delivery for intensity-modulated proton therapy (IMPT). Various methods have been proposed, but there is no consensus on which method should be adopted in clinical practice. This study examined various methods within the widely used worst-case approach to provide insights into IMPT plan evaluation. MATERIALS AND METHODS: We evaluated the robustness of 20 clinical IMPT plans (10 prostate and 10 head and neck). Five robustness evaluation methods were assessed: error-bar dose distribution (ebDD), root-mean-square error dose (RMSED) distribution, voxel-wise worst-case, physical scenario worst-case, and dose-volume histogram (DVH) band. Correlations between these methods were analyzed. Each method was reviewed for its quantitative and qualitative capabilities to identify potential underdosing or overdosing. RESULTS: Strong correlations were found between ebDD and RMSED, and between voxel-wise worst-case and physical scenario worst-case. The DVH band method provides a straightforward way to assess whether the worst DVH meets plan criteria and to illustrate dose variations, but it lacks the spatial detail needed to pinpoint areas of potential underdosing or overdosing. The voxel-wise worst-case captures the worst dose distribution across all evaluation metrics, allowing spatial identification of areas of concern within a single distribution. The physical scenario worst-case also pinpoints specific areas of concern but requires a separate assessment for each region of interest and evaluation metric, which can be cumbersome. A 3D visualization with ebDD and RMSED highlights regions of dose variation but does not necessarily indicate clinically meaningful impact. CONCLUSION: Different robustness evaluation methods offer different types of information. Our study provides insights to help identify an effective and practical approach for clinical practice.
Based on our findings, we propose a potential evaluation strategy: use the DVH band derived from physical uncertainty scenarios to assess whether the worst boundary values meet plan evaluation criteria, and, when concerns arise, apply the voxel-wise worst-case dose distribution to localize areas of potential risk.
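The two-step strategy can be illustrated with a minimal Python sketch. It is not the study's implementation. It assumes each uncertainty scenario is given as a flat list of voxel doses for one region of interest, that the DVH band is the envelope of the per-scenario cumulative DVHs, and that the voxel-wise worst case takes the minimum dose per voxel for a target and the maximum per voxel for an organ at risk. All function names, dose values, and the 66.5 Gy / 95% criterion are hypothetical.

```python
from typing import List, Sequence, Tuple


def cumulative_dvh(doses: Sequence[float], dose_levels: Sequence[float]) -> List[float]:
    """Fraction of voxels receiving at least each dose level."""
    n = len(doses)
    return [sum(1 for d in doses if d >= level) / n for level in dose_levels]


def dvh_band(
    scenario_doses: Sequence[Sequence[float]], dose_levels: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Lower and upper envelope of the per-scenario DVHs (assumed band definition)."""
    dvhs = [cumulative_dvh(s, dose_levels) for s in scenario_doses]
    lower = [min(column) for column in zip(*dvhs)]
    upper = [max(column) for column in zip(*dvhs)]
    return lower, upper


def voxelwise_worst_case(
    scenario_doses: Sequence[Sequence[float]], is_target: bool
) -> List[float]:
    """Per-voxel minimum for a target (cold spots), maximum for an organ at risk."""
    pick = min if is_target else max
    return [pick(column) for column in zip(*scenario_doses)]


def flag_underdosed_voxels(worst_dose: Sequence[float], threshold: float) -> List[int]:
    """Indices of voxels whose worst-case dose falls below the threshold."""
    return [i for i, d in enumerate(worst_dose) if d < threshold]


# Hypothetical target with 4 voxels evaluated under 3 uncertainty scenarios (Gy).
scenarios = [
    [70.0, 70.0, 70.0, 70.0],
    [70.0, 69.0, 66.0, 70.0],
    [68.0, 70.0, 70.0, 71.0],
]
criterion_dose = 66.5       # hypothetical dose level, Gy
required_coverage = 0.95    # hypothetical required volume fraction

# Step 1: check the worst boundary of the DVH band against the criterion.
lower, upper = dvh_band(scenarios, [criterion_dose])
band_passes = lower[0] >= required_coverage

# Step 2: only when the band raises a concern, localize it voxel by voxel.
flagged: List[int] = []
if not band_passes:
    worst = voxelwise_worst_case(scenarios, is_target=True)
    flagged = flag_underdosed_voxels(worst, criterion_dose)
```

In this toy case the lower band boundary fails the coverage criterion, and the voxel-wise worst-case distribution then points to the single voxel responsible. In a real workflow the scenario doses would come from the treatment planning system rather than hand-written lists.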