Abstract
BACKGROUND: When developing or validating prognostic models, it is typical to assess calibration between predicted and observed risks, either in the development dataset or in an external sample. For competing risks data, correct specification of more than one model may be required to ensure well-calibrated predicted risks for the event of interest. Furthermore, interest may lie in the predicted risks of the event of interest, the competing events, and all causes. Calibration must therefore be assessed simultaneously using several measures.

METHODS: We focus on the calibration of prediction models in external validation using a cause-specific hazards approach. We propose assessing miscalibration of each cause-specific hazard model separately, via the complement of its cause-specific survival function, alongside assessing calibration of the cause-specific absolute risks. We simulated a range of scenarios to illustrate how to identify which model(s) are mis-specified in an external validation setting. Calibration plots and calibration statistics (calibration slope, calibration-in-the-large) are presented alongside performance measures such as the Brier score and the Index of Prediction Accuracy. We use pseudo-observations to calculate observed risks and generate a smooth calibration curve with restricted cubic splines. We fitted flexible parametric survival models to the simulated data to estimate baseline cause-specific hazards for the prediction of individual cause-specific absolute risks.

RESULTS: Our simulations illustrate that miscalibration due to changes in the baseline cause-specific hazards in external validation data is better identified using components from each cause-specific model. A miscalibrated model for one cause can lead to poor calibration of the predicted absolute risks for every cause of interest, including the all-cause absolute risk.
This is because prediction of a single cause-specific absolute risk is affected by the effects of variables on both the cause of interest and the competing events.

CONCLUSIONS: If accurate predictions of both the all-cause and each cause-specific absolute risk are of interest, this is best achieved by developing and validating models via the cause-specific hazards approach. For each cause-specific model, researchers should evaluate calibration plots separately, using the complement of the cause-specific survival function, to reveal the cause of any miscalibration. However, this also requires careful consideration of dependent censoring, which must be adequately accounted for.
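As an illustrative sketch of the pseudo-observation approach mentioned in the methods (not code from the paper, and with hypothetical function names): pseudo-observations for the cause-specific cumulative incidence at a fixed horizon can be obtained by jackknifing the Aalen-Johansen estimator, giving each individual an "observed risk" that can be compared against predicted risks in a calibration plot.

```python
import numpy as np

def cif_aalen_johansen(time, event, tau, cause=1):
    """Aalen-Johansen cumulative incidence for `cause` at time tau.
    event coding: 0 = censored, 1, 2, ... = competing causes."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event)
    surv = 1.0  # overall event-free survival just before the current time
    cif = 0.0
    uniq = np.unique(time[event > 0])
    for t in uniq[uniq <= tau]:
        at_risk = np.sum(time >= t)
        d_cause = np.sum((time == t) & (event == cause))
        d_all = np.sum((time == t) & (event > 0))
        cif += surv * d_cause / at_risk      # cause-specific increment
        surv *= 1.0 - d_all / at_risk        # update overall survival
    return cif

def pseudo_observations(time, event, tau, cause=1):
    """Jackknife pseudo-observations of the cumulative incidence at tau:
    P_i = n * F(tau) - (n - 1) * F_{-i}(tau)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event)
    n = len(time)
    full = cif_aalen_johansen(time, event, tau, cause)
    mask = np.ones(n, dtype=bool)
    pseudo = np.empty(n)
    for i in range(n):
        mask[i] = False  # leave subject i out
        loo = cif_aalen_johansen(time[mask], event[mask], tau, cause)
        pseudo[i] = n * full - (n - 1) * loo
        mask[i] = True
    return pseudo
```

With no censoring, the pseudo-observations reduce to simple event indicators; with censoring, they supply approximately unbiased individual-level "observed" risks that can be smoothed against predicted risks (e.g. with restricted cubic splines) to form a calibration curve.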