Abstract
INTRODUCTION: Unstable and incomplete monitoring data are common and significantly complicate syndromic analysis. Many currently available data interpolation methods are not effective enough to overcome this problem. METHODS: To improve interpolation accuracy, we propose combining the SHapley Additive exPlanations (SHAP) model with the structural equation model (SEM) into a single SHAP-SEM approach. We then perform a case study to compare the performance of this model with that of traditional methods. RESULTS: We used the SHAP-SEM approach to build an interpolation model from data in the Chinese respiratory syndrome surveillance database. Three distinct experiments were conducted to construct the model datasets, comprising 100 replicates in total. Model performance was evaluated with the root mean square error (RMSE), the correlation coefficient (r), and the F-score. The SHAP-SEM model consistently achieved higher interpolation accuracy than the traditional methods, both within individual seasons and overall. DISCUSSION: We conclude that the SHAP-SEM model is highly capable of accurately interpolating volatile and incomplete data. This capability supports building the comprehensive database needed for conducting syndrome-related risk assessments.
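For readers unfamiliar with the evaluation metrics, the following is a minimal sketch of how RMSE and a correlation coefficient can be computed between observed values and interpolated values at the same points. It assumes that r is the Pearson correlation coefficient and that evaluation is done on points whose true values are known; the abstract does not specify either detail. The function names `rmse` and `pearson_r` are hypothetical illustrations, not part of the authors' code. The F-score is omitted because the abstract does not define how it is applied to interpolated values.

```python
import math


def rmse(observed, imputed):
    """Root mean square error between observed and interpolated values."""
    if len(observed) != len(imputed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, imputed)]
    return math.sqrt(sum(squared_errors) / len(observed))


def pearson_r(observed, imputed):
    """Pearson correlation coefficient (assumed meaning of r in the abstract)."""
    if len(observed) != len(imputed) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(imputed) / n
    # Covariance and sums of squared deviations (the 1/n factors cancel out).
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, imputed))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in imputed)
    if ss_o == 0 or ss_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(ss_o * ss_p)


if __name__ == "__main__":
    # Hypothetical example values, not data from the surveillance database.
    observed = [12.0, 15.0, 14.0, 20.0, 18.0]
    imputed = [11.5, 15.5, 13.0, 19.0, 18.5]
    print("RMSE:", rmse(observed, imputed))
    print("r:", pearson_r(observed, imputed))
```

Lower RMSE and r closer to 1 both indicate interpolated values that track the observed series more closely, which is why the two metrics are usually reported together.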