Abstract
BACKGROUND: Prognosis among hemodialysis patients with an arteriovenous fistula is highly heterogeneous and is influenced by factors including vascular condition and underlying disease. This study aims to characterize patient subgroups and identify key influencing factors by combining cluster analysis with SHAP value interpretation.
METHODS: A cohort of 974 hemodialysis patients with arteriovenous fistulae was analyzed using 55 clinical characteristics. After multiple imputation, standardization, and dimensionality reduction by principal component analysis, the K-Means, DBSCAN, and hierarchical clustering algorithms were compared using metrics such as the silhouette coefficient and the Calinski-Harabasz index. K-Means with K = 3 was selected, and its cluster assignments served as a pseudo target variable for an XGBoost model, whose feature contributions were interpreted with SHAP values.
RESULTS: K-Means performed best among the algorithms compared, with a silhouette coefficient of 0.05, and partitioned patients into three clusters. Cluster 1 had hemoglobin concentrations (on the standardized scale) ranging from -2 to 5, with a median of 1 and the highest variability among the clusters. Cluster 2 had hemoglobin concentrations predominantly between -3 and 2, with a median of 0. Cluster 3 had a hemoglobin distribution similar to that of Cluster 2, with slightly greater variability in the tails. SHAP analysis identified hemoglobin concentration as the most important feature, with a SHAP value of 550, indicating that differences in its distribution were the primary driver of the clustering. Age, BMI, total cholesterol, and other features also contributed to the clustering outcomes through complex nonlinear interactions.
CONCLUSION: Cluster analysis combined with SHAP values preliminarily identified heterogeneous subgroups among hemodialysis patients with arteriovenous fistulae, with hemoglobin concentration as a potential key driver. This approach may support personalized treatment, but its generalizability requires multicenter validation.
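The workflow summarized above can be illustrated with a minimal sketch. It is not the study's code: the data are synthetic (random values with a planted signal in the first five columns), single mean imputation stands in for multiple imputation, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, and impurity-based feature importances stand in for SHAP values. The number of principal components and the DBSCAN parameters are likewise assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import calinski_harabasz_score, silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the cohort: 974 patients x 55 features.
# Hypothetical planted structure: columns 0-4 shift with a hidden group.
groups = rng.integers(0, 3, size=974)
X = rng.normal(size=(974, 55))
X[:, :5] += 3.0 * groups[:, None]
X[rng.random(X.shape) < 0.05] = np.nan  # about 5% missing values

# Preprocessing: imputation (single mean imputation here, an assumption),
# standardization, then PCA (10 components is an arbitrary choice).
X_imp = SimpleImputer(strategy="mean").fit_transform(X)
X_std = StandardScaler().fit_transform(X_imp)
X_pca = PCA(n_components=10, random_state=0).fit_transform(X_std)

# Candidate clustering algorithms, compared on the same reduced data.
candidates = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "hierarchical": AgglomerativeClustering(n_clusters=3),
    "dbscan": DBSCAN(eps=3.0, min_samples=5),
}

scores = {}
for name, model in candidates.items():
    labels = model.fit_predict(X_pca)
    # Both metrics need at least two distinct labels; DBSCAN may return one.
    if len(set(labels)) < 2:
        scores[name] = None
        continue
    scores[name] = (
        silhouette_score(X_pca, labels),
        calinski_harabasz_score(X_pca, labels),
    )

for name, score in scores.items():
    print(name, score)

# K-Means assignments become the pseudo target for a supervised surrogate.
pseudo_target = candidates["kmeans"].labels_
surrogate = GradientBoostingClassifier(n_estimators=50, random_state=0)
surrogate.fit(X_std, pseudo_target)

# Rank original features by how much the surrogate relies on them.
importances = surrogate.feature_importances_
ranking = np.argsort(importances)[::-1]
print("top five feature indices:", ranking[:5])
```

With real data, the surrogate would be an XGBoost classifier and the ranking would come from SHAP values computed on that model, which attribute each prediction to individual features instead of giving one global score per feature.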