Abstract
Acute kidney injury is a common and severe complication following total hip arthroplasty, particularly in elderly or high-risk patients with chronic conditions, and it significantly increases morbidity and mortality. Traditional prediction methods often struggle with the complexity of multidimensional healthcare data. To address this, we developed a machine learning-based prediction model using multidimensional data from 4601 total hip arthroplasty patients, encompassing 16 general variables (e.g., demographic characteristics, surgical duration, and hospital stay) and 53 laboratory indicators (e.g., Cystatin C, D-dimer, and glucose). Feature selection was performed using Random Forest, Lasso regression, and mutual information analysis, and clinically relevant features such as Cystatin C, glucose, and N-terminal proBNP were retained to enhance model interpretability and predictive power. To address class imbalance, we applied the Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbors (ENN). Among multiple models, CatBoost achieved the best performance, with an area under the receiver operating characteristic curve (AUC) of 0.95 (95% CI 0.93-0.96), an accuracy of 0.88 (95% CI 0.85-0.90), and an F1-score of 0.79 (95% CI 0.75-0.84) in the internal validation set. In external validation using an independent hospital dataset (n = 240), the AUC fell to 0.65 (95% CI 0.57-0.73). This substantial performance decline underscores the need for cautious interpretation of the internal performance metrics and for institution-specific validation prior to clinical deployment. Shapley Additive Explanations (SHAP) analysis identified Cystatin C, surgical duration, and creatinine as key predictors, supporting the model's transparency and clinical relevance.
A real-time prediction system built with the Flask framework was also validated externally, supporting its utility for acute kidney injury risk assessment and personalized postoperative management. These findings highlight the model's potential to improve clinical decision-making and outcomes for high-risk patients undergoing total hip arthroplasty.
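
The headline metrics above are reported as an AUC with a 95% confidence interval. As a minimal sketch of how such a figure can be obtained, the Python below computes the AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative case, and attaches a percentile bootstrap interval. The abstract does not state how its intervals were derived, so the percentile bootstrap is an assumption here, and the function names `auc_score` and `bootstrap_auc_ci`, together with the toy labels and scores, are hypothetical and are not study data.

```python
import random


def auc_score(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap interval (assumed method)."""
    point = auc_score(labels, scores)  # also rejects single-class input
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined without both classes; redraw
        stats.append(auc_score(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lo, hi


if __name__ == "__main__":
    # Toy data for illustration only (1 = acute kidney injury, 0 = none).
    toy_labels = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
    toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.30, 0.50, 0.90, 0.15]
    auc, lo, hi = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=500)
    print(f"AUC {auc:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With a small external cohort, the resampled AUC varies considerably between bootstrap draws, which is one reason an interval as wide as 0.57-0.73 can arise at n = 240.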