Abstract
OBJECTIVE: To develop and validate a machine learning (ML) model that predicts the onset of peritoneal dialysis-associated peritonitis (PDAP) within three months, using time-updated data from routine electronic health records (EHRs). METHODS: This retrospective cohort study included 1143 unique continuous ambulatory peritoneal dialysis (CAPD) patients who generated 25,710 quarterly assessments (patient-quarters) from 2017 to 2025. Assessments were divided into a training set (n=8537 observations) and an internal validation set (n=8538 observations), assigned at random, and a temporal validation set (n=6635 observations, 2024-2025). Thirty-one EHR variables were screened by low-variance filtering, correlation analysis, and Boruta feature selection. Nine ML models, including a stacking ensemble, were trained with patient-level stratified 10-fold cross-validation and tuned for recall to minimize missed cases. The primary outcome was PDAP onset within three months after routine laboratory tests. RESULTS: In the internal validation cohort, the stacking model achieved an area under the curve (AUC) of 0.811 (95% CI 0.792-0.830) and the highest recall among the models compared, 0.794 (95% CI 0.769-0.819). In the temporal validation cohort, its performance was maintained, with an AUC of 0.795 (95% CI 0.771-0.819) and again the highest recall, 0.833 (95% CI 0.792-0.874). SHapley Additive exPlanations (SHAP) analysis identified several key features, supporting model interpretability and clinical utility for PDAP risk stratification. CONCLUSION: Integrating time-updated EHR data with ML enables robust and clinically actionable PDAP risk stratification, facilitating timely interventions to optimize CAPD patient management and reduce peritonitis-related complications.
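Because each patient contributes many assessments, cross-validation folds need to be built at the patient level so that no patient appears in both the training and test portion of a fold. The abstract does not describe how the folds were constructed; the following is only a minimal sketch, assuming patients are stratified by whether they ever had a PDAP event. The function names (`patient_level_stratified_folds`, `cv_splits`, `recall`) and the stratification rule are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict


def patient_level_stratified_folds(patient_ids, labels, n_splits=10, seed=0):
    """Assign each patient (not each assessment) to one of n_splits folds.

    Assumption: patients are stratified by whether they ever had a positive
    label, so each fold receives a similar share of patients with an event.
    Returns a dict mapping patient id -> fold index.
    """
    ever_positive = defaultdict(bool)
    for pid, y in zip(patient_ids, labels):
        ever_positive[pid] = ever_positive[pid] or bool(y)

    rng = random.Random(seed)
    fold_of = {}
    for stratum in (True, False):
        members = sorted(p for p, pos in ever_positive.items() if pos == stratum)
        rng.shuffle(members)
        # Deal patients of this stratum out to the folds in turn.
        for i, pid in enumerate(members):
            fold_of[pid] = i % n_splits
    return fold_of


def cv_splits(patient_ids, fold_of, n_splits=10):
    """Yield (train_indices, test_indices) over assessment rows.

    All rows belonging to one patient fall on the same side of each split.
    """
    for k in range(n_splits):
        train = [i for i, p in enumerate(patient_ids) if fold_of[p] != k]
        test = [i for i, p in enumerate(patient_ids) if fold_of[p] == k]
        yield train, test


def recall(y_true, y_pred):
    """Recall (sensitivity): true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0
```

In practice, a library routine such as scikit-learn's `StratifiedGroupKFold` could serve the same purpose; the hand-rolled version above is only meant to make the grouping constraint explicit.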