Abstract
BACKGROUND: Canceled procedures in pediatric day-case surgery are a serious issue that can disrupt hospital workflow, resource utilization, and patient care. To better characterize this problem and explore strategies to mitigate such costly inefficiencies, we applied unsupervised machine learning (ML) techniques to data from children scheduled for day-case surgery at the Ospedale Pediatrico Bambino Gesù IRCCS between January 2020 and March 2022. METHODS: We analyzed a dataset of 4,417 operated and 183 non-operated patients. The following variables were considered: age, hospitalization type, surgical specialty, procedure type, and reason for cancelation. Dimensionality reduction was performed using Factor Analysis of Mixed Data (FAMD). Cluster analysis was conducted using the k-means algorithm, and a Random Forest classifier was used to assess feature importance and improve model interpretability. RESULTS: The overall cancelation rate was 3.84%. Among operated patients, the median duration of surgery was 22 min (IQR: 12-32). For non-operated patients, the median waiting time from first consultation to the scheduled surgical procedure was 105 days (IQR: 58-194). K-means clustering (k = 3) identified three distinct patient groups, supported by robust clustering metrics (silhouette score: 0.696; Davies-Bouldin index: 0.447; Calinski-Harabasz score: 10,119.546). The Hopkins statistic (0.003) confirmed a strong clustering tendency in the dataset. Cluster 0 (n = 2,010) was mainly characterized by plastic and maxillofacial procedures (n = 765), andrological procedures (n = 533), and pediatric urology surgeries (n = 194), largely performed in an ambulatory setting (n = 1,779). Cluster 1 (n = 1,773) predominantly included andrological procedures (n = 809) and showed the longest median intervention duration (27.01 min).
Significant inter-cluster differences were observed for age (Kruskal-Wallis test, p < 0.001 for the comparisons between Clusters 0 and 1 and between Clusters 1 and 2), surgical intervention rates (highest in Cluster 2: 97.92%; χ² = 10.11, p = 0.0063), and procedure duration (all pairwise Dunn tests p < 0.001). Random Forest analysis identified hospitalization type (feature importance: 0.47) and procedure type (feature importance: 0.45) as the most influential variables contributing to cluster differentiation. CONCLUSION: This ML analysis may inform targeted strategies for optimizing scheduling and resource allocation, ultimately improving patient care and operational efficiency.
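
The clustering workflow summarized above (k-means with k = 3, three internal validity metrics, and a Random Forest fitted on the cluster labels to rank features) can be illustrated with a minimal sketch. This is not the authors' code: it assumes scikit-learn, uses synthetic numeric data from `make_blobs` in place of the study dataset, and omits the FAMD step, which scikit-learn does not provide. The sample size, number of features, and all hyperparameters are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic stand-in for the reduced feature space (NOT the study data).
# In the study, these coordinates would come from FAMD on mixed variables.
X, _ = make_blobs(
    n_samples=600, centers=3, n_features=4, cluster_std=1.0, random_state=0
)

# Partition the observations into k = 3 clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Internal validity metrics of the kind reported in the abstract.
sil = silhouette_score(X, labels)          # in [-1, 1]; higher is better
db = davies_bouldin_score(X, labels)       # >= 0; lower is better
ch = calinski_harabasz_score(X, labels)    # > 0; higher is better

# Fit a Random Forest to predict the cluster label from the features,
# then read off impurity-based importances to see what drives the split.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, labels)
importances = forest.feature_importances_

print(f"silhouette={sil:.3f}  davies_bouldin={db:.3f}  calinski_harabasz={ch:.1f}")
for i, value in enumerate(importances):
    print(f"feature_{i}: importance={value:.3f}")
```

Because the forest is trained on labels that k-means derived from the same features, its importances describe which variables separate the clusters rather than how well any outcome is predicted.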