Abstract
Predicting intensive care unit (ICU) length of stay (LOS) within the first 24 h of admission can improve bed management, staffing and care planning. While prior studies have used structured electronic health record (EHR) data, including demographics, vital signs, laboratory results and ICD codes, recent advances in transformer-based language models enable the use of unstructured clinical notes. In this study, we compare structured and unstructured modelling approaches for early ICU LOS prediction within a unified experimental framework. We developed two parallel pipelines using MIMIC-IV data: (1) a structured pipeline training conventional machine learning models (logistic regression, random forest, XGBoost and SVM) on day-one EHR features with ICD-derived embeddings and (2) an unstructured pipeline fine-tuning transformer models (ClinicalBERT, Bio+ClinicalBERT and BlueBERT) on discharge notes. We evaluated both binary classification (short [≤ 4 days] vs. long [> 4 days] stay) and regression of exact LOS in days. Our results show that XGBoost with ICD embeddings achieved the best performance (AUROC = 0.805) with minimal training time; XGBoost without ICD embeddings still performed strongly (AUROC = 0.732), providing a baseline that does not depend on diagnosis codes, which are often assigned with delay. The transformer models performed comparably (AUROC = 0.766) but required more computation. Overall, both pipelines offer valuable early signals but differ in efficiency and ease of integration, highlighting practical trade-offs between the two approaches.