Risk stratification for long-term inpatient costs in mental disorders: a dual-track machine learning approach using baseline EHRs and hospitalization trajectories

Abstract

BACKGROUND: Mental disorders (MDs) impose substantial long-term inpatient costs, yet existing prediction models rarely account for dynamic hospitalization trajectories or diagnostic heterogeneity. This study developed and validated a dual-track machine learning framework that integrates baseline features with trajectory-derived patterns to predict three-year cumulative hospitalization costs for patients with MDs in China.

METHODS: We conducted a retrospective cohort study using electronic health records from 3,396 adults with a first admission to a psychiatric hospital (2017–2018) and three-year follow-up. State sequence analysis and hierarchical clustering identified distinct hospitalization trajectory patterns. Five regression models were trained on two feature sets: ten baseline variables available at index admission (Set A), and the same variables plus trajectory cluster membership (Set B). Training used a stratified 70:30 train-test split and five-fold cross-validation. Performance was evaluated using R², RMSE, and MAE on log-transformed costs. SHAP (SHapley Additive exPlanations) analysis was applied to interpret the optimal model and examine diagnostic heterogeneity.

RESULTS: Four distinct trajectory patterns were identified: low-frequency short-stay (64.7%), high-frequency short-stay (10.0%), long-term intermittent (4.8%), and long-term continuous (20.5%). The gradient boosting machine (GBM) achieved the best test performance using Set A (R² = 0.35), significantly outperforming linear regression (R² = 0.33) and random forest (R² = 0.31). Adding trajectory clusters (Set B) increased R² to 0.71 (ΔR² = 0.36), indicating a strong association between long-term hospitalization patterns and cumulative costs. Because cluster membership is known only after follow-up, this component is retrospectively explanatory rather than predictive. SHAP identified payment method, aCCI, diagnosis group, and age as the dominant cost drivers. Model performance was stable for the F2 group (61.8% of the cohort) but markedly lower for rare diagnostic subgroups (F0, F1).

CONCLUSIONS: Risk stratification for three-year cumulative hospitalization costs is feasible using only routine baseline information from the first admission. The proposed dual-track framework separates prospective prediction from retrospective explanation, providing a methodologically sound tool for institutional resource planning and high-risk screening in mental health settings. External validation and implementation studies are still needed.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12913-026-14274-y.
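
The dual-track comparison described in the Methods can be sketched roughly as follows. This is an illustration on synthetic data, not the study's code or cohort. The feature names, effect sizes, the choice to stratify the split on cost quartiles, and the default scikit-learn hyperparameters are all assumptions made for the example. The sketch fits one gradient boosting regressor on baseline features only (Set A) and one on baseline features plus a one-hot trajectory cluster (Set B), then reports R², RMSE, and MAE on log costs.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for baseline variables (assumed, not the study's data).
age = rng.integers(18, 90, size=n)
comorbidity = rng.poisson(1.0, size=n)
payment = rng.integers(0, 3, size=n)
# Trajectory cluster label (0..3); only observable after follow-up.
cluster = rng.choice(4, size=n, p=[0.647, 0.100, 0.048, 0.205])

# Assumed data-generating process for log-transformed cumulative cost.
log_cost = (6.0 + 0.03 * age + 0.3 * comorbidity + 0.4 * payment
            + 0.6 * cluster + rng.normal(0.0, 0.5, size=n))

set_a = np.column_stack([age, comorbidity, payment])   # baseline only
set_b = np.column_stack([set_a, np.eye(4)[cluster]])   # + one-hot cluster

# Stratify the 70:30 split on cost quartiles (an assumption for this sketch).
strata = np.digitize(log_cost, np.quantile(log_cost, [0.25, 0.5, 0.75]))


def evaluate(X, y):
    """Fit a GBM with a 70:30 split and 5-fold CV; return test metrics."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, stratify=strata, random_state=0)
    model = GradientBoostingRegressor(random_state=0)
    folds = KFold(n_splits=5, shuffle=True, random_state=0)
    cv_r2 = cross_val_score(model, X_tr, y_tr, cv=folds, scoring="r2").mean()
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return {
        "cv_r2": float(cv_r2),
        "r2": float(r2_score(y_te, pred)),
        "rmse": float(np.sqrt(mean_squared_error(y_te, pred))),
        "mae": float(mean_absolute_error(y_te, pred)),
    }


res_a = evaluate(set_a, log_cost)   # prospective track
res_b = evaluate(set_b, log_cost)   # retrospective, explanatory track
delta_r2 = res_b["r2"] - res_a["r2"]

for name, res in (("Set A", res_a), ("Set B", res_b)):
    print(name, {k: round(v, 3) for k, v in res.items()})
print("delta R2:", round(delta_r2, 3))
```

Only the Set A model could be applied at index admission. The Set B model uses information that exists only once the three-year trajectory is complete, so in this sketch, as in the abstract, the gain in R² is read as explanatory rather than as achievable predictive accuracy.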
