Data Diversity vs. Model Complexity in the Prediction of Pediatric Bipolar Disorder: Evidence from Academic and Community Clinical Samples

Abstract

Pediatric bipolar disorder is challenging to diagnose accurately because of symptom heterogeneity, and more standardized, data-driven approaches are needed to improve diagnostic reliability. We evaluated a clinical decision tool (nomogram), statistical methods (logistic regression, LASSO), machine learning models (support vector machine, random forest, k-nearest neighbors, extreme gradient boosting), and a deep learning model (multilayer perceptron) for predicting pediatric bipolar disorder in two datasets collected in academic (N=550) and community (N=511) clinical settings. We compared three modeling strategies: cross-dataset validation (training on the academic sample and testing on the community sample), cross-dataset validation with interaction terms, and mixed-dataset training on the pooled samples. Model performance was assessed by discrimination, calibration, and predictor importance rankings. In the baseline cross-dataset approach, all models showed good internal discrimination in the academic dataset, but external discrimination in the community dataset declined substantially. Adding interaction terms slightly improved internal discrimination but improved neither external performance nor calibration. Recalibration markedly improved cross-dataset calibration without compromising discrimination, indicating that the transportability problems were largely driven by probability scaling. Models trained on the mixed dataset showed much stronger external discrimination and calibration. Across models and training strategies, family risk and PGBI-10M were consistently ranked as the most important predictors. Overall, predictive models for pediatric bipolar disorder showed strong internal performance but limited generalizability across clinical settings, owing to dataset shift and miscalibration. Increasing model complexity did not improve external performance, whereas training on pooled data substantially improved both discrimination and calibration. These findings suggest that sampling diversity, rather than model complexity, is more valuable for developing clinically useful and generalizable psychiatric prediction models, underscoring the importance of open and collaborative datasets.
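Why recalibration can repair calibration without touching discrimination has a simple mechanical explanation. One common approach, logistic recalibration, refits an intercept a and a slope b on the logit of a model's original predicted probabilities in the external sample. When b > 0 this is a strictly increasing transform, so the rank order of patients, and therefore the AUC, cannot change, while the probability scale (and hence calibration) can. The abstract does not state which recalibration method the authors used, so the following is only a minimal sketch under that assumption, written with the Python standard library; the function names (`auc`, `log_loss`, `fit_recalibration`, `recalibrate`) and the toy data are hypothetical.

```python
import math


def logit(p, eps=1e-12):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def auc(y, p):
    """Discrimination (AUC): chance a random case scores above a random non-case; ties count 0.5."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0 for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def log_loss(y, p, eps=1e-12):
    """Mean negative Bernoulli log-likelihood; penalizes miscalibrated probabilities."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total / len(y)


def fit_recalibration(y, p, iters=100, tol=1e-12):
    """Fit logit(q) = a + b * logit(p) by Newton-Raphson (maximum likelihood).

    b is the calibration slope (b < 1: original predictions too extreme).
    Refitting the intercept makes the mean recalibrated risk equal the observed rate.
    """
    x = [logit(pi) for pi in p]
    a, b = 0.0, 0.0  # standard start: every predicted probability equals 0.5
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y):
            q = sigmoid(a + b * xi)
            w = q * (1 - q)
            g_a += yi - q          # gradient of the log-likelihood
            g_b += (yi - q) * xi
            h_aa += w              # Fisher information matrix entries
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab ** 2
        da = (h_bb * g_a - h_ab * g_b) / det  # Newton step = information^-1 @ gradient
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def recalibrate(p, a, b):
    """Apply the fitted intercept and slope on the logit scale."""
    return [sigmoid(a + b * logit(pi)) for pi in p]


if __name__ == "__main__":
    # Hypothetical external-sample outcomes and over-predicted risks (mean 0.525 vs observed 0.4)
    y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 0]
    p = [0.05, 0.10, 0.30, 0.35, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]
    a, b = fit_recalibration(y, p)
    q = recalibrate(p, a, b)
    print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
    print(f"AUC before/after: {auc(y, p):.3f} / {auc(y, q):.3f}")
    print(f"mean risk before/after: {sum(p) / len(p):.3f} / {sum(q) / len(q):.3f} "
          f"(observed {sum(y) / len(y):.3f})")
    print(f"log loss before/after: {log_loss(y, p):.3f} / {log_loss(y, q):.3f}")
```

On this toy data the AUC is identical before and after recalibration, while the mean predicted risk is pulled from 0.525 down to the observed rate of 0.4. In practice the intercept and slope would be estimated on (part of) the external dataset and should then be checked on data not used to fit them.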
