Signal Fidelity Index-aware calibration for addressing distributional shift in predictive modeling across heterogeneous real-world data



Abstract

Machine learning models trained on real-world data (RWD) often lose performance when deployed in settings that differ from their training environment, a problem known as distributional shift. However, a fundamental but under-explored contributor to this degradation is diagnostic signal decay: systematic variability in diagnostic quality and consistency across institutional contexts, which undermines the reliability of the clinical codes used for model training and prediction.

This study aimed to develop and evaluate a Signal Fidelity Index (SFI) that quantifies diagnostic signal decay at the patient level across diverse clinical conditions. It also aimed to assess whether SFI-aware calibration improves model performance relative to established calibration methods, without requiring outcome labels in target domains once the method has been developed.

We developed a comprehensive simulation framework using synthetic patient datasets for six clinically diverse phenotypes: dementia, geriatric bipolar disorder, fibromyalgia, adult ADHD, type 2 diabetes, and hypertension. Each phenotype included independent simulation batches with varying demographic compositions and data quality characteristics. The SFI was constructed from six components: diagnostic specificity, temporal consistency, entropy, contextual concordance, medication alignment, and trajectory stability. We implemented SFI-aware calibration as a multiplicative adjustment with phenotype-specific calibration parameters, optimized through supervised parameter development, and then evaluated performance in label-free deployment across heterogeneous test datasets. SFI-aware calibration was compared against established baseline calibration methods.

SFI-aware calibration significantly improved predictive performance over both uncalibrated predictions and all baseline methods in nearly all six phenotypes (Cohen's d = 0.603 to 5.002, [Formula: see text]).
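The abstract lists the six SFI components but does not give their formulas or say how they are combined. As a minimal sketch, assuming each component is already scored in [0, 1] (higher meaning a more faithful diagnostic signal) and that the index is a weighted mean, the code below shows one way to score the entropy component and aggregate all six. The component keys, the equal default weights, and the `1 - H / ln(k)` entropy scoring are hypothetical choices for illustration, not the authors' definitions.

```python
from __future__ import annotations

import math
from collections import Counter

# Hypothetical keys mirroring the six SFI components named in the abstract.
COMPONENTS = (
    "diagnostic_specificity",
    "temporal_consistency",
    "entropy",
    "contextual_concordance",
    "medication_alignment",
    "trajectory_stability",
)


def entropy_fidelity(codes: list[str]) -> float:
    """Score a diagnosis-code history in [0, 1] from normalized Shannon entropy.

    Assumption: a history concentrated on few codes is a high-fidelity signal,
    so the score is 1 - H / ln(k), where k is the number of distinct codes.
    """
    if not codes:
        raise ValueError("at least one diagnosis code is required")
    counts = Counter(codes)
    k = len(counts)
    if k == 1:
        return 1.0  # a single distinct code: no dispersion at all
    n = len(codes)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - h / math.log(k)


def signal_fidelity_index(scores: dict[str, float],
                          weights: dict[str, float] | None = None) -> float:
    """Combine component scores (each assumed in [0, 1]) into a weighted mean."""
    missing = set(COMPONENTS) - set(scores)
    if missing:
        raise ValueError(f"missing SFI components: {sorted(missing)}")
    if weights is None:
        weights = {name: 1.0 for name in COMPONENTS}  # hypothetical equal weights
    total = sum(weights[name] for name in COMPONENTS)
    return sum(weights[name] * scores[name] for name in COMPONENTS) / total


# Illustrative patient: ICD-10-style dementia codes and made-up component scores.
history = ["F03", "F03", "F03", "G30.9"]
patient_scores = {
    "diagnostic_specificity": 0.8,
    "temporal_consistency": 0.7,
    "entropy": entropy_fidelity(history),
    "contextual_concordance": 0.9,
    "medication_alignment": 0.6,
    "trajectory_stability": 0.75,
}
print(round(signal_fidelity_index(patient_scores), 3))
```

A weighted mean keeps the index on the same [0, 1] scale as its components, so per-phenotype weights could be tuned without rescaling downstream thresholds.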
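The multiplicative adjustment and its phenotype-specific parameters are described only at a high level. A minimal sketch, assuming a hypothetical form `p * (1 + alpha * (sfi - tau))` with phenotype-specific `alpha` and `tau` fitted by a Brier-score grid search, illustrates the two-stage workflow: one supervised fit per phenotype on labeled development data, then label-free deployment that uses only model outputs and SFI values. The formula, the objective, the grid, and all function names are assumptions, not the published method.

```python
from __future__ import annotations

import itertools


def sfi_adjust(p: float, sfi: float, alpha: float, tau: float) -> float:
    """Hypothetical multiplicative adjustment p * (1 + alpha * (sfi - tau)), clipped to [0, 1].

    With alpha > 0, patients whose SFI exceeds the pivot tau have their predicted
    risk scaled up, and patients below tau have it scaled down.
    """
    factor = 1.0 + alpha * (sfi - tau)
    return min(1.0, max(0.0, p * factor))


def brier(probs: list[float], labels: list[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def fit_phenotype_params(probs, sfis, labels, alphas, taus) -> tuple[float, float]:
    """Supervised step, run once per phenotype on labeled development data.

    Grid-searches (alpha, tau) for the lowest Brier score (assumed objective).
    """
    best_score, best_params = float("inf"), (0.0, 0.0)
    for alpha, tau in itertools.product(alphas, taus):
        adjusted = [sfi_adjust(p, s, alpha, tau) for p, s in zip(probs, sfis)]
        score = brier(adjusted, labels)
        if score < best_score:
            best_score, best_params = score, (alpha, tau)
    return best_params


def deploy(probs, sfis, alpha: float, tau: float) -> list[float]:
    """Label-free step: needs only model outputs and per-patient SFI values."""
    return [sfi_adjust(p, s, alpha, tau) for p, s in zip(probs, sfis)]


# Toy development set in which outcomes track SFI more closely than raw scores do.
dev_probs = [0.6, 0.6, 0.4, 0.4]
dev_sfis = [0.9, 0.2, 0.8, 0.1]
dev_labels = [1, 0, 1, 0]
alpha, tau = fit_phenotype_params(dev_probs, dev_sfis, dev_labels,
                                  alphas=[0.0, 0.5, 1.0, 1.5], taus=[0.3, 0.5, 0.7])

# Target domain: no labels required.
print(alpha, tau, deploy([0.5, 0.5], [0.9, 0.1], alpha, tau))
```

Clipping keeps adjusted values valid probabilities. Because `alpha = 0` (the identity adjustment) is in the grid, the fitted parameters never do worse than the uncalibrated scores on the development data, although that guarantee does not carry over to target domains.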
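The effect sizes are reported as Cohen's d without naming the variant. As a sketch, assuming d compares per-batch metric values (for example F1) of two methods as independent samples standardized by the pooled standard deviation, it can be computed as below. A paired variant would instead divide the mean per-batch difference by the standard deviation of those differences. The F1 values are made up for illustration and are not the paper's results.

```python
from __future__ import annotations

import statistics


def cohens_d(treated: list[float], control: list[float]) -> float:
    """Cohen's d for two independent samples, standardized by the pooled SD."""
    n1, n2 = len(treated), len(control)
    v1 = statistics.variance(treated)  # sample variance, n - 1 denominator
    v2 = statistics.variance(control)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.fmean(treated) - statistics.fmean(control)) / pooled_sd


# Hypothetical per-batch F1 scores for two methods (illustrative numbers only).
f1_sfi = [0.72, 0.75, 0.70, 0.74]
f1_uncalibrated = [0.60, 0.63, 0.58, 0.61]
print(round(cohens_d(f1_sfi, f1_uncalibrated), 2))
```

Very large values of d, such as the upper end of the reported range, arise when between-method differences are big relative to the batch-to-batch spread, which is common when simulation batches have low variance.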
Performance improvements varied with phenotype complexity: F1-score gains ranged from +4.2% for fibromyalgia to +34.9% for dementia, and AUC gains ranged from +4.7% to +40.1%. Traditional calibration methods often degraded performance: isotonic regression failed universally (AUC fell to 0.51 to 0.56 across all phenotypes), and Platt scaling had inconsistent, phenotype-dependent effects. Brier score decomposition showed that SFI-aware calibration improved performance through a dual mechanism: reliability (calibration error) decreased by 11% to 29%, and resolution increased by 35% to 238%. Notably, even well-diagnosed conditions with standardized diagnostic criteria (type 2 diabetes, hypertension) showed substantial benefits (+8.0% to +13.3% F1-score, +25.1% to +40.1% AUC), suggesting that diagnostic signal variability affects all EHR-based phenotyping.

These findings demonstrate that diagnostic signal decay is a tractable problem that can be addressed systematically through patient-level, fidelity-aware calibration. SFI-aware calibration offers a practical way to improve clinical prediction models across diverse healthcare contexts: supervised parameter development is required once per phenotype, after which the method can be deployed label-free to any number of target populations. The method consistently outperformed established calibration techniques while avoiding their tendency to degrade performance, making it particularly suitable for large-scale administrative datasets where outcome labels are unavailable. Multi-phenotype validation confirms generalizability across clinical conditions ranging from complex, under-diagnosed phenotypes to well-diagnosed conditions with standardized criteria.
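The dual mechanism refers to the Murphy decomposition of the Brier score, BS = REL - RES + UNC. Here reliability (REL) measures how far binned forecasts sit from the observed event frequencies, resolution (RES) measures how far those frequencies spread from the base rate, and uncertainty (UNC) depends only on the base rate. The sketch below assumes equal-width probability bins; the abstract does not state the binning scheme used.

```python
from __future__ import annotations

from collections import defaultdict


def brier_decomposition(probs: list[float], labels: list[int],
                        n_bins: int = 10) -> tuple[float, float, float]:
    """Murphy decomposition of the Brier score over equal-width probability bins.

    Returns (reliability, resolution, uncertainty). When every forecast in a bin
    is identical, brier = reliability - resolution + uncertainty holds exactly;
    otherwise binning introduces a small discrepancy.
    """
    n = len(labels)
    base_rate = sum(labels) / n
    bins: dict[int, list[tuple[float, int]]] = defaultdict(list)
    for p, y in zip(probs, labels):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[k].append((p, y))
    reliability = resolution = 0.0
    for members in bins.values():
        n_k = len(members)
        mean_p = sum(p for p, _ in members) / n_k
        obs_freq = sum(y for _, y in members) / n_k
        reliability += n_k * (mean_p - obs_freq) ** 2
        resolution += n_k * (obs_freq - base_rate) ** 2
    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty


probs = [0.1, 0.1, 0.1, 0.1, 0.7, 0.7, 0.7, 0.7]
labels = [0, 0, 0, 1, 1, 1, 0, 1]
rel, res, unc = brier_decomposition(probs, labels)
brier = sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)
print(f"BS={brier:.4f} REL={rel:.4f} RES={res:.4f} UNC={unc:.4f}")
```

Because UNC is fixed by the data, a calibration method can lower the Brier score only by reducing REL or increasing RES; the abstract reports that SFI-aware calibration did both.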
