Signal Fidelity Index-aware calibration for addressing distributional shift in predictive modeling across heterogeneous real-world data

Authors: Jingya Cheng, Jiazi Tian, Federica Spoto, Alaleh Azhir, Daniel Mork, Hossein Estiri

Journal: Scientific Reports; Impact factor: 3.900
Year: 2025; Citation: 2025 Dec 18;16(1):2807
DOI: 10.1038/s41598-025-32656-w

Abstract

Machine learning models trained on real-world data (RWD) often lose performance when deployed across different settings because of distributional shift. A fundamental but under-explored contributor to this degradation is the decay of diagnostic signals: systematic variability in diagnostic quality and consistency across institutional contexts, which affects the reliability of the clinical codes used for model training and prediction.

This study aimed to develop and evaluate a Signal Fidelity Index (SFI) that quantifies diagnostic signal decay at the patient level across diverse clinical conditions, and to assess whether SFI-aware calibration improves model performance relative to established calibration methods without requiring outcome labels in target domains after initial method development.

We developed a comprehensive simulation framework using synthetic patient datasets across six clinically diverse phenotypes: dementia, geriatric bipolar disorder, fibromyalgia, adult ADHD, type 2 diabetes, and hypertension. Each phenotype included independent simulation batches with varying demographic compositions and data quality characteristics. The SFI was constructed from six components: diagnostic specificity, temporal consistency, entropy, contextual concordance, medication alignment, and trajectory stability. We implemented SFI-aware calibration using a multiplicative adjustment formula with phenotype-specific calibration parameters optimized through supervised parameter development, then evaluated performance in label-free deployment across heterogeneous testing datasets, comparing SFI-aware calibration against established baseline calibration methods.

SFI-aware calibration significantly improved predictive performance compared with both uncalibrated predictions and all baseline methods across nearly all six phenotypes (Cohen's d = 0.603-5.002, [Formula: see text]).
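
The abstract does not give the rule used to combine the six SFI components or the exact multiplicative adjustment formula. The sketch below only illustrates the workflow described above (score the components, combine them into one index, fit phenotype-specific parameters once on labeled development data, then apply them without labels) under explicit assumptions: each component score is pre-scaled to [0, 1] with higher meaning higher fidelity, the SFI is their weighted mean with equal weights by default, and the adjustment has the hypothetical form p' = clip(p * (1 + alpha * (SFI - tau)), 0, 1), with alpha and tau chosen by grid search on the Brier score. The names `signal_fidelity_index`, `calibrate`, and `fit_parameters`, and the component keys, are illustrative and are not the authors' code.

```python
from statistics import fmean
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical keys for the six SFI components named in the abstract. Each score
# is ASSUMED to be pre-scaled to [0, 1], with 1 meaning high diagnostic fidelity
# (so the entropy component is assumed to be already inverted and normalized).
COMPONENTS = (
    "diagnostic_specificity",
    "temporal_consistency",
    "entropy",
    "contextual_concordance",
    "medication_alignment",
    "trajectory_stability",
)


def signal_fidelity_index(scores: Dict[str, float],
                          weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted mean of the component scores; equal weights are an assumption."""
    if weights is None:
        weights = {name: 1.0 for name in COMPONENTS}
    total = sum(weights[name] for name in COMPONENTS)
    return sum(weights[name] * scores[name] for name in COMPONENTS) / total


def calibrate(p: float, sfi: float, alpha: float, tau: float) -> float:
    """Hypothetical multiplicative adjustment: raises p when sfi > tau, lowers it when sfi < tau."""
    adjusted = p * (1.0 + alpha * (sfi - tau))
    return min(1.0, max(0.0, adjusted))  # keep the result a valid probability


def brier(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return fmean((p - y) ** 2 for p, y in zip(probs, labels))


def fit_parameters(probs: Sequence[float], sfis: Sequence[float],
                   labels: Sequence[int], alphas: Sequence[float],
                   taus: Sequence[float]) -> Tuple[float, float]:
    """Supervised development step (once per phenotype): grid-search the
    (alpha, tau) pair that minimizes the Brier score on labeled development data."""
    best = None
    for alpha in alphas:
        for tau in taus:
            adjusted = [calibrate(p, s, alpha, tau) for p, s in zip(probs, sfis)]
            score = brier(adjusted, labels)
            if best is None or score < best[0]:
                best = (score, alpha, tau)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy labeled development set (illustrative numbers, not study data).
    dev_probs = [0.5, 0.5, 0.5, 0.5]
    dev_sfis = [1.0, 1.0, 0.0, 0.0]
    dev_labels = [1, 1, 0, 0]
    alpha, tau = fit_parameters(dev_probs, dev_sfis, dev_labels,
                                alphas=[0.0, 1.0, 2.0], taus=[0.5])

    # Label-free deployment: only model outputs and component scores are needed.
    patient = {name: 0.8 for name in COMPONENTS}
    sfi = signal_fidelity_index(patient)
    print(alpha, tau, round(calibrate(0.4, sfi, alpha, tau), 3))
```

Because `fit_parameters` is the only step that reads outcome labels, the fitted (alpha, tau) pair can then be applied to any unlabeled target dataset for which the component scores can be computed. Using the Brier score as the tuning objective is itself an assumption; F1 or log loss would slot into the same loop.
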
Performance improvements varied by phenotype complexity, with F1-score gains ranging from +4.2% (fibromyalgia) to +34.9% (dementia) and AUC gains ranging from +4.7% to +40.1%. Traditional calibration methods often degraded performance: isotonic regression failed universally (AUC values dropped to 0.51-0.56 across all phenotypes), and Platt scaling had inconsistent, phenotype-dependent effects. Brier score decomposition showed that SFI-aware calibration improved performance through a dual mechanism: reductions of 11% to 29% in the reliability (miscalibration) term and increases of 35% to 238% in the resolution term. Notably, even well-diagnosed conditions with standardized diagnostic criteria (type 2 diabetes, hypertension) showed substantial benefits (+8.0% to +13.3% F1-score, +25.1% to +40.1% AUC), suggesting that diagnostic signal variability affects EHR-based phenotyping broadly, not only for complex or under-diagnosed conditions.

These findings demonstrate that diagnostic signal decay is a tractable problem that can be systematically addressed through patient-level, fidelity-aware calibration. SFI-aware calibration offers a practical approach for improving clinical prediction models across diverse healthcare contexts: it requires supervised parameter development once per phenotype, followed by label-free deployment to any number of target populations. The method consistently outperformed established calibration techniques without their tendency to degrade performance, making it particularly suitable for large-scale administrative datasets in which outcome labels are unavailable. Multi-phenotype validation supports generalizability across clinical conditions ranging from complex, under-diagnosed phenotypes to well-diagnosed conditions with standardized criteria.
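
The abstract attributes the gains to both a lower reliability term and a higher resolution term, but does not say which decomposition or binning scheme was used. As an assumption, the sketch below uses the standard Murphy decomposition of the Brier score, BS = REL - RES + UNC, with equal-width probability bins: REL measures how far each bin's mean forecast sits from its observed event rate (lower is better), RES measures how far bin-level event rates move away from the overall prevalence (higher is better), and UNC depends only on prevalence. The function names `brier_score` and `murphy_decomposition` are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def murphy_decomposition(probs: Sequence[float], labels: Sequence[int],
                         n_bins: int = 10) -> Tuple[float, float, float]:
    """Return (reliability, resolution, uncertainty) from equal-width probability bins.

    Brier = reliability - resolution + uncertainty holds exactly when every
    forecast in a bin has the same value; otherwise it is an approximation.
    """
    n = len(probs)
    base_rate = sum(labels) / n
    bins: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for p, y in zip(probs, labels):
        # Clamp so that p == 1.0 falls into the top bin instead of an extra one.
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))

    reliability = resolution = 0.0
    for members in bins.values():
        n_k = len(members)
        mean_forecast = sum(p for p, _ in members) / n_k
        observed_rate = sum(y for _, y in members) / n_k
        reliability += n_k * (mean_forecast - observed_rate) ** 2  # miscalibration
        resolution += n_k * (observed_rate - base_rate) ** 2       # separation from prevalence
    uncertainty = base_rate * (1.0 - base_rate)  # fixed by the outcomes alone
    return reliability / n, resolution / n, uncertainty


if __name__ == "__main__":
    # Toy forecasts and outcomes (illustrative, not study data).
    probs = [0.2] * 5 + [0.8] * 5
    labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
    rel, res, unc = murphy_decomposition(probs, labels)
    print(brier_score(probs, labels), rel - res + unc)
```

Because UNC is the same for every calibration method evaluated on the same outcomes, any Brier score improvement has to come from a lower REL, a higher RES, or both, which is why reporting the two terms separately distinguishes the mechanisms.
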
