Transformer-based structuring of free-text radiology report databases

Authors: Nowak S, Biesner D, Layer YC, Theis M, Schneider H, Block W, Wulff B, Attenberger UI, Sifa R, Sprinkart AM

Journal: European Radiology; impact factor: 4.700
Year: 2023; citation: 2023 Jun;33(6):4228-4236
DOI: 10.1007/s00330-023-09526-y

Abstract

OBJECTIVES: To provide insights for on-site development of transformer-based structuring of free-text report databases by investigating different labeling and pre-training strategies.

METHODS: A total of 93,368 German chest X-ray reports from 20,912 intensive care unit (ICU) patients were included. Two labeling strategies were investigated to tag six findings reported by the attending radiologist. First, a system based on human-defined rules was applied to annotate all reports (termed "silver labels"). Second, 18,000 reports were manually annotated in 197 h (termed "gold labels"), of which 10% were used for testing. An on-site pre-trained model (T(mlm)) using masked-language modeling (MLM) was compared with a public, medically pre-trained model (T(med)). Both models were fine-tuned for text classification on silver labels only, on gold labels only, and first on silver and then on gold labels (hybrid training), using varying numbers of gold labels (N: 500, 1000, 2000, 3500, 7000, 14,580). Macro-averaged F1-scores (MAF1) in percent were calculated with 95% confidence intervals (CI).

RESULTS: T(mlm,gold) (95.5 [94.5-96.3]) showed significantly higher MAF1 than T(med,silver) (75.0 [73.4-76.5]) and T(mlm,silver) (75.2 [73.6-76.7]), but not significantly higher MAF1 than T(med,gold) (94.7 [93.6-95.6]), T(med,hybrid) (94.9 [93.9-95.8]), or T(mlm,hybrid) (95.2 [94.3-96.0]). When 7000 or fewer gold-labeled reports were used, T(mlm,gold) (N: 7000, 94.7 [93.5-95.7]) showed significantly higher MAF1 than T(med,gold) (N: 7000, 91.5 [90.0-92.8]). With at least 2000 gold-labeled reports, additionally using silver labels did not lead to a significant improvement of T(mlm,hybrid) (N: 2000, 91.8 [90.4-93.2]) over T(mlm,gold) (N: 2000, 91.4 [89.9-92.8]).

CONCLUSIONS: Custom pre-training of transformers and fine-tuning on manual annotations promises to be an efficient strategy to unlock report databases for data-driven medicine.
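The abstract reports macro-averaged F1-scores (MAF1) in percent with 95% CIs but does not state how the intervals were obtained. The following is a minimal sketch, not the authors' evaluation code, assuming (1) each report carries one binary label per finding, (2) MAF1 is the unweighted mean of the per-finding F1-scores, and (3) the CI is a percentile bootstrap over test reports. The function names (f1, macro_f1, bootstrap_ci) and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Reports x findings, one 0/1 flag per finding (assumed binary encoding).
Labels = Sequence[Sequence[int]]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); set to 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Unweighted mean of per-finding F1-scores, in percent."""
    n_findings = len(y_true[0])
    scores: List[float] = []
    for k in range(n_findings):
        pairs = [(t[k], p[k]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        scores.append(f1(tp, fp, fn))
    return 100.0 * sum(scores) / n_findings


def bootstrap_ci(
    y_true: Labels, y_pred: Labels, n_boot: int = 1000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float]:
    """Percentile bootstrap CI for macro-F1, resampling whole reports with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(macro_f1([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    # Toy data: 4 reports x 6 findings (hypothetical, not from the study).
    y_true = [[1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 1], [1, 1, 0, 0, 1, 0], [0, 0, 1, 0, 0, 0]]
    y_pred = [[1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 0], [1, 0, 0, 0, 1, 0], [0, 0, 1, 0, 0, 0]]
    point = macro_f1(y_true, y_pred)  # 77.8 for this toy data
    lower, upper = bootstrap_ci(y_true, y_pred)
    print(f"MAF1 {point:.1f} [{lower:.1f}-{upper:.1f}]")
```

Resampling whole reports rather than individual labels keeps the findings of one report together, which seems the natural unit here because they are extracted from the same text; with a real test set of about 1,800 reports the interval would be far narrower than with this toy example.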
KEY POINTS:
• On-site development of natural language processing methods that retrospectively unlock free-text databases of radiology clinics for data-driven medicine is of great interest.
• For clinics seeking to develop on-site methods for retrospectively structuring the report database of a given department, it remains unclear which of the previously proposed strategies for labeling reports and pre-training models is most appropriate in the context of, e.g., available annotator time.
• Using a custom pre-trained transformer model together with modest annotation effort promises to be an efficient way to retrospectively structure radiological databases, even when millions of reports are not available for pre-training.
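The custom on-site model (T(mlm)) was pre-trained with masked-language modeling, but the abstract does not specify the masking scheme. The sketch below assumes the common BERT-style recipe: each token is selected as a prediction target with probability 0.15, and a selected token is replaced by [MASK] 80% of the time, by a random vocabulary token 10% of the time, and left unchanged otherwise. Whitespace tokenization, the function name mask_tokens, and the example phrase are hypothetical simplifications; real transformer models operate on subword tokens.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: Sequence[str],
    vocab: Sequence[str],
    mask_prob: float = 0.15,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Build one MLM training example (assumed BERT-style 80/10/10 scheme).

    Returns (inputs, labels): labels[i] holds the original token at each
    selected position and None elsewhere, so no loss is computed there.
    """
    rng = random.Random(seed)
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:  # select this position as a target
            labels.append(tok)
            r = rng.random()
            if r < 0.8:  # 80%: replace with the mask symbol
                inputs.append(MASK)
            elif r < 0.9:  # 10%: replace with a random vocabulary token
                inputs.append(rng.choice(vocab))
            else:  # 10%: keep the original token
                inputs.append(tok)
        else:  # not selected: copy through, no prediction target
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


if __name__ == "__main__":
    # Hypothetical German report phrase, whitespace-tokenized for illustration.
    report = "kein Pleuraerguss , kein Pneumothorax , Herz nicht verbreitert".split()
    vocab = sorted(set(report))
    inputs, labels = mask_tokens(report, vocab, seed=42)
    print(inputs)
    print(labels)
```

The model is then trained to predict the original token only where labels is not None. Keeping or randomly replacing some targets, instead of always inserting [MASK], reduces the mismatch between pre-training, where [MASK] appears, and fine-tuning on labeled reports, where it never does.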
