On embedding-based automatic mapping of clinical classification system: handling linguistic variations and granular inconsistencies

Abstract

OBJECTIVES: Mapping between clinical classification systems, such as the International Classification of Diseases (ICD), is essential yet challenging. Manual mapping remains labor-intensive and does not scale. Existing embedding-based automatic mapping methods, particularly those built on transformer-based pretrained encoders, face two persistent challenges: (1) linguistic variation and (2) differing levels of granularity in how clinical conditions are described.

MATERIALS AND METHODS: We introduce an automatic mapping method that combines the representational power of pretrained encoders with the reasoning capability of large language models (LLMs). To address linguistic variation, we generate two descriptions for each ICD code: (1) a hierarchy-augmented (HA) description and (2) an LLM-generated (LG) description, which together capture rich semantic nuances. We further introduce a prompting framework (PR) that leverages LLM reasoning to handle granularity mismatches, including source-to-parent mappings.

RESULTS: Chapterwise mappings were performed between ICD versions (ICD-9-CM↔ICD-10-CM and ICD-10-AM↔ICD-11) using multiple LLMs. The proposed approach consistently outperformed the baseline across all ICD pairs and chapters. For example, combining HA descriptions with descriptions generated by Qwen3-8B yielded an average top-1 accuracy improvement of 6.5% (0.065) across the mapping cases. A small-scale pilot study further indicated that HA+LG remains effective for more challenging one-to-many mappings.

CONCLUSIONS: Our findings demonstrate that integrating the representational power of pretrained encoders with LLM reasoning offers a robust, scalable strategy for automatic ICD mapping.
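The abstract does not specify how the encoder, the descriptions, or the similarity search are implemented, so the following is only a minimal sketch of the general embedding-based mapping idea, under several assumptions. The codes and titles in `SOURCE` and `TARGET` are invented toy data, not real ICD entries. The hierarchy-augmented text is assumed to be a code's title joined with its ancestors' titles, and a bag-of-words count vector stands in for a pretrained transformer encoder. Each source code is mapped to the target code with the highest cosine similarity, and top-1 accuracy is the fraction of source codes whose top-ranked target matches a gold mapping. LLM-generated descriptions and the PR prompting step are not modeled here, and all function names (`hierarchy_augmented`, `embed`, `map_top1`, `top1_accuracy`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy hierarchies: code -> (title, parent code).
# These are invented for illustration and are not real ICD data.
SOURCE = {
    "S0": ("Intestinal infectious diseases", None),
    "S1": ("Cholera", "S0"),
    "S2": ("Typhoid and paratyphoid fevers", "S0"),
}
TARGET = {
    "T0": ("Certain infectious and parasitic diseases", None),
    "T1": ("Cholera due to Vibrio cholerae", "T0"),
    "T2": ("Typhoid fever", "T0"),
}


def hierarchy_augmented(code, table):
    """Join a code's title with its ancestors' titles (an assumed HA format)."""
    parts = []
    while code is not None:
        title, parent = table[code]
        parts.append(title)
        code = parent
    return "; ".join(parts)


def embed(text):
    """Stand-in for a pretrained encoder: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[token] for token, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def map_top1(source, target):
    """Map each non-root source code to its most similar target code.

    Parent (chapter-level) target codes stay in the candidate pool, so a
    source code can still land on a coarser target when no child fits.
    """
    target_vecs = {c: embed(hierarchy_augmented(c, target)) for c in target}
    mapping = {}
    for code, (_, parent) in source.items():
        if parent is None:  # skip the chapter node itself in this toy example
            continue
        query = embed(hierarchy_augmented(code, source))
        mapping[code] = max(target_vecs, key=lambda t: cosine(query, target_vecs[t]))
    return mapping


def top1_accuracy(predicted, gold):
    """Fraction of gold source codes whose top-ranked target is correct."""
    hits = sum(predicted.get(code) == tgt for code, tgt in gold.items())
    return hits / len(gold)


if __name__ == "__main__":
    predicted = map_top1(SOURCE, TARGET)
    gold = {"S1": "T1", "S2": "T2"}  # hypothetical gold mapping
    print(predicted, top1_accuracy(predicted, gold))
```

With this toy data, `map_top1(SOURCE, TARGET)` returns `{'S1': 'T1', 'S2': 'T2'}`, giving a top-1 accuracy of 1.0. In a real pipeline, `embed` would call a pretrained encoder, and the candidate pool would be restricted to the corresponding target chapter, matching the chapterwise setup described in the abstract.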
