CAS: enhancing implicit constrained data augmentation with semantic enrichment for biomedical relation extraction and beyond

Abstract

Biomedical relation extraction often involves datasets with implicit constraints, where structural, syntactic, or semantic rules must be strictly preserved to maintain data integrity. Traditional data augmentation techniques struggle in these scenarios because they risk violating domain-specific constraints. To address these challenges, we propose CAS (Constrained Augmentation and Semantic-Quality), a novel framework designed for constrained datasets. CAS employs large language models to generate diverse data variations while adhering to predefined rules, and it integrates the SemQ Filter, a self-evaluation mechanism that ensures the quality and consistency of augmented data by filtering out noisy or semantically incongruent samples. Although CAS is primarily designed for biomedical relation extraction, its design also applies to other tasks with implicit constraints, such as code completion, mathematical reasoning, and information retrieval. In experiments across multiple domains, CAS improves model performance by maintaining structural fidelity and semantic accuracy in the augmented data. These results highlight the potential of CAS both for advancing biomedical NLP research and for addressing data augmentation challenges in other constrained-task settings within natural language processing. Database URL: https://github.com/ngogiahan149/CAS.
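
The generate-then-filter idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the entity placeholder format (`@GENE$`, `@DISEASE$`), the function names, the token-overlap score, and the 0.5 threshold are all assumptions made for illustration. In particular, the abstract describes the SemQ Filter as an LLM self-evaluation step, so the simple overlap score here is only a stand-in for it, and the fixed candidate list is a stand-in for LLM generation.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical placeholder format for masked entities, e.g. @GENE$ or @DISEASE$.
ENTITY_TAG = re.compile(r"@\w+\$")


def preserves_entities(original: str, candidate: str) -> bool:
    """Constraint check: the candidate must keep exactly the same entity placeholders."""
    return sorted(ENTITY_TAG.findall(original)) == sorted(ENTITY_TAG.findall(candidate))


def overlap_score(original: str, candidate: str) -> float:
    """Stand-in quality score: Jaccard overlap of lowercased tokens.

    A real semantic filter would ask a language model to judge the candidate;
    this function only illustrates where such a score plugs in.
    """
    a = set(re.findall(r"[\w@$]+", original.lower()))
    b = set(re.findall(r"[\w@$]+", candidate.lower()))
    return len(a & b) / len(a | b) if a | b else 0.0


def augment(
    original: str,
    generate: Callable[[str], Iterable[str]],
    score: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[str]:
    """Keep generated variants that satisfy the constraint and pass the quality filter."""
    kept = []
    for candidate in generate(original):
        if candidate == original:
            continue  # an unchanged copy adds no diversity
        if not preserves_entities(original, candidate):
            continue  # the structural constraint is violated
        if score(original, candidate) >= threshold:
            kept.append(candidate)
    return kept


ORIGINAL = "@GENE$ mutations are associated with @DISEASE$ in adults."


def fake_generate(text: str) -> List[str]:
    """Hypothetical generator returning fixed candidates in place of an LLM call."""
    return [
        "In adults, @DISEASE$ is associated with mutations in @GENE$.",
        "@GENE$ mutations are associated with cancer in adults.",  # drops @DISEASE$
        "@GENE$ @DISEASE$ banana.",  # keeps the entities but loses the meaning
    ]


if __name__ == "__main__":
    for variant in augment(ORIGINAL, fake_generate, overlap_score):
        print(variant)
```

In this toy run only the first candidate survives: the second fails the entity constraint and the third falls below the assumed score threshold. Separating the hard constraint check from the softer quality score mirrors the two roles the abstract assigns to constrained generation and to the SemQ Filter.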
