Semi-automatic construction of heterogeneous data schema based on structure and context-aware recommendation



Abstract

Customizing the structure and format of scientific data facilitates the publication of diverse and heterogeneous data. Many data publishing platforms let users create self-designed schemas, which leads to schema proliferation and makes schema creation more intricate. To address these challenges, we present a semi-automatic method and system for constructing heterogeneous material data schemas based on structure and context-aware recommendation. We propose a schema fragment tree structure to represent data schemas with hierarchical relationships, which turns recommendation into a subtree matching problem. Fragment indexing and semantic search are used to identify candidate fragments, and a tree edit distance algorithm computes their similarity scores. Evaluated on the Data Schema Construction System, the algorithm outperforms two schema matching baselines, TF-IDF and BM25, in precision, recall, and F1-score. Workload reduction is measured against the effort required to create schemas without recommendation. Our recommendation improves schema creation efficiency by 50.5% and reduces schema proliferation by 16.5%.
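The idea of scoring candidate fragments by tree edit distance can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simplified Selkow-style top-down edit distance on ordered trees, compares field names by exact match instead of semantic search, and skips the fragment index. All names (`Node`, `top_down_distance`, `similarity`, `recommend`) and the sample fields are hypothetical.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Node:
    """Hypothetical schema fragment node: a field name plus ordered child fields."""
    label: str
    children: tuple = ()


def size(node):
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(child) for child in node.children)


@lru_cache(maxsize=None)
def top_down_distance(a, b):
    """Selkow-style top-down edit distance (an assumed, simplified variant).

    Relabeling a node costs 1; inserting or deleting a child costs the
    size of the whole subtree rooted at that child.
    """
    relabel = 0 if a.label == b.label else 1
    xs, ys = a.children, b.children
    # dp[i][j]: cost of turning the first i children of a into the first j of b.
    dp = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        dp[i][0] = dp[i - 1][0] + size(xs[i - 1])
    for j in range(1, len(ys) + 1):
        dp[0][j] = dp[0][j - 1] + size(ys[j - 1])
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(xs[i - 1]),  # delete subtree from a
                dp[i][j - 1] + size(ys[j - 1]),  # insert subtree from b
                dp[i - 1][j - 1] + top_down_distance(xs[i - 1], ys[j - 1]),
            )
    return relabel + dp[-1][-1]


def similarity(a, b):
    """Map the distance to a score in (0, 1]; 1.0 means identical trees."""
    return 1.0 - top_down_distance(a, b) / (size(a) + size(b))


def recommend(partial, fragments):
    """Rank stored fragments by similarity to the partially built schema."""
    return sorted(fragments, key=lambda f: similarity(partial, f), reverse=True)


# Hypothetical example: a partially built schema and two stored fragments.
partial = Node("sample", (Node("composition"), Node("temperature")))
frag_close = Node(
    "sample", (Node("composition"), Node("temperature"), Node("pressure"))
)
frag_far = Node("experiment", (Node("operator"),))
ranked = recommend(partial, [frag_far, frag_close])
```

Here `frag_close` differs from `partial` by one inserted field, so it ranks first and would be the fragment suggested to the user. A production system would likely restrict the comparison to candidates retrieved from an index rather than scoring every stored fragment.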
