A novel corpus-based computing method for handling critical word-ranking issues: An example of COVID-19 research articles

Abstract

A corpus is a large body of structured textual data that is stored and processed electronically. Corpora are usually combined with statistics, machine learning algorithms, or artificial intelligence (AI) technologies to explore the semantic relationships between lexical units, and they are beneficial for language learning, information processing, translation, and related tasks. When facing a novel disease such as COVID-19, establishing a medicine-specific corpus will enhance the efficiency with which frontline medical personnel acquire information, guiding them toward the right approaches for responding to and preventing the disease. To retrieve critical messages from such a corpus effectively, word-ranking issues must be handled appropriately. However, traditional frequency-based approaches may introduce bias into word ranking because they neither optimize the corpus nor jointly consider words' frequency dispersion and concentration criteria. This paper therefore develops a novel corpus-based approach that combines corpus software with the Hirsch index (H-index) algorithm to address these issues simultaneously, making word ranking more accurate. As an empirical example of the target corpus, this paper compiled 100 COVID-19-related research articles. To verify the proposed approach, this study compared its results with those of two traditional frequency-based approaches. The results indicate that the proposed approach can refine the corpus and simultaneously compute words' frequency dispersion and concentration criteria when handling word-ranking issues.
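The abstract does not spell out how the H-index is applied to words, so the following is only a minimal sketch under one common assumption: a word's h-value is the largest h such that the word occurs at least h times in each of at least h articles. Under that reading, a word concentrated in a single article cannot rank highly, while a word dispersed across many articles can. The function names `h_index` and `rank_words`, and the input format (one pre-tokenized list per article), are hypothetical; the corpus-refinement step (e.g. cleaning or stop-word removal in corpus software) is not shown, and the paper's actual formulation may differ.

```python
from collections import Counter


def h_index(counts):
    """Largest h such that at least h values in `counts` are >= h."""
    h = 0
    for rank, c in enumerate(sorted(counts, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def rank_words(docs):
    """Rank words by an H-index over their per-article frequencies.

    Assumption (not from the paper): `docs` is a list of token lists,
    one list per article. Returns (word, h, total_frequency) tuples
    sorted by h descending, then by total frequency, then alphabetically.
    """
    per_doc = [Counter(tokens) for tokens in docs]
    vocab = set().union(*per_doc)
    rows = []
    for word in vocab:
        # Frequencies of this word in the articles where it appears.
        counts = [c[word] for c in per_doc if word in c]
        rows.append((word, h_index(counts), sum(counts)))
    rows.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return rows


# Toy example: "spike" is frequent but concentrated in one article,
# "virus" is less frequent overall but dispersed across three.
docs = [["virus"] * 3 + ["spike"] * 9, ["virus"] * 2, ["virus"] * 2]
print(rank_words(docs))  # [('virus', 2, 7), ('spike', 1, 9)]
```

A raw-frequency ranking would place "spike" (9 occurrences) above "virus" (7), whereas the H-index ranking reverses the order because "virus" is spread across more articles. This illustrates, in a toy setting, why a single score combining dispersion and concentration can differ from a purely frequency-based list.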