A short text entity disambiguation method based on BERT model and shortest path algorithm

Abstract

Entity disambiguation is an important task in natural language processing. An entity name mentioned in text may correspond to multiple real-world entities, and the goal is to map each mention to a unique, unambiguous entity identifier. Most current research performs entity disambiguation at the concept level, relying mainly on contextual information. These methods still have shortcomings: the word vector representations of traditional models cannot handle polysemy, and the disambiguation of a term does not take the global context into account. To address these problems, this paper proposes a short text entity disambiguation method based on the BERT model and a shortest path algorithm. The method has four stages: a BiLSTM-CRF model recognizes ambiguous entities; cosine similarity is used to find the optimal segmentation of the short text; a pre-trained BERT model constructs the word vectors; and the shortest path algorithm (SPA) selects the correct meaning of each ambiguous entity in the current context. Experimental results show that, compared with three other entity disambiguation methods, the proposed method improves accuracy, recall, and F1 score by an average of 23.37%, 26.47%, and 24.98%, respectively. The proposed method can therefore further improve entity disambiguation performance.
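
The abstract does not specify how the graph for the shortest path stage is built, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each text segment contributes one layer of candidate meanings, that the cost of an edge between candidates in adjacent layers is one minus their cosine similarity, and that the cheapest source-to-sink path (found here with Dijkstra's algorithm) selects one meaning per segment. The function name `disambiguate`, the layer layout, and the toy 2-D vectors standing in for BERT embeddings are all hypothetical.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def disambiguate(layers):
    """Pick one candidate per layer along the cheapest path.

    layers: list of dicts {label: vector}, one dict per text segment
    (assumed layout). Returns the chosen labels in order.
    """
    source, sink = ("S",), ("T",)

    def neighbors(node):
        nxt = 0 if node == source else node[0] + 1
        if nxt == len(layers):
            yield sink, 0.0          # last layer connects to the sink for free
            return
        for label, vec in layers[nxt].items():
            if node == source:
                yield (nxt, label), 0.0
            else:
                # Assumed edge cost: semantic distance between adjacent candidates.
                cost = 1.0 - cosine(layers[node[0]][node[1]], vec)
                yield (nxt, label), max(cost, 0.0)

    dist, prev = {source: 0.0}, {}
    heap, counter = [(0.0, 0, source)], 1   # counter breaks ties between nodes
    while heap:
        d, _, node = heapq.heappop(heap)
        if d > dist.get(node, math.inf):
            continue                 # stale heap entry
        if node == sink:
            break
        for nxt, w in neighbors(node):
            nd = d + w
            if nd < dist.get(nxt, math.inf):
                dist[nxt], prev[nxt] = nd, node
                heapq.heappush(heap, (nd, counter, nxt))
                counter += 1

    # Walk back from the sink to recover the selected labels.
    path, node = [], sink
    while node in prev:
        node = prev[node]
        if node != source:
            path.append(node[1])
    return path[::-1]


# Toy vectors standing in for BERT embeddings (illustrative values only).
layers = [
    {"released": [1.0, 0.1]},
    {"Apple (company)": [0.9, 0.2], "Apple (fruit)": [0.1, 0.9]},
    {"phone": [1.0, 0.0]},
]
chosen = disambiguate(layers)
print(chosen[1])
```

In this toy input, the ambiguous mention sits between two context segments whose vectors are closer to the company sense, so the path through that candidate is cheaper. In a real pipeline the vectors would come from a pre-trained BERT model rather than being written by hand.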
