Abstract
Entity disambiguation is an important task in natural language processing. It addresses cases where an entity name mentioned in text may correspond to multiple real-world entities, and aims to map each such mention to a unique, unambiguous entity identifier. At present, most research focuses on concept-level entity disambiguation, which relies mainly on contextual information. However, these methods still have shortcomings: the word vector representations of traditional models cannot resolve polysemy, and the disambiguation step cannot take the global context into account. To address these problems, this paper proposes a short-text entity disambiguation method based on the BERT model and a shortest path algorithm. The method uses a BiLSTM-CRF model in the ambiguous entity recognition stage, cosine similarity to find the optimal segmentation of short texts in the text segmentation stage, the BERT pre-trained model to construct word vectors in the semantic vector construction stage, and the shortest path algorithm (SPA) to select the correct meaning of each ambiguous entity in the current context in the entity disambiguation stage. Experimental results show that, compared with three other entity disambiguation methods, the proposed method improves accuracy rate, recall rate, and F1 score by an average of 23.37%, 26.47%, and 24.98%, respectively. The proposed method can therefore further improve the performance of entity disambiguation.
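As a rough illustration of the entity disambiguation stage, the sketch below selects one sense per ambiguous entity by finding the cheapest path through a layered graph of candidate senses. The graph construction is an assumption made for illustration and is not taken from the paper: each text segment forms a layer, edges connect adjacent layers only, and an edge costs 1 minus the cosine similarity of its endpoints. The function name `shortest_sense_path`, the sense labels, and the two-dimensional toy vectors (standing in for BERT embeddings) are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def shortest_sense_path(layers):
    """Return (cost, path) of the cheapest path through a layered sense graph.

    `layers` is a list of dicts, one per text segment, mapping a candidate
    sense label to its vector. Assumed edge cost: 1 - cosine similarity.
    Because edges only join adjacent layers, the graph is a DAG and the
    shortest path can be found layer by layer with dynamic programming.
    """
    # best[label] = (cost of the cheapest path ending at label, that path)
    best = {label: (0.0, [label]) for label in layers[0]}
    for prev_layer, layer in zip(layers, layers[1:]):
        step = {}
        for label, vec in layer.items():
            cost, path = min(
                (
                    (best[p][0] + 1.0 - cosine_similarity(prev_layer[p], vec), best[p][1])
                    for p in prev_layer
                ),
                key=lambda item: item[0],
            )
            step[label] = (cost, path + [label])
        best = step
    return min(best.values(), key=lambda item: item[0])


# Toy input: "apple" is ambiguous, the two context segments are not.
layers = [
    {"apple/fruit": [1.0, 0.0], "apple/company": [0.0, 1.0]},
    {"released": [0.1, 0.9]},
    {"phone": [0.0, 1.0]},
]
cost, path = shortest_sense_path(layers)
print(path)
```

With these toy vectors the cheapest path passes through `apple/company`, since that sense lies closest to the surrounding context. In a real pipeline the vectors would come from a BERT encoder, and the graph and edge weights would follow the paper's own definitions.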