Abstract
Named entity recognition (NER) is the basic task of constructing a high-quality knowledge graph, which can provide reliable knowledge in the auxiliary diagnosis of dairy cow disease, thus alleviating problems of missed diagnosis and misdiagnosis due to the lack of professional veterinarians in China. Targeting the characteristics of the Chinese dairy cow diseases corpus, we propose an ensemble Chinese NER model incorporating character-level, pinyin-level, glyph-level, and lexical-level features of Chinese characters. These multi-level features were concatenated and fed into the bidirectional long short-term memory (Bi-LSTM) network based on the multi-head self-attention mechanism to learn long-distance dependencies while focusing on important features. Finally, the globally optimal label sequence was obtained by the conditional random field (CRF) model. Experimental results showed that our proposed model outperformed baselines and related works with an F1 score of 92.18%, which is suitable and effective for named entity recognition for the dairy cow disease corpus.