Abstract
BACKGROUND: In internet-based healthcare, the complexity of pathology features across disciplines, coupled with most patients' lack of medical training, causes medical named entities in doctor-patient dialogue texts to exhibit long, multiword syntactic patterns, posing new challenges to named-entity recognition algorithms. METHODS: To address this issue, we integrate Convolutional Neural Networks (CNNs) with different dilation rates on top of the Flat-Lattice architecture to construct the Flat-Lattice-CNN model. This model considers the semantic information of characters and words, together with their absolute and relative positional information, and also extracts multiple-token co-occurrence features among characters and words spanning different distances, improving the recognition accuracy of long medical named entities. RESULTS: Experimental results show improved performance in medical named-entity recognition on all evaluation datasets, especially on CTDD, with a 2.3% increase in F1 score. CONCLUSIONS: The proposed Flat-Lattice-CNN model effectively addresses the challenges posed by long, multiword syntactic patterns in medical named entities, offering improved recognition accuracy and demonstrating the potential to enhance medical named-entity recognition in internet-based healthcare dialogues.
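
To make the role of multiple dilation rates concrete, the following minimal sketch applies the same 1D kernel to a token sequence at several dilation rates and stacks the results per position. It is an illustration only, not the authors' implementation: the function names, the all-ones kernel, the scalar token values, and the dilation rates are assumptions, and an actual model would apply learned kernels to the Flat-Lattice output embeddings.

```python
def dilated_conv1d(seq, kernel, dilation):
    """Zero-padded ("same" length) 1D dilated convolution over a list of numbers."""
    center = len(kernel) // 2
    out = []
    for i in range(len(seq)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` positions apart.
            pos = i + (j - center) * dilation
            if 0 <= pos < len(seq):  # positions outside the sequence act as zeros
                acc += w * seq[pos]
        out.append(acc)
    return out


def receptive_span(kernel_size, dilation):
    """Number of consecutive positions covered by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def multi_dilation_features(seq, kernel, dilations):
    """Run one branch per dilation rate and stack the outputs per position."""
    branches = [dilated_conv1d(seq, kernel, d) for d in dilations]
    return [list(column) for column in zip(*branches)]


if __name__ == "__main__":
    tokens = [1, 2, 3, 4, 5]   # hypothetical scalar token features
    kernel = [1, 1, 1]         # hypothetical fixed weights, for illustration
    for d in (1, 2, 3):
        print(d, receptive_span(len(kernel), d), dilated_conv1d(tokens, kernel, d))
    print(multi_dilation_features(tokens, kernel, [1, 2]))
```

With a kernel of size 3, dilation rates 1, 2, and 3 cover spans of 3, 5, and 7 positions respectively, which is how parallel branches can relate tokens at different distances without enlarging the kernel.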