Author identification of literary works based on text analysis and deep learning



Abstract

Problems in speech, image, and other kinds of analysis have gradually been solved better as the field has developed, but the analysis of Chinese text remains a difficult problem. Chinese text analysis requires not only statistics but also semantic comprehension, and different text types require different language-style feature modeling to obtain good recognition results. In this study, we use deep learning to construct an automatic text feature extraction model and classify texts using the author as the class label. We present a deep-learning-based author recognition model for literary works that is divided into three phases: text preprocessing, feature extraction, and classification. Each phase consists of several small modules or steps. First, the corpus is fed to Word2Vec to generate word vectors. Next, an improved text feature extractor based on CNN and Attention extracts text features, which serve as the input to the CNN convolution layer. After convolution, the outputs are combined position by position to form the Window Feature Sequence, which serves as the text feature vector. The Window Feature Sequence is then used as the input to an LSTM, yielding two one-dimensional vectors that are joined by a concatenation layer. Finally, the result is classified through a fully connected layer, a Batch Normalization layer, and Softmax. We evaluated the model's performance in recognizing the authors of Chinese literature on two datasets. The collected data include works in different forms, such as prose and fiction. The results show that the proposed model can effectively identify authors, and its classification accuracy is significantly better than that of the benchmark model.
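
The convolution step that produces the Window Feature Sequence can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: random vectors stand in for Word2Vec embeddings, the sizes `EMBED_DIM`, `WINDOW`, `NUM_FILTERS`, and `NUM_WORDS` are hypothetical, the function names are invented for this example, and the Attention, LSTM, and Batch Normalization stages are omitted.

```python
import math
import random


def conv_window_features(word_vectors, filters, biases):
    """Slide every filter over the text; return one feature vector per window."""
    window = len(filters[0])  # number of consecutive words each filter spans
    sequence = []
    for start in range(len(word_vectors) - window + 1):
        patch = word_vectors[start:start + window]
        features = []
        for filt, bias in zip(filters, biases):
            # Dot product between the filter and the current window of words.
            total = bias + sum(
                w * x
                for filt_row, vec in zip(filt, patch)
                for w, x in zip(filt_row, vec)
            )
            features.append(max(0.0, total))  # ReLU activation (assumed)
        sequence.append(features)
    return sequence


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sizes, chosen only to keep the example small.
EMBED_DIM, WINDOW, NUM_FILTERS, NUM_WORDS = 4, 3, 2, 6

rng = random.Random(0)
# Stand-in for Word2Vec output: one vector per word.
sentence = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
            for _ in range(NUM_WORDS)]
filters = [[[rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
            for _ in range(WINDOW)]
           for _ in range(NUM_FILTERS)]
biases = [0.0] * NUM_FILTERS

window_sequence = conv_window_features(sentence, filters, biases)
print(len(window_sequence), len(window_sequence[0]))  # prints "4 2"

# Softmax over made-up scores for three candidate authors.
print(softmax([2.0, 1.0, 0.1]))
```

Each element of `window_sequence` collects the responses of all filters at one window position, so the sequence keeps the word order of the text. That ordering is what makes it usable as step-by-step input to a recurrent layer such as an LSTM.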
