Sentiment analysis of classical Chinese literature: An unsupervised deep learning model with BERT and graph attention networks



Journal:	PLoS One	Impact factor:	2.600
Year:	2025	Citation:	2025;20(9):e0330919
doi:	10.1371/journal.pone.0330919

Abstract

Sentiment analysis has become a transformative technology in Natural Language Processing (NLP), social media analytics, and literary analysis because it can extract information from a wide range of texts. Advances in deep learning, particularly transformer models such as BERT and graph-based models such as graph attention networks (GATs), have accelerated progress in analyzing complex language structures. The difficulty lies in applying these technologies to classical Chinese literature, whose subtle syntax, semantics, and emotions are hard to capture with traditional methods. Existing methods either rely on strictly labeled data or use unsupervised learning that does not effectively manage contextual dependencies, so they are of limited use for historical or philosophical texts that abound in metaphor and implicit sentiment. To address these limitations, this paper proposes an unsupervised deep learning framework that integrates BERT embeddings, sentiment lexicon enrichment, and GATs for sentiment analysis of classical Chinese literature. First, a BERT-based model extracts contextualised embeddings from raw text, providing a deep representation of its semantics. Second, the embeddings are enriched with sentiment-specific data from the NTUSD lexicon, injecting emotional information. Third, a graph-based formulation is developed in which words are represented as nodes and the relations between them are modelled with GATs, which update each node's features according to its significance in context. Finally, K-Means clustering is used for unsupervised sentiment labelling. The experimental results demonstrate the proposed model's effectiveness: an accuracy of 0.95, precision of 0.97, recall of 0.96, and F1-score of 0.91 across several runs. These results surpass those of traditional approaches, including SentiCNN, MLT-ML4, and BERT-LLSTM-DL, which achieve accuracy scores of 0.90 to 0.95. Additionally, a comparison with large-scale foundation models (such as ChatGPT-4o and DeepSeek R1) in zero-shot prompt-based classification further validates the domain-adapted advantage of the model in classical Chinese text processing. These results demonstrate that the proposed model significantly improves the handling of the intricate linguistic features and cultural nuances in classical Chinese texts, providing a robust solution for sentiment analysis in low-resource domains.
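The four stages named in the abstract (embeddings, lexicon enrichment, graph attention, K-Means) can be illustrated with a small standard-library sketch. It is not the authors' implementation. Everything specific here is an assumption made for illustration: `embed` returns fixed pseudo-random vectors in place of real BERT embeddings, `TOY_LEXICON` is a hypothetical four-entry stand-in for NTUSD, edges link tokens within a sliding window (the abstract does not say how edges are built), and the attention layer is a single untrained head with an identity projection and a constant attention vector.

```python
import math
import random

# Hypothetical toy polarity lexicon standing in for NTUSD (+1 positive, -1 negative).
TOY_LEXICON = {"喜": 1.0, "樂": 1.0, "悲": -1.0, "淚": -1.0}


def embed(tokens, dim=4, seed=0):
    """Stand-in for BERT: one fixed pseudo-random vector per distinct token."""
    rng = random.Random(seed)
    table = {}
    for tok in tokens:
        if tok not in table:
            table[tok] = [rng.uniform(-1, 1) for _ in range(dim)]
    return [list(table[tok]) for tok in tokens]


def enrich(tokens, vectors, lexicon):
    """Append the lexicon polarity (0.0 if unlisted) as one extra feature."""
    return [vec + [lexicon.get(tok, 0.0)] for tok, vec in zip(tokens, vectors)]


def window_neighbors(n, window=1):
    """Assumed graph: each token links to itself and to tokens within `window`."""
    return [[j for j in range(n) if abs(i - j) <= window] for i in range(n)]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(features, neighbors, attn):
    """Single-head GAT-style update with an identity projection (untrained)."""
    out, weights = [], []
    for i, nbrs in enumerate(neighbors):
        # Score e_ij = LeakyReLU(attn . [h_i || h_j]) for each neighbour j.
        scores = [
            leaky_relu(sum(a * x for a, x in zip(attn, features[i] + features[j])))
            for j in nbrs
        ]
        # Softmax over the neighbourhood (shifted by the max for stability).
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        # New node feature: attention-weighted sum of neighbour features.
        dim = len(features[i])
        out.append(
            [sum(w * features[j][d] for w, j in zip(alpha, nbrs)) for d in range(dim)]
        )
        weights.append(alpha)
    return out, weights


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k=2, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


tokens = ["感", "時", "花", "濺", "淚"]
features = enrich(tokens, embed(tokens), TOY_LEXICON)
attn = [0.1] * (2 * len(features[0]))  # constant, untrained attention vector
updated, attention = gat_layer(features, window_neighbors(len(tokens)), attn)
labels = kmeans(updated, k=2)
print(list(zip(tokens, labels)))
```

The cluster indices are arbitrary: K-Means does not say which cluster is "positive", so a real pipeline would still need a rule (for example, the mean lexicon score per cluster) to name them. With random stand-in embeddings the grouping itself carries no linguistic meaning; the sketch only shows how data flows through the four stages.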

