Enhancing Persian text summarization through a three-phase fine-tuning and reinforcement learning approach with the mT5 transformer model

Abstract

Extracting key information from large volumes of text is a central challenge of the big-data era. News agencies alone publish a constant stream of articles that would take an enormous amount of time to read in full. Automatic text summarization, an important and difficult task in natural language processing, addresses this problem by condensing a text into a shorter form that preserves its meaning. In recent years, transformers have become the dominant architecture in natural language processing, and in text summarization in particular. In this work, we fine-tune the mT5-base model on Persian news articles in three phases, and then integrate reinforcement learning with the fine-tuned model using the Proximal Policy Optimization (PPO) algorithm. Finally, we evaluate the resulting model on Persian text summarization. Our model sets a new benchmark for Persian text summarization, achieving ROUGE scores of 53.17 for ROUGE-1, 37.12 for ROUGE-2, and 44.13 for ROUGE-L. These results indicate a significant improvement in the quality of Persian summaries and point toward more refined, context-aware summarization.
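To make the reported metrics concrete, the following is a minimal sketch of how ROUGE-1, ROUGE-2, and ROUGE-L F1 scores can be computed for one candidate summary against one reference. It is illustrative only: the abstract does not state which ROUGE implementation, tokenizer, or text normalization was used, so the whitespace tokenization and the function names (`rouge_n`, `rouge_l`, `lcs_length`) are assumptions introduced here. The functions return values in the range 0 to 1; the scores quoted above are on a 0 to 100 scale.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1 using whitespace tokens (an assumption, not the paper's setup)."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Clipped overlap: each n-gram counts at most as often as it occurs in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Sentence-level ROUGE-L F1 based on the longest common subsequence."""
    cand_tokens, ref_tokens = candidate.split(), reference.split()
    lcs = lcs_length(cand_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    candidate = "the cat sat on the mat"
    reference = "the cat lay on the mat"
    print(rouge_n(candidate, reference, 1))
    print(rouge_n(candidate, reference, 2))
    print(rouge_l(candidate, reference))
```

Persian text would normally need script-specific normalization and tokenization before scoring, and published results are typically averaged over a full test set, so scores from this sketch are not directly comparable to the figures reported above.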
