Automatic summarization model based on clustering algorithm

基于聚类算法的自动摘要模型

阅读:1

Abstract

Extractive document summary is usually seen as a sequence labeling task, which the summary is formulated by sentences from the original document. However, the selected sentences usually are high redundancy in semantic space, so that the composed summary are high semantic redundancy. To alleviate this problem, we propose a model to reduce the semantic redundancy of summary by introducing the cluster algorithm to select difference sentences in semantic space and we improve the base BERT to score sentences. We evaluate our model and perform significance testing using ROUGE on the CNN/DailyMail datasets compare with six baselines, which include two traditional methods and four state-of-art deep learning model. The results validate the effectiveness of our approach, which leverages K-means algorithm to produce more accurate and less repeat sentences in semantic summaries.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。