Identifying communicative functions in discourse with content types

识别语篇中与内容类型相关的交际功能

阅读:1

Abstract

Texts are not monolithic entities but rather coherent collections of micro illocutionary acts which help to convey a unitary message of content and purpose. Identifying such text segments is challenging because they require a fine-grained level of analysis even within a single sentence. At the same time, accessing them facilitates the analysis of the communicative functions of a text as well as the identification of relevant information. We propose an empirical framework for modelling micro illocutionary acts at clause level, that we call content types, grounded on linguistic theories of text types, in particular on the framework proposed by Werlich in 1976. We make available a newly annotated corpus of 279 documents (for a total of more than 180,000 tokens) belonging to different genres and temporal periods, based on a dedicated annotation scheme. We obtain an average Cohen's kappa of 0.89 at token level. We achieve an average F1 score of 74.99% on the automatic classification of content types using a bi-LSTM model. Similar results are obtained on contemporary and historical documents, while performances on genres are more varied. This work promotes a discourse-oriented approach to information extraction and cross-fertilisation across disciplines through a computationally-aided linguistic analysis.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。