A multilingual BERT-based classification of reviews for enhanced visitors' experience analysis

Abstract

Cultural organizations can now draw on online platforms to study users' opinions and the most discussed topics concerning both general and specific cultural offerings. Despite the availability of data acquisition tools, managing the resulting unstructured data remains a hurdle. To overcome this, we propose a classification model that transforms unorganized data into a structured thematic database. The case study concerns the Italian city of Brescia. We build a language model that classifies online reviews into four semantic areas defined by the city's key attractions. We fine-tuned the pre-trained Multilingual BERT, XLM-RoBERTa, and AlBERTo models on a multiclass classification task, with promising results on the performance metrics (average F1: 0.72, 0.73, and 0.7, respectively; average AUC: 0.91, 0.91, and 0.92, respectively). Additionally, clusters of reviews were detected by applying the HDBSCAN algorithm to the vector representations produced by the model. The Keyness statistic, a transformation of the chi-square statistic, was used to extract cluster-specific keywords, which proved highly consistent with the characteristics and offerings of the key cultural attractions, further supporting the model's good performance. The results show that policymakers and managers of cultural tourism institutions can use the proposed model to derive relevant insights from textual data about visitors' experience at specific attractions of interest.
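
To make the keyword-extraction step concrete, the following is a minimal sketch of a chi-square-based keyness score. It assumes one common definition: a signed chi-square statistic on a 2x2 table comparing a word's frequency in a target cluster against all other clusters, with no continuity correction. The abstract does not specify the exact variant used, and the function name `keyness_chi2` and the toy token lists are hypothetical.

```python
from collections import Counter


def keyness_chi2(target_tokens, reference_tokens):
    """Signed chi-square keyness of each word in target vs. reference.

    Assumption: plain Pearson chi-square on a 2x2 table, signed positive
    when the word is over-represented in the target cluster.
    """
    t = Counter(target_tokens)
    r = Counter(reference_tokens)
    n_t = sum(t.values())
    n_r = sum(r.values())
    n = n_t + n_r
    scores = {}
    for word in set(t) | set(r):
        a = t[word]   # occurrences of the word in the target cluster
        b = r[word]   # occurrences of the word in the reference clusters
        c = n_t - a   # all other tokens in the target cluster
        d = n_r - b   # all other tokens in the reference clusters
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            scores[word] = 0.0
            continue
        chi2 = n * (a * d - b * c) ** 2 / denom
        expected_a = (a + b) * n_t / n
        scores[word] = chi2 if a >= expected_a else -chi2
    return scores


# Toy example with made-up tokens (not data from the study).
target = ["castle", "castle", "museum", "view"]
reference = ["museum", "museum", "theatre", "view"]
scores = keyness_chi2(target, reference)
top_keyword = max(scores, key=scores.get)
```

Words with large positive scores are the ones that characterize the target cluster; in the toy example, `top_keyword` is `"castle"` because it appears only in the target tokens.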
