The plausibility machine commonsense (PMC) dataset: A massively crowdsourced human-annotated dataset for studying plausibility in large language models

合理性机器常识(PMC)数据集:一个大规模众包的人工标注数据集,用于研究大型语言模型中的合理性

阅读:1

Abstract

Commonsense reasoning has emerged as a challenging problem in Artificial Intelligence (AI). However, one area of commonsense reasoning that has not received nearly as much attention in the AI research community is plausibility assessment, which focuses on determining the likelihood of commonsense statements. Human-annotated benchmarks are essential for advancing research in this nascent area, as they enable researchers to develop and evaluate AI models effectively. Because plausibility is a subjective concept, it is important to obtain nuanced annotations, rather than a binary label of 'plausible' or 'implausible'. Furthermore, it is also important to obtain multiple human annotations for a given statement, to ensure validity of the labels. In this data article, we describe the process of re-annotating an existing commonsense plausibility dataset (SemEval-2020 Task 4) using large-scale crowdsourcing on the Amazon Mechanical Turk platform. We obtain 10,000 unique annotations on a corpus of 2000 sentences (five independent annotations per sentence). Based on these labels, each was labelled as plausible, implausible, or ambiguous. Next, we prompted the GPT-3.5 and GPT-4 models developed by OpenAI. Sentences from the human-annotated files were fed into the models using custom prompt templates, and the models' generated labels were used to determine if they were aligned with those output by humans. The PMC-Dataset is meant to serve as a rich resource for analysing and comparing human and machine commonsense reasoning capabilities, specifically on plausibility. Researchers can utilise this dataset to train, fine-tune, and evaluate AI models on plausibility. Applications include: determining the likelihood of everyday events, assessing the realism of hypothetical scenarios, and distinguishing between plausible and implausible statements in commonsense text. Ultimately, we intend for the dataset to support ongoing AI research by offering a robust foundation for developing models that are better aligned with human commonsense reasoning.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。