Comprehensive dataset of user-submitted articles with ideological and extreme bias from Reddit

Abstract

This study collects data for understanding ideological and extreme bias in text articles shared across online communities, focusing on the language used in subreddits associated with extremism and targeted violence. We first gathered data from the r/Liberal and r/Conservative communities on Reddit, using the Reddit Pushshift API to collect URLs shared within these subreddits. Targeting news, opinion, and feature articles, this yielded a corpus of 226,010 articles. We also curated a balanced subset of 45,108 articles and annotated 4,000 articles to validate their relevance, supporting the study of language usage within ideological Reddit communities and of ideological bias in media content. Expanding beyond binary ideologies, we introduced a third category, termed "Restricted", for articles shared in restricted, privatized, quarantined, or banned subreddits characterized by radicalized and extremist ideologies. This expansion yielded a large dataset of 377,144 articles. Additionally, we included articles from subreddits with unspecified ideologies, creating a holdout set of 922,522 articles. The combined dataset of 1.3 million articles, collected from 55 different subreddits, will assist in examining radicalized communities and in analyzing discourse in the associated subreddits, offering insights into extreme bias in media content. In summary, we collected 1.52 million articles, providing a comprehensive dataset for understanding language usage in text articles posted in ideological and extreme Reddit communities.
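
The article counts in the abstract can be cross-checked with simple arithmetic. The sketch below uses only the figures stated above; the dictionary keys are illustrative labels, not names from the dataset, and reading the "1.3 million" figure as the Restricted set plus the holdout set is an assumption that the abstract does not state explicitly.

```python
# Article counts exactly as stated in the abstract.
# The keys are hypothetical labels chosen for this sketch.
counts = {
    "liberal_conservative": 226_010,  # r/Liberal and r/Conservative corpus
    "restricted": 377_144,            # "Restricted" category
    "holdout": 922_522,               # subreddits with unspecified ideologies
}

# Assumption: the "1.3 million" figure refers to Restricted + holdout.
expansion_total = counts["restricted"] + counts["holdout"]

# All three components together, compared against the "1.52 million" figure.
grand_total = sum(counts.values())

print(f"Restricted + holdout: {expansion_total:,}")
print(f"All components:       {grand_total:,}")
```

Under that reading, the two sums come to 1,299,666 and 1,525,676, which agree with the rounded totals of 1.3 million and 1.52 million quoted in the abstract.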
