
Exaggeration, cohesion, and fragmentation in on-line forums

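Basic information

Grant number:
EP/T023333/1
Principal investigator:
Janet Pierrehumbert
Amount:
$770,600
Host institution:
University of Oxford
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2020
Funding country:
United Kingdom
Period:
2020 to (no data)
Project status:
Not concluded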

Source:
https://gtr.ukri.org/projects?ref=EP%2FT023333%2F1
Keywords:
Exaggeration cohesion fragmentation line forums

Project abstract

On-line forums can support the formation of social communities with shared interests and needs. They can also have a negative side if groups of users support each other in divisive attitudes or false beliefs. The social fragmentation resulting from these so-called echo-chamber effects has been identified as an engine behind the rise of violence and extremism, political gridlock, and decreases in social mobility. This project is motivated by the observation that echo-chamber effects involve a gradual shift from more moderate language to more extreme language. Further, damage repair is difficult when extreme social fragmentation has already occurred. The ability to use patterns in on-line language for early detection of on-line social fragmentation would thus be a major breakthrough in supporting earlier, and more effective, intervention against harmful trends in on-line forums.

We have identified two major challenges in creating this capability. First, current NLP methods are poor at understanding expressions whose meaning is a degree on a scale, such as a scale defined on the dimensions of cost, quality, honesty, or performance. For example, "rather racist", "really racist", and "incredibly racist" express different degrees of disapproval, but such differences are not adequately captured by current algorithms. This limitation is central to our problem, because echo-chamber effects often involve incremental exaggerations of factual claims, emotions, or attitudes.

The second challenge results from the fact that methods for using linguistic content in the analysis of social behaviour are limited. While much research has uncovered systematic associations between word choices and social groups, very little has addressed relationships between linguistic inferences and social trends. However, tracking the gradual shifts towards semantic extremes in echo-chamber effects requires making certain linguistic inferences. This is because inferring which underlying dimension of meaning is relevant in any specific case critically depends on information about who is talking and what they are talking about. For example, "Liverpool is far better" might relate to a scale of cultural excellence in a discussion amongst music fans, but to a scale of costs amongst people who are discussing housing. A fundamental advance in the methodology for combining linguistic and social information is thus needed to characterise echo-chamber effects on-line and make predictions about risks of future fragmentation.

The project is a new collaboration between an experimental and computational linguist (the PI) and an expert in machine learning and social network analysis (the Co-I). Its components integrate the expertise of both collaborators. Advanced text-mining and data analytics will be used to generate the materials for a large-scale and experimentally normed data set of scalar expressions, using archives of the popular on-line forum Reddit. No normed data set of this type exists, and it will provide the training and test materials needed to develop and evaluate new algorithms. Using a modular work plan, the project team will first develop and validate separate algorithms to assess and predict the meanings of scalar expressions, and the level of fragmentation in the social network of Reddit users. These components will then be integrated using advanced graph-based machine learning methods. The primary outcome of the project will be a software package that will facilitate the work of on-line moderators by flagging subReddits or threads that display early stages of echo-chamber effects. The normed data set will also be extremely valuable for improving NLP applications that require nontrivial semantic inference, such as sentiment analysis, chatbots, and question-answering systems.

More generally, the project is a demonstration project for advanced methodology in processing linguistic meaning in relation to social relationships and human behaviour.

Project outputs

Journal articles (10)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

DagoBERT: Generating Derivational Morphology with a Pretrained Language Model

DOI:
10.18653/v1/2020.emnlp-main.316
Published:
2020-05
Journal:
Impact factor:
0
Authors:
Valentin Hofmann;J. Pierrehumbert;Hinrich Schütze
Corresponding authors:
Valentin Hofmann;J. Pierrehumbert;Hinrich Schütze

Superbizarre Is Not Superb: Derivational Morphology Improves BERT’s Interpretation of Complex Words

DOI:
10.18653/v1/2021.acl-long.279
Published:
2021-01
Journal:
ArXiv
Impact factor:
0
Authors:
Valentin Hofmann;J. Pierrehumbert;Hinrich Schütze
Corresponding authors:
Valentin Hofmann;J. Pierrehumbert;Hinrich Schütze

DRew: Dynamically Rewired Message Passing with Delay

DOI:
10.48550/arxiv.2305.08018
Published:
2023-05
Journal:
ArXiv
Impact factor:
0
Authors:
Benjamin Gutteridge;Xiaowen Dong;Michael M. Bronstein;Francesco Di Giovanni
Corresponding authors:
Benjamin Gutteridge;Xiaowen Dong;Michael M. Bronstein;Francesco Di Giovanni

Predicting COVID-19 cases using Reddit posts and other online resources
