Natural language processing for detecting toxic, abusive, and hateful language online
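Basic information
- Grant number: RGPIN-2022-04481
- Principal investigator:
- Amount: $46,600
- Host institution:
- Host institution country: Canada
- Program type: Discovery Grants Program - Individual
- Fiscal year: 2022
- Funding country: Canada
- Period: 2022-01-01 to 2023-12-31
- Status: Completed
- Source:
- Keywords:
Project abstract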
Digital technologies offer incredible power, from artificial intelligence and virtual assistants to social media and recommendation systems. Deploying such technologies in a manner beneficial to both individuals and society is a pressing challenge. In mainstream and social media, content providers welcome feedback; such feedback, however, may be 'toxic': malicious, abusive, or offensive. Toxic comments and posts online are those intended to cause harm. They may take the form of personal attacks, abuse, harassment, or threats, and may include profane, obscene, or derogatory language, with hate speech being the most extreme. In the last few years, I have closely studied online news comments and developed natural language processing (NLP) methods to analyze them. My long-term program of research develops robust methods for text classification in tasks such as sentiment analysis, misinformation detection, and content moderation. In the next few years, my SFU laboratory, the Discourse Processing Lab, will continue to study toxic language online and to develop methods and algorithms to detect toxicity automatically. Our work identifying constructive comments, those that contribute positively to an online discussion, has provided excellent insight into how to automatically classify non-constructive and toxic comments. Current approaches to detecting online toxicity are based either on general text characteristics (word length, text length, capitalization, and punctuation) or on lists of words likely to cause offense. Machine learning approaches (supervised, semi-supervised, or based on neural networks) rely on large annotated datasets, but many studies have shown that such approaches often fail because negativity in language may be wrapped in positive words, through metaphors and other figures of speech.
Research, including our own, has found that accurately identifying and filtering toxic content requires a multidisciplinary perspective, drawing on a deep understanding of linguistics and on current methods in NLP and machine learning. To address existing gaps in the automatic detection of toxic comments, in the next five years I plan to: (Objective 1) study how metaphors and other figures of speech well known since antiquity (euphemism, litotes, hyperbole, sarcasm) convey toxic language. I will then develop (Objective 2) a system to detect figures of speech automatically, which I will integrate into (Objective 3) a new content moderation platform. The results of this work will mobilize research among scholars interested in evaluative language and the role of media in public discourse, including linguists, computational linguists, and communication and media researchers. At a time when media organizations, social media platforms, and the public are concerned about online abuse, misinformation, and the role of digital technology in politics and society, this project is timely and will make an important contribution to public discourse.
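The surface-feature and word-list approaches mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the project's method; the tiny lexicon, the function names, and the two example comments are assumptions made purely for illustration. It shows the four general text characteristics the abstract names, plus a word-list check, and how a comment that wraps negativity in positive words passes the word-list check unflagged.

```python
import re
import string

# Hypothetical mini-lexicon, for illustration only; real word lists are far larger.
OFFENSIVE_WORDS = {"idiot", "stupid", "moron"}


def surface_features(text):
    """Compute the general text characteristics named in the abstract."""
    words = re.findall(r"[A-Za-z']+", text)
    letters = [c for c in text if c.isalpha()]
    return {
        "text_length": len(text),
        "mean_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "capital_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        "punctuation_count": sum(c in string.punctuation for c in text),
    }


def lexicon_hits(text):
    """Return the lexicon words that occur in the text, case-insensitively."""
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(set(words) & OFFENSIVE_WORDS)


def is_flagged(text):
    """Flag a comment only if it contains at least one lexicon word."""
    return bool(lexicon_hits(text))


# Two assumed example comments: one explicit, one sarcastic.
explicit = "You are an IDIOT!!!"
figurative = "What a brilliant idea, truly the work of a genius."

if __name__ == "__main__":
    for comment in (explicit, figurative):
        print(comment, surface_features(comment), is_flagged(comment))
```

The explicit comment is caught because it contains a listed word, while the sarcastic one contains only positive vocabulary and is not flagged, which is the gap the figures-of-speech objectives aim to address.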
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Taboada, Maite
Concession strategies in online newspaper comments
- DOI: 10.1016/j.pragma.2020.12.018
- Published: 2021-01-21
- Journal:
- Impact factor: 1.6
- Authors: Gomez Gonzalez, Maria de los Angeles; Taboada, Maite
- Corresponding author: Taboada, Maite
Are online news comments like face-to-face conversation? A multi-dimensional analysis of an emerging register
- DOI: 10.1075/rs.19012.ehr
- Published: 2020-04-10
- Journal:
- Impact factor: 0
- Authors: Ehret, Katharina; Taboada, Maite
- Corresponding author: Taboada, Maite
Discourse relations and evaluation
- DOI: 10.3366/cor.2016.0091
- Published: 2016-08-01
- Journal:
- Impact factor: 0.5
- Authors: Trnavac, Radoslava; Das, Debopam; Taboada, Maite
- Corresponding author: Taboada, Maite
Lexicon-Based Methods for Sentiment Analysis
- DOI: 10.1162/coli_a_00049
- Published: 2011-06-01
- Journal:
- Impact factor: 9.3
- Authors: Taboada, Maite; Brooke, Julian; Stede, Manfred
- Corresponding author: Stede, Manfred
The interplay of complexity and subjectivity in opinionated discourse
- DOI: 10.1177/1461445620966923
- Published: 2020-11-25
- Journal:
- Impact factor: 1.8
- Authors: Ehret, Katharina; Taboada, Maite
- Corresponding author: Taboada, Maite
Other grants held by Taboada, Maite
A computational treatment of negation and speculation in natural language
- Grant number: RGPIN-2015-05220
- Fiscal year: 2019
- Amount: $46,600
- Program type: Discovery Grants Program - Individual
A computational treatment of negation and speculation in natural language
- Grant number: RGPIN-2015-05220
- Fiscal year: 2018
- Amount: $46,600
- Program type: Discovery Grants Program - Individual
A computational treatment of negation and speculation in natural language
- Grant number: RGPIN-2015-05220
- Fiscal year: 2017
- Amount: $46,600
- Program type: Discovery Grants Program - Individual
A computational treatment of negation and speculation in natural language
- Grant number: RGPIN-2015-05220
- Fiscal year: 2016
- Amount: $46,600
- Program type: Discovery Grants Program - Individual
A computational treatment of negation and speculation in natural language
- Grant number: RGPIN-2015-05220
- Fiscal year: 2015
- Amount: $46,600
- Program type: Discovery Grants Program - Individual
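Similar grants from the National Natural Science Foundation of China (NSFC)
Effects of the development of children's musical ability on language, social cognition, and brain development
- Grant number: 31971003
- Year approved: 2019
- Amount: CNY 580,000
- Program type: General Program
Key text analysis technologies for bidirectional English-Chinese cross-language image retrieval
- Grant number: 61170095
- Year approved: 2011
- Amount: CNY 570,000
- Program type: General Program
Correlation between auditory behavior and the course of speech development in children after cochlear implantation
- Grant number: 81170916
- Year approved: 2011
- Amount: CNY 650,000
- Program type: General Program
A diagram-based method for automatic parsing of spoken Chinese, grounded in child psychological analysis
- Grant number: 60175012
- Year approved: 2001
- Amount: CNY 180,000
- Program type: General Program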
Similar overseas grants
Navigating Chemical Space with Natural Language Processing and Deep Learning
- Grant number: EP/Y004167/1
- Fiscal year: 2024
- Amount: $46,600
- Program type: Research Grant
REU Site: Recent Advances in Natural Language Processing
- Grant number: 2349452
- Fiscal year: 2024
- Amount: $46,600
- Program type: Standard Grant
Studies of speech, image and natural language processing for multimodal spoken document retrieval
- Grant number: 23K11216
- Fiscal year: 2023
- Amount: $46,600
- Program type: Grant-in-Aid for Scientific Research (C)
Efficient and Fair Language Modelling for Natural Language Processing, investigating lightweight language modelling approaches and aiming at fairness
- Grant number: 2894795
- Fiscal year: 2023
- Amount: $46,600
- Program type: Studentship
SBIR Phase I: Sown To Grow - Measuring Growth in Trusting Relationships between Students and Educators with Natural Language Processing and Machine Learning Technologies
- Grant number: 2322340
- Fiscal year: 2023
- Amount: $46,600
- Program type: Standard Grant
Collaborative Research: EAGER: Developing and Optimizing Reflection-Informed STEM Learning and Instruction by Integrating Learning Technologies with Natural Language Processing
- Grant number: 2329273
- Fiscal year: 2023
- Amount: $46,600
- Program type: Standard Grant
Using Natural Mouse Movement to Establish a Developmental "Biomarker" for Corticospinal Damage
- Grant number: 10667807
- Fiscal year: 2023
- Amount: $46,600
- Program type:
Harmony AI: Natural Language Processing Enabling Advanced Biomanufacturing
- Grant number: 10761082
- Fiscal year: 2023
- Amount: $46,600
- Program type:
Collaborative Research: EAGER: Developing and Optimizing Reflection-Informed STEM Learning and Instruction by Integrating Learning Technologies with Natural Language Processing
- Grant number: 2329274
- Fiscal year: 2023
- Amount: $46,600
- Program type: Standard Grant
CAREER: Data-driven design of graphene oxide for environmental applications enabled by natural language processing and machine learning techniques
- Grant number: 2238415
- Fiscal year: 2023
- Amount: $46,600
- Program type: Continuing Grant