
CAREER: Learning and Using Community-Driven Natural Language Processing Models

Basic Information

Award Number:
2145357
Principal Investigator:
Anthony Rios
Amount:
approximately $551,600
Institution:
University of Texas at San Antonio
Institution Country:
United States
Award Type:
Continuing Grant
Fiscal Year:
2022
Funding Country:
United States
Start/End Date:
2022-06-01 to 2027-05-31
Status:
Ongoing

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2145357&HistoricalAwards=false
Keywords:
CAREER Learning Using Community Driven

Abstract

This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2).

There is growing interest in applying Natural Language Processing (NLP) to a wide array of tasks, including, but not limited to, health, online moderation, and education. NLP research has generally focused on large models applied uniformly to everyone, regardless of their writing style and social norms, thus assuming a one-size-fits-all solution. However, NLP models do not perform equally well for all communities because of differences in writing style (e.g., dialects) and in topics of discussion (e.g., sports vs. technology). Furthermore, social norms vary between communities, which can make the originally intended use of some NLP models irrelevant. Hence, applying the same NLP model to everyone may cause harm if communities are not directly considered. Researchers and practitioners must therefore evaluate NLP models on community data before deploying them. They must also work with communities to determine whether the technology is sound given each community's social norms and needs. This project will address two critical questions: "How can stakeholders know whether a model will harm specific communities when it is put into production?" and "What community-specific language patterns cause errors in various NLP models?" By answering these questions, the project intends to develop tools that help communities participate in the technology development process, enabling them to decide whether a specific technology is relevant to them. Finally, the project will create standards-based lessons for high school students in San Antonio by training local high school teachers in community-driven NLP, which may help local students better consider NLP applications in their lives.

Overall, this project will give researchers and NLP practitioners a better understanding of how to apply and develop NLP models for small communities, rather than relying on a one-size-fits-all framework. To address this goal, the project has three objectives. Objective 1 will identify strategies for finding correlations between community-specific language and NLP model performance. This will lead to a better understanding of the inductive biases of NLP models in community-specific applications. Objective 2 will create a tool that facilitates participatory NLP design by helping community stakeholders identify the potential beneficial and harmful outcomes of applying a specific NLP model. More importantly, this objective will help decision-makers decide when NLP should or should not be deployed in their communities. Objective 3 will identify methods to improve NLP models for specific communities, in particular methods for incorporating a community's unlabeled data into existing labeled NLP datasets to improve community-specific model performance. Finally, the project will impact the broader NLP community by releasing open-source software that implements the tools and techniques this award generates.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal Articles (9)

Monographs (0)

Research Awards (0)

Conference Papers (0)

Patents (0)

A Comprehensive Study of Gender Bias in Chemical Named Entity Recognition Models

DOI:
10.48550/arxiv.2212.12799
Published:
2022-12
Journal:
ArXiv
Impact Factor:
0
Authors:
Xingmeng Zhao; A. Niazi; Anthony Rios
Corresponding Author:
Xingmeng Zhao; A. Niazi; Anthony Rios

A marker-based neural network system for extracting social determinants of health

DOI:
10.1093/jamia/ocad041
Published:
2023
Journal:
Journal of the American Medical Informatics Association
Impact Factor:
6.4
Authors:
Xingmeng Zhao; Anthony Rios
Corresponding Author:
Anthony Rios

Measuring Geographic Performance Disparities of Offensive Language Classifiers

DOI:
Published:
2022
Journal:
COLING
Impact Factor:
0
Authors:
Brandon Lwowski; Paul Rad; Anthony Rios
Corresponding Author:
Anthony Rios

Linguistic Elements of Engaging Customer Service Discourse on Social Media

DOI:
Published:
2022
Journal:
Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)
Impact Factor:
0
Authors:
Sonam Singh; Anthony Rios
Corresponding Author:
Anthony Rios

Extracting Biomedical Entities from Noisy Audio Transcripts

DOI:
10.48550/arxiv.2403.17363
Published:
2024-03
Journal:
ArXiv
Impact Factor:
0
Authors:
Nima Ebadi; Kellen Morgan; Adrian Tan; Billy Linares; Sheri Osborn; Emma Majors; Jeremy Davis; Anthony Rios
Corresponding Author:
Nima Ebadi; Kellen Morgan; Adrian Tan; Billy Linares; Sheri Osborn; Emma Majors; Jeremy Davis; Anthony Rios

Other Publications by Anthony Rios

Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems

DOI:
Published:
2024
Journal:
Impact Factor:
0
Authors:
Djordje Klisura; Anthony Rios
Corresponding Author:
Anthony Rios

Deep Neural Networks for Multi-Label Text Classification: Application to Coding Electronic Medical Records

DOI:
10.13023/etd.2018.306
Published:
2018
Journal:
Impact Factor:
0
Authors:
Anthony Rios
Corresponding Author:
Anthony Rios

Laparoscopic Left Gastric Artery Aneurysm Resection

DOI:
10.1055/s-0036-1597717
Published:
2016
Journal:
International Journal of Angiology
Impact Factor:
0.6
Authors:
V. Kamath; Anthony Rios; N. Mishra; S. Sathyanarayana; K. Krishnasastry; E. Rubach
Corresponding Author:
E. Rubach