CAREER: Learning and Using Community-Driven Natural Language Processing Models

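Basic Information

  • Award number:
    2145357
  • Principal investigator:
  • Amount:
    $551,600
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Project period:
    2022-06-01 to 2027-05-31
  • Project status:
    Not yet concluded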

Project Abstract

This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2).
There is a growing interest in applying Natural Language Processing (NLP) to a wide array of tasks, including, but not limited to, health, online moderation, and education. NLP-related research has generally focused on large models uniformly applied to everyone independent of their writing style and social norms, thus assuming a one-size-fits-all solution. Nevertheless, NLP-based models do not perform equally for all communities because of different writing styles (e.g., dialects) and choices of topical discussion (e.g., Sports vs. Technology). Furthermore, social norms vary between communities, making the original intended use of some NLP models potentially irrelevant. Hence, applying the same NLP model to everyone may cause harm if communities are not directly considered. Therefore, researchers and practitioners must evaluate NLP models on community data before they deploy them. They must also work with communities to determine whether the technology is sound given the community's social norms and needs. This project will address two critical questions: "How can stakeholders know whether the model will harm specific communities when put into production?" and "What community-specific language patterns cause errors in various NLP models?" By answering these questions, this project intends to develop tools to help communities participate in the technology development process, which will enable them to decide whether a specific technology is relevant to the community or not. Finally, this project will also create standards-based lessons for high school students in San Antonio by training local high-school teachers in community-driven NLP, which will potentially help local students better consider NLP applications in their lives.
Overall, this project will provide researchers and NLP practitioners with a better understanding of applying and developing NLP models for small communities rather than focusing on a one-size-fits-all framework. Specifically, to address this goal, this project has three objectives. Objective 1 will identify strategies to find correlations between community-specific language and NLP model performance. This objective will result in a better understanding of the inductive biases of NLP models for community-specific applications. Objective 2 will create a tool that can facilitate participatory NLP design by helping community-specific stakeholders identify potential good and harmful outcomes that may be caused by applying a specific NLP model. More importantly, the objective will help decision-makers decide when NLP should or should not be deployed in their communities. Objective 3 will identify methods to improve NLP models for specific communities. The goal is to identify methods to incorporate a community's unlabeled data into existing labeled NLP datasets to improve community-specific model performance. Finally, the project will impact the broader NLP community via the release of open-source software that implements the tools and techniques this award generates.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (9)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
A Comprehensive Study of Gender Bias in Chemical Named Entity Recognition Models
  • DOI:
    10.48550/arxiv.2212.12799
  • Published:
    2022-12
  • Journal:
  • Impact factor:
    0
  • Authors:
    Xingmeng Zhao; A. Niazi; Anthony Rios
  • Corresponding author:
    Xingmeng Zhao; A. Niazi; Anthony Rios
A marker-based neural network system for extracting social determinants of health
Measuring Geographic Performance Disparities of Offensive Language Classifiers
  • DOI:
  • Published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Lwowski, Brandon; Rad, Paul; Rios, Anthony
  • Corresponding author:
    Rios, Anthony
Linguistic Elements of Engaging Customer Service Discourse on Social Media
Extracting Biomedical Entities from Noisy Audio Transcripts
  • DOI:
    10.48550/arxiv.2403.17363
  • Published:
    2024-03
  • Journal:
  • Impact factor:
    0
  • Authors:
    Nima Ebadi; Kellen Morgan; Adrian Tan; Billy Linares; Sheri Osborn; Emma Majors; Jeremy Davis; Anthony Rios
  • Corresponding author:
    Nima Ebadi; Kellen Morgan; Adrian Tan; Billy Linares; Sheri Osborn; Emma Majors; Jeremy Davis; Anthony Rios

Other publications by Anthony Rios

Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    DJordje Klisura; Anthony Rios
  • Corresponding author:
    Anthony Rios
Deep Neural Networks for Multi-Label Text Classification: Application to Coding Electronic Medical Records
  • DOI:
    10.13023/etd.2018.306
  • Published:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Anthony Rios
  • Corresponding author:
    Anthony Rios
Laparoscopic Left Gastric Artery Aneurysm Resection
  • DOI:
    10.1055/s-0036-1597717
  • Published:
    2016
  • Journal:
  • Impact factor:
    0.6
  • Authors:
    V. Kamath; Anthony Rios; N. Mishra; S. Sathyanarayana; K. Krishnasastry; E. Rubach
  • Corresponding author:
    E. Rubach
An exploratory mixed methods study about teacher candidates’ descriptions of children’s confusion, productive struggle, and mistakes in an elementary mathematics methods course
FuzzE: Fuzzy Fairness Evaluation of Offensive Language Classifiers on African-American English
  • DOI:
    10.1609/aaai.v34i01.5434
  • Published:
    2020-04
  • Journal:
  • Impact factor:
    0
  • Authors:
    Anthony Rios
  • Corresponding author:
    Anthony Rios

Other grants by Anthony Rios

CRII: SCH: A Computational Framework for Fair Public Health-Related Decisions
  • Award number:
    1947697
  • Fiscal year:
    2020
  • Funding amount:
    $551,600
  • Award type:
    Standard Grant

Similar grants from the National Natural Science Foundation of China (NSFC)

Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
  • Award number:
  • Approval year:
    2024
  • Funding amount:
    n/a
  • Award type:
    Collaborative Innovation Research Team
Understanding structural evolution of galaxies with machine learning
  • Award number:
    n/a
  • Approval year:
    2022
  • Funding amount:
    CNY 100,000
  • Award type:
    Provincial/Municipal Project
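Constrained dynamic multi-objective Q-learning evolutionary allocation of human-machine hybrid crowd-sensing tasks for coal mine safety
  • Award number:
  • Approval year:
    2022
  • Funding amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Short-horizon online Q-learning cooperative control mechanism for intelligent munition formations, accounting for leader-munition failure
  • Award number:
    62003314
  • Approval year:
    2020
  • Funding amount:
    CNY 240,000
  • Award type:
    Young Scientists Fund
Research on e-learning resource recommendation methods integrating contextual tensor factorization
  • Award number:
    61902016
  • Approval year:
    2019
  • Funding amount:
    CNY 240,000
  • Award type:
    Young Scientists Fund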
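Research on Spiking-Transfer learning methods with temporal transfer capability
  • Award number:
    61806040
  • Approval year:
    2018
  • Funding amount:
    CNY 200,000
  • Award type:
    Young Scientists Fund
Research on deep-learning-based dynamic recognition technology for glacier monitoring in the Sanjiangyuan (Three-River Source) region
  • Award number:
    51769027
  • Approval year:
    2017
  • Funding amount:
    CNY 380,000
  • Award type:
    Regional Science Fund
Research on Spiking-Deep Learning methods with temporal processing capability
  • Award number:
    61573081
  • Approval year:
    2015
  • Funding amount:
    CNY 640,000
  • Award type:
    General Program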
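Automatic generation and optimization of large-scale personalized e-learning process models based on directed hypergraphs
  • Award number:
    61572533
  • Approval year:
    2015
  • Funding amount:
    CNY 660,000
  • Award type:
    General Program
Research on learner emotion compensation methods in e-learning
  • Award number:
    61402392
  • Approval year:
    2014
  • Funding amount:
    CNY 260,000
  • Award type:
    Young Scientists Fund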

Similar overseas grants

CAREER: Enhancing Temperature Visualization in Boiling Fluid over Finned Surfaces using Deep Learning-Enhanced Laser-Induced Fluorescence
  • Award number:
    2337973
  • Fiscal year:
    2024
  • Funding amount:
    $551,600
  • Award type:
    Continuing Grant
CAREER: Using fossil bivalves to study controls on longevity and establish a paleobiological learning ecosystem in southeast Texas
  • Award number:
    2340642
  • Fiscal year:
    2024
  • Funding amount:
    $551,600
  • Award type:
    Continuing Grant
CAREER: End-to-End Active Region-based Heliospheric Forecasting System Using Multi-spacecraft Data and Machine Learning
  • Award number:
    2240022
  • Fiscal year:
    2023
  • Funding amount:
    $551,600
  • Award type:
    Continuing Grant
CAREER: Using virtual reality to advance research and learning and promote positive skill transfer in complex environments with applications in enhanced flight training
  • Award number:
    2237851
  • Fiscal year:
    2023
  • Funding amount:
    $551,600
  • Award type:
    Continuing Grant
CAREER: Demystifying Deep Machine Learning Models using Convex Optimization for Reliable AI
  • Award number:
    2236829
  • Fiscal year:
    2023
  • Funding amount:
    $551,600
  • Award type:
    Continuing Grant
CAREER: Using Physics-Based Machine Learning to Reconcile the Crack Tip with the Plastic Zone during Fracture of Metals
  • Award number:
    2237039
  • Fiscal year:
    2023
  • Funding amount:
    $551,600
  • Award type:
    Standard Grant
CAREER: Transforming Biosensor Reliability using Sensor Time-series Data and Physics-based Machine Learning
  • Award number:
    2144310
  • Fiscal year:
    2022
  • Funding amount:
    $551,600
  • Award type:
    Continuing Grant
CAREER: Navigating Thermodynamic Landscapes for Phase Equilibria Predictions using Molecular Modeling and Machine Learning
  • Award number:
    2143346
  • Fiscal year:
    2022
  • Funding amount:
    $551,600
  • Award type:
    Continuing Grant
CAREER: Redefining Scientific Literacy at the Community Level - Researching Science Learning using a Social Network Approach
  • Award number:
    2042142
  • Fiscal year:
    2021
  • Funding amount:
    $551,600
  • Award type:
    Continuing Grant
CAREER: Real-time Control of Cell Differentiation Using Reinforcement Learning
  • Award number:
    2042503
  • Fiscal year:
    2021
  • Funding amount:
    $551,600
  • Award type:
    Continuing Grant