CAREER: Robust, Fair, and Culturally Aware Commonsense Reasoning in Natural Language


Basic Information

  • Award Number:
    2339746
  • Principal Investigator:
  • Amount:
    $598,900
  • Host Institution:
  • Host Institution Country:
    United States
  • Project Type:
    Continuing Grant
  • Fiscal Year:
    2024
  • Funding Country:
    United States
  • Project Period:
    2024-05-01 to 2029-04-30
  • Project Status:
    Active (not yet concluded)

Project Abstract

Recent advances in artificial intelligence have led to the proliferation of Large Language Models (LLMs). LLMs are models that can be used for interactions with human users through written language; for example, a user inputs an instruction or question in English to the LLM-based program, and the LLM outputs a response in fluent English. With these linguistic capabilities, LLMs are being developed for use in applications that are both ubiquitous (e.g., internet search, customer support, writing tools) and high-stakes (e.g., mental health care, classroom education, assistive technology for people with disabilities). Despite their growing adoption, many fundamental properties of LLMs are not yet well understood, and pressing questions remain about when and whether LLMs can be entrusted with such important tasks. For example, when instructed to make simple predictions about everyday situations, like cooking a meal or riding in a vehicle, LLMs can make strange and surprising errors, exhibiting concerning lapses in basic common-sense judgment and reasoning abilities. Additionally, these predictions made by LLMs can reflect social stereotypes and cultural assumptions which, at best, limit the usefulness of the technology for certain populations and, at worst, cause active harm. This project seeks to address unfairness and bias due to stereotyping and cultural context by proposing a generalized framework for defeasible commonsense inference in natural language, in which a system compares two similar situations with respect to their support for a given inference. The proposed work aims to develop scientific methods to measure and improve the abilities of LLMs to (1) reason correctly about everyday situations, (2) do so in a manner that is fair and unprejudiced, and (3) adapt these reasoning abilities across specific cultural contexts. By measuring these fundamental capabilities of LLMs, we can better understand and mitigate the risks of applying this technology in high-stakes settings. The three phases of the project focus on the (1) robustness, (2) social fairness, and (3) cultural awareness dimensions of reasoning in LLMs. The project assumes a basic task formulation in which a situation description is provided to an LLM (e.g., “Someone drops a glass”), and the LLM must either evaluate a possible inference or generate an inference from scratch (“The glass breaks”). In phase 1, methods will be developed to automatically manipulate situation descriptions in order to train and evaluate an LLM’s ability to make nuanced inferences, with the goal of learning to distinguish which factors influence a particular inference and which ones do not (e.g., when trying to predict whether a dropped glass will break, the thickness of the glass matters but the color of the glass does not). In phase 2, methods will be developed to automatically test whether LLMs make socially fair inferences, for example via name substitution tests, and to intervene when a proposed output is detected as unfair. In phase 3, survey participants from the U.S. and Ghana will answer multiple stages of questions about everyday situations; the collected data will be used to develop evaluation questions for a case study on the adaptability of LLMs across these two cultural settings.
For each phase of the project, the resulting datasets, methods, and scientific findings will be made available to the public. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
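To make the task formulation and the phase-2 fairness check concrete, the sketch below (Python) illustrates a name-substitution test of the kind described above: the same situation is presented with different personal names, and the model's inference should not change. This is an illustration only; the prompt wording, the example names, and the query_llm interface are assumptions made for this sketch, not the project's actual methods or data.

# Minimal illustration of a name-substitution fairness check for an LLM.
# NOTE: prompt format, name list, and the query_llm interface are hypothetical.
from typing import Callable

def build_prompt(situation: str, inference: str) -> str:
    """Format a situation/inference pair as a yes-or-no judgment prompt."""
    return (
        f"Situation: {situation}\n"
        f"Inference: {inference}\n"
        "Is the inference likely true given the situation? Answer yes or no."
    )

def name_substitution_test(
    situation_template: str,            # e.g., "{name} drops a glass on the floor."
    inference: str,                     # e.g., "The glass breaks."
    names: list[str],                   # names the inference should NOT depend on
    query_llm: Callable[[str], str],    # any text-in/text-out model interface
) -> dict[str, str]:
    """Query the model once per substituted name and collect its answers.

    If the answers differ across names, the inference depends on a factor
    (the name) that should be irrelevant, signaling a potential fairness
    failure of the kind the project aims to detect automatically.
    """
    answers = {}
    for name in names:
        prompt = build_prompt(situation_template.format(name=name), inference)
        answers[name] = query_llm(prompt).strip().lower()
    return answers

if __name__ == "__main__":
    # Stand-in model that always answers "yes"; replace with a real LLM call.
    fake_model = lambda prompt: "yes"
    results = name_substitution_test(
        "{name} drops a glass on the kitchen floor.",
        "The glass breaks.",
        ["Emily", "Keisha", "Wei", "Kwame"],
        fake_model,
    )
    print(results, "consistent across names:", len(set(results.values())) == 1)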

Project Outcomes

Journal Articles (0)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)


Other Publications by Rachel Rudinger

Cross-lingual Decompositional Semantic Parsing
FORK: A Bite-Sized Test Set for Probing Culinary Cultural Biases in Commonsense Reasoning Models
What do Large Language Models Learn about Scripts?
  • DOI:
    10.18653/v1/2022.starsem-1.1
  • Publication Year:
    2021
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Abhilasha Sancheti;Rachel Rudinger
  • Corresponding Author:
    Rachel Rudinger
Recognition of They/Them as Singular Personal Pronouns in Coreference Resolution
Metrics matter in community detection
  • DOI:
    10.1007/978-3-030-36687-2_14
  • Publication Year:
    2019
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Arya D. McCarthy;Tongfei Chen;Rachel Rudinger;D. Matula
  • Corresponding Author:
    D. Matula


Similar NSFC Grants

Research on Robust Policy Analysis and Robust Optimization Methods in Supply Chain Management
  • Award Number:
    70601028
  • Year Approved:
    2006
  • Funding Amount:
    CNY 70,000
  • Project Type:
    Young Scientists Fund
Research on Robust Speech Recognition Methods under Psychological Tension and Stress
  • Award Number:
    60085001
  • Year Approved:
    2000
  • Funding Amount:
    CNY 140,000
  • Project Type:
    Special Fund Project
Research on Robust Speech Recognition Methods
  • Award Number:
    69075008
  • Year Approved:
    1990
  • Funding Amount:
    CNY 35,000
  • Project Type:
    General Program
Improved Robust Sequential Detection Techniques
  • Award Number:
    68671030
  • Year Approved:
    1986
  • Funding Amount:
    CNY 20,000
  • Project Type:
    General Program

Similar Overseas Grants

VIPAuto: Robust and Adaptive Visual Perception for Automated Vehicles in Complex Dynamic Scenes
  • Award Number:
    EP/Y015878/1
  • Fiscal Year:
    2024
  • Funding Amount:
    $598,900
  • Project Type:
    Fellowship
CAREER: Game Theoretic Models for Robust Cyber-Physical Interactions: Inference and Design under Uncertainty
  • Award Number:
    2336840
  • Fiscal Year:
    2024
  • Funding Amount:
    $598,900
  • Project Type:
    Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
  • Award Number:
    2338846
  • Fiscal Year:
    2024
  • Funding Amount:
    $598,900
  • Project Type:
    Continuing Grant
Robust Transient State Estimation for Three-Phase Power Systems
  • Award Number:
    2330377
  • Fiscal Year:
    2024
  • Funding Amount:
    $598,900
  • Project Type:
    Standard Grant
NSF Convergence Accelerator track L: Translating insect olfaction principles into practical and robust chemical sensing platforms
  • Award Number:
    2344284
  • Fiscal Year:
    2024
  • Funding Amount:
    $598,900
  • Project Type:
    Standard Grant
Research on Robust Multi-Person Gait Recognition Based on the Combination of Human Mesh Model and Silhouette
  • Award Number:
    24K20794
  • Fiscal Year:
    2024
  • Funding Amount:
    $598,900
  • Project Type:
    Grant-in-Aid for Early-Career Scientists
CAREER: Optimal Transport Beyond Probability Measures for Robust Geometric Representation Learning
  • Award Number:
    2339898
  • Fiscal Year:
    2024
  • Funding Amount:
    $598,900
  • Project Type:
    Continuing Grant
Collaborative Research: Robust and miniature laser with tailorable single-mode operation range
  • Award Number:
    2411394
  • Fiscal Year:
    2024
  • Funding Amount:
    $598,900
  • Project Type:
    Standard Grant
CAREER: Robust Reinforcement Learning Under Model Uncertainty: Algorithms and Fundamental Limits
  • Award Number:
    2337375
  • Fiscal Year:
    2024
  • Funding Amount:
    $598,900
  • Project Type:
    Continuing Grant
Sustainable and robust Australian Ni-based superalloy manufacturing
  • Award Number:
    LP230100155
  • Fiscal Year:
    2024
  • Funding Amount:
    $598,900
  • Project Type:
    Linkage Projects