SLES: A Theoretical Lens on Generative AI Safety: Near and Long Term


Basic Information

  • Award Number:
    2331831
  • Principal Investigator:
  • Amount:
    $0.8 million
  • Host Institution:
  • Host Institution Country:
    United States
  • Project Type:
    Standard Grant
  • Fiscal Year:
    2023
  • Funding Country:
    United States
  • Project Period:
    2023-11-01 to 2026-10-31
  • Project Status:
    Ongoing

Project Abstract

Generative AI technologies like ChatGPT have taken the world by storm with their ability to synthesize strikingly coherent text, code, and more. These systems continue to improve in quality at a remarkable pace and increasingly shape diverse facets of society and industry, yet the field's ability to control them and ensure their reliability has not kept up. The models remain notoriously prone to confidently making factually incorrect yet convincing-sounding statements; even when they in principle have all of the knowledge needed to prevent this, they often stumble in putting the pieces together. As this technology makes its way into mission-critical contexts like healthcare or policy decisions, it is crucial to avoid such failure modes. This research will develop mathematically rigorous AI deployment methods that come with solid theoretical assurances that the systems will not stray from their intended behavior in this way. The findings of this project will be instrumental in establishing sustainable checks and fail-safes so that generative AI technologies can scale in a controlled fashion that is aligned with human interests.

The research tackles both near-term challenges in safety for generative AI and emerging, longer-term ones that will arise as these models grow in their capabilities. For the former, the project will establish mathematical parameters for factuality and non-hallucination in generative models. This encompasses detecting when models make factual assertions, calibrating confidence scores for these assertions, reliably attributing them to their sources in the training data, and encouraging models to abstain from generation when faced with sufficiently out-of-distribution input. Another goal is to investigate methodologies for eliciting and editing knowledge stored in generative models, and to isolate fundamental barriers to doing so using tools from fine-grained complexity theory and computational notions of entropy. For safety in the longer term, the project will examine the feasibility of integrating emergency-stop functionality into AI systems via cryptographic backdoors, as well as implementing "AI arms protocols" based on zero-knowledge proofs that publicly certify a system's safety properties while keeping certain of its components private. The research will also rigorously stress-test existing proposals for scalable oversight of AI systems, such as natural-language debate and iterated amplification, using techniques from combinatorial game theory and average-case analysis of recursive heuristics.

This research is supported by a partnership between the National Science Foundation and Open Philanthropy. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
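As a purely illustrative reading of the near-term goal of calibrated confidence and abstention, the sketch below shows one generic recipe: choose a confidence threshold on held-out data so that the predictions kept above the threshold meet a target precision, and abstain otherwise. This is not the project's method; the function names, the held-out logits and labels, and the target_precision parameter are assumptions introduced only for illustration.

    # Illustrative sketch only: a generic calibrate-then-abstain rule, not the award's method.
    # Assumed inputs: held-out logits and labels from some classifier-style factuality probe.
    import numpy as np

    def softmax(logits):
        z = logits - logits.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def pick_abstention_threshold(val_logits, val_labels, target_precision=0.95):
        # Smallest confidence threshold whose retained predictions reach the target
        # precision on held-out data; if none does, abstain on everything (threshold 1.0).
        probs = softmax(val_logits)
        conf = probs.max(axis=-1)
        correct = probs.argmax(axis=-1) == val_labels
        for t in np.sort(np.unique(conf)):
            kept = conf >= t
            if kept.any() and correct[kept].mean() >= target_precision:
                return float(t)
        return 1.0

    def predict_or_abstain(logits, threshold):
        # Return the predicted class, or None to signal abstention.
        probs = softmax(np.asarray(logits))
        return int(probs.argmax()) if probs.max() >= threshold else None

In this toy setting the guarantee is only empirical (precision measured on one held-out set); the abstract's stated aim is to replace heuristics of this kind with criteria that carry rigorous theoretical assurances.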

Project Outcomes

Journal Articles (0)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)

Other Publications by Sitan Chen

Provably learning a multi-head attention layer
  • DOI:
    10.48550/arxiv.2402.04084
  • Publication Year:
    2024
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Sitan Chen;Yuanzhi Li
  • Corresponding Author:
    Yuanzhi Li
An optimal tradeoff between entanglement and copy complexity for state tomography
Efficient learning of many-body systems
  • DOI:
    10.1038/s41567-024-02393-4
  • Publication Year:
    2024
  • Journal:
  • Impact Factor:
    19.6
  • Authors:
    Sitan Chen
  • Corresponding Author:
    Sitan Chen
A Hierarchy for Replica Quantum Advantage
  • DOI:
    10.1145/2746539.2746582
  • Publication Year:
    2021
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Sitan Chen;Jordan S. Cotler;Hsin-Yuan Huang;J. Li
  • Corresponding Author:
    J. Li
Beyond the low-degree algorithm: mixtures of subcubes and their applications

Other Grants by Sitan Chen

PostDoctoral Research Fellowship
  • Award Number:
    2103300
  • Fiscal Year:
    2021
  • Funding Amount:
    $0.8 million
  • Project Type:
    Fellowship Award

Similar International Grants

REU Site: REU in Theoretical and Experimental Physics
  • Award Number:
    2348872
  • Fiscal Year:
    2024
  • Funding Amount:
    $0.8 million
  • Project Type:
    Continuing Grant
Electron momentum spectroscopy of radiosensitizers: new benchmark data for assessing the theoretical models
  • Award Number:
    EP/Y022297/1
  • Fiscal Year:
    2024
  • Funding Amount:
    $0.8 million
  • Project Type:
    Research Grant
CREST HBCU-RISE: Advancing Theoretical Artificial Intelligence Infrastructure for Modern Data Science Challenges
  • Award Number:
    2409093
  • Fiscal Year:
    2024
  • Funding Amount:
    $0.8 million
  • Project Type:
    Continuing Grant
CRII: SHF: Theoretical Foundations of Verifying Function Values and Reducing Annotation Overhead in Automatic Deductive Verification
  • Award Number:
    2348334
  • Fiscal Year:
    2024
  • Funding Amount:
    $0.8 million
  • Project Type:
    Standard Grant
CAREER: Gaussian Processes for Scientific Machine Learning: Theoretical Analysis and Computational Algorithms
  • Award Number:
    2337678
  • Fiscal Year:
    2024
  • Funding Amount:
    $0.8 million
  • Project Type:
    Continuing Grant
Labor Market Polarization, Earnings Inequality and Optimal Tax Progressivity: A Theoretical and Empirical Analysis
  • Award Number:
    24K04909
  • Fiscal Year:
    2024
  • Funding Amount:
    $0.8 million
  • Project Type:
    Grant-in-Aid for Scientific Research (C)
The syntax of nominal copular clauses: theoretical and empirical perspectives
  • Award Number:
    AH/Y007492/1
  • Fiscal Year:
    2024
  • Funding Amount:
    $0.8 million
  • Project Type:
    Research Grant
CAREER: Theoretical foundations for deep learning and large-scale AI models
  • Award Number:
    2339904
  • Fiscal Year:
    2024
  • Funding Amount:
    $0.8 million
  • Project Type:
    Continuing Grant
CAREER: Theoretical and Computational Advances for Enabling Robust Numerical Guarantees in Linear and Mixed Integer Programming Solvers
  • Award Number:
    2340527
  • Fiscal Year:
    2024
  • Funding Amount:
    $0.8 million
  • Project Type:
    Continuing Grant
Theoretical and Experimental Investigation of Photoheterolysis Reactions
  • Award Number:
    2349051
  • Fiscal Year:
    2024
  • Funding Amount:
    $0.8 million
  • Project Type:
    Standard Grant