SHF: Small: Open-domain, Data-driven Code Synthesis from Natural Language
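Basic information
- Grant number: 1815287
- Principal investigator:
- Amount: $499,700
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-10-01 to 2022-09-30
- Project status: Completed
- Source:
- Keywords:
Project abstract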
One of the major hurdles in programming is turning ideas into code; all programmers, even experts, frequently reach points in a program where they know what they want to do but cannot easily turn it into a concrete implementation. In this case, it is common to turn to the web, e.g., enter a natural language query, search, browse results, copy-and-paste appropriate code, and modify it to the desired shape. However, this process is still time-consuming. This research aims to automate and enhance this process by creating new data-driven methods for code synthesis from natural language, which allow developers to go directly from natural language description to code. Specifically, this project's goal is to bring code synthesis to the open domain, moving from highly engineered methods that work on only a single programming language or task to methods that have the flexibility and scalability to answer most of the questions asked by programmers, in many different programming languages. The intellectual merit of this cross-disciplinary project lies in its potential to contribute to software engineering through the examination of developers' interaction with natural language productivity tools, and its potential to contribute to natural language processing through new models to understand procedural texts. This project will have broader impact through the development of tools and data linking together programs and natural language, potential to improve STEM education by lowering the barriers to programming, and training of graduate and undergraduate research assistants who will be able to straddle and act as bridges between the fields of natural language processing and software engineering.
There are three technical pillars to the work. First, it will focus on methods to mine data consisting of natural language and corresponding code at scale, necessary for training. The mining will be performed over existing online data sources, such as community question answering sites (Stack Overflow) and open-source software repositories (GitHub), using machine learning models that consider both content matches and available meta-data, and crowd-sourcing-based verification of the extracted data. Second, the project will develop code synthesis methods that have the flexibility to handle the wide variety of expressions expected across a variety of software projects and developer needs. This will be done by developing models using neural networks, which have recently shown impressive ability to interpret a wide variety of expressions in other natural language processing tasks. We will expand these models to condition on project context, which will ensure handling of the various constraints necessary to create well-formed programs and allow for adaptation to project-specific conventions and needs. Third, the project will develop methods for learning and improving the models from developer behavior, by feeding back corrections to the generated code into the system and learning from the differences between the pre- and post-correction code.
These methods will all be integrated into developer support tools that can be used in a development environment, or through an online API. The utility of these methods will be examined in both controlled and in-the-wild studies. Controlled studies will examine the subjective accuracy of the mined data and generated code, as well as the effect of the tools on the efficiency and ease of development, for programmers from novice to expert level. This project will also create and release tools for general consumption, solicit feedback from a wide variety of developers, and examine how developers use the proposed tools.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (12)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Incorporating External Knowledge through Pre-training for Natural Language to Code Generation
- DOI: 10.18653/v1/2020.acl-main.538
- Published: 2020-04
- Journal:
- Impact factor: 0
- Authors: Frank F. Xu; Zhengbao Jiang; Pengcheng Yin; Bogdan Vasilescu; Graham Neubig
- Corresponding authors: Frank F. Xu; Zhengbao Jiang; Pengcheng Yin; Bogdan Vasilescu; Graham Neubig
How Can We Know What Language Models Know?
- DOI: 10.1162/tacl_a_00324
- Published: 2020-01-01
- Journal:
- Impact factor: 10.9
- Authors: Jiang, Zhengbao; Xu, Frank F.; Neubig, Graham
- Corresponding author: Neubig, Graham
Learning Structural Edits via Incremental Tree Transformations
- DOI:
- Published: 2021-01
- Journal:
- Impact factor: 0
- Authors: Ziyu Yao; Frank F. Xu; Pengcheng Yin; Huan Sun; Graham Neubig
- Corresponding authors: Ziyu Yao; Frank F. Xu; Pengcheng Yin; Huan Sun; Graham Neubig
Retrieval-Based Neural Code Generation
- DOI: 10.18653/v1/d18-1111
- Published: 2018-08
- Journal:
- Impact factor: 0
- Authors: Shirley Anugrah Hayati; R. Olivier; Pravalika Avvaru; Pengcheng Yin; A. Tomasic; Graham Neubig
- Corresponding authors: Shirley Anugrah Hayati; R. Olivier; Pravalika Avvaru; Pengcheng Yin; A. Tomasic; Graham Neubig
TRANX: A Transition-based Neural Abstract Syntax Parser for Semantic Parsing and Code Generation
- DOI: 10.18653/v1/d18-2002
- Published: 2018-10
- Journal:
- Impact factor: 0
- Authors: Pengcheng Yin; Graham Neubig
- Corresponding authors: Pengcheng Yin; Graham Neubig
Other publications by Graham Neubig
Attentive Interaction Model: Modeling Changes in View in Argumentation
- DOI: 10.18653/v1/n18-1010
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Yohan Jo; Shivani Poddar; Byungsoo Jeon; Qinlan Shen; C. Rosé; Graham Neubig
- Corresponding author: Graham Neubig
Simple, Correct Parallelization for Blocked Gibbs Sampling
- DOI:
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: Graham Neubig
- Corresponding author: Graham Neubig
Discriminative Language Models as a Tool for Machine Translation Error Analysis
- DOI:
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: Koichi Akabe; Graham Neubig; Sakriani Sakti; Tomoki Toda; Satoshi Nakamura
- Corresponding author: Satoshi Nakamura
Enhancing Negative Association Rule Extraction Based on Relatedness Measures (original title: 関連尺度に基づいた負の相関ルール抽出手法の高機能化)
- DOI:
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: Koichi Akabe; Graham Neubig; Sakriani Sakti; Tomoki Toda; Satoshi Nakamura; 宮城 智輝, 山本 泰生, 岩沼 宏治; Graham Neubig; 黒岩 健歩, 岩沼 宏治, 山本 泰生
- Corresponding authors: 黒岩 健歩, 岩沼 宏治, 山本 泰生
A Study on the Satisfiability of Propositional Logic Formulas Using the Fourier Transform (original title: フーリエ変換を用いた命題論理式の充足可能性に関する考察)
- DOI:
- Published: 2013
- Journal:
- Impact factor: 0
- Authors: 赤部 晃一; Graham Neubig; 工藤 拓; John Richardson; 中澤 敏明; 星野 翔; 宮城 智輝, 山本 泰生, 岩沼 宏治
- Corresponding authors: 宮城 智輝, 山本 泰生, 岩沼 宏治
Other grants held by Graham Neubig
FAI: Quantifying and Mitigating Disparities in Language Technologies
- Grant number: 2040926
- Fiscal year: 2021
- Funding amount: $499,700
- Project type: Standard Grant
Discovering and Demonstrating Linguistic Features for Language Documentation
- Grant number: 1761548
- Fiscal year: 2018
- Funding amount: $499,700
- Project type: Standard Grant
RI: EAGER: Collaborative Research: Adaptive Heads-up Displays for Simultaneous Interpretation
- Grant number: 1748642
- Fiscal year: 2017
- Funding amount: $499,700
- Project type: Standard Grant
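Similar grants from the National Natural Science Foundation of China
Forensic application of circadian small RNAs in estimating bloodstain formation time
- Grant number:
- Approval year: 2024
- Funding amount: CNY 0
- Project type: Provincial/municipal project
Mechanism by which tRNA-derived small RNA upregulates the YBX1/CCL5 pathway in bortezomib-induced chronic pain
- Grant number: n/a
- Approval year: 2022
- Funding amount: CNY 100,000
- Project type: Provincial/municipal project
Response and molecular mechanism of small RNA regulation of type I-F CRISPR-Cas adaptive immunity
- Grant number: 32000033
- Approval year: 2020
- Funding amount: CNY 240,000
- Project type: Young Scientists Fund
Mechanisms by which small RNAs regulate the biocontrol function of Bacillus amyloliquefaciens FZB42
- Grant number: 31972324
- Approval year: 2019
- Funding amount: CNY 580,000
- Project type: General Program
Mechanisms by which Streptococcus mutans small RNAs link LuxS quorum sensing to biofilm formation
- Grant number: 81900988
- Approval year: 2019
- Funding amount: CNY 210,000
- Project type: Young Scientists Fund
Function and mechanism of key gut bacterial small RNAs in the onset and progression of Crohn's disease
- Grant number: 31870821
- Approval year: 2018
- Funding amount: CNY 560,000
- Project type: General Program
Molecular mechanism of pigeon milk secretion analyzed by small RNA sequencing
- Grant number: 31802058
- Approval year: 2018
- Funding amount: CNY 260,000
- Project type: Young Scientists Fund
Pathogenic mechanism of Rice grassy stunt virus regulated by small RNA-mediated DNA methylation
- Grant number: 31772128
- Approval year: 2017
- Funding amount: CNY 600,000
- Project type: General Program
Immune regulatory mechanism of acupuncture treatment for Hashimoto's thyroiditis based on small RNA-seq
- Grant number: 81704176
- Approval year: 2017
- Funding amount: CNY 200,000
- Project type: Young Scientists Fund
Regulation of small RNA biosynthesis by rice OsSGS3 and OsHEN1 and its modulation of disease resistance
- Grant number: 91640114
- Approval year: 2016
- Funding amount: CNY 850,000
- Project type: Major Research Plan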
Similar overseas grants
SaTC: CORE: Small: Systematic Threat Characterization and Prevention in Open-Domain Dialog Systems
- Grant number: 2231002
- Fiscal year: 2023
- Funding amount: $499,700
- Project type: Standard Grant
ScAnt - an open-source platform for the creation of 3D models of arthropods (and other small objects)
- Grant number: EP/X032302/1
- Fiscal year: 2023
- Funding amount: $499,700
- Project type: Research Grant
Collaborative Research: III: Small: Taming Large-Scale Streaming Graphs in an Open World
- Grant number: 2236578
- Fiscal year: 2023
- Funding amount: $499,700
- Project type: Standard Grant
FET: Small: An Integrated Framework for the Optimal Control of Open Quantum Systems --- Theory, Quantum Algorithms, and Applications
- Grant number: 2312456
- Fiscal year: 2023
- Funding amount: $499,700
- Project type: Standard Grant
Identification and characterization of small open reading frames translated during inflammation
- Grant number: 10752246
- Fiscal year: 2023
- Funding amount: $499,700
- Project type:
Collaborative Research: III: Small: Taming Large-Scale Streaming Graphs in an Open World
- Grant number: 2236579
- Fiscal year: 2023
- Funding amount: $499,700
- Project type: Standard Grant
BindingDB: An Open Knowledgebase of Protein-Small Molecule Interactions
- Grant number: 10706457
- Fiscal year: 2022
- Funding amount: $499,700
- Project type:
Translation of small open reading frames in 3'UTR enhances translation of canonical open reading frames
- Grant number: 10686152
- Fiscal year: 2022
- Funding amount: $499,700
- Project type:
Translation of small open reading frames in 3'UTR enhances translation of canonical open reading frames
- Grant number: 10671093
- Fiscal year: 2022
- Funding amount: $499,700
- Project type:
BindingDB: An Open Knowledgebase of Protein-Small Molecule Interactions
- Grant number: 10331669
- Fiscal year: 2022
- Funding amount: $499,700
- Project type: