
CRII: SHF: Towards the Construction of a Model for Natural Language and Source Code

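Basic Information

Award number:
1850412
Principal investigator:
Christian Newman
Amount:
$174.5K
Host institution:
Rochester Institute of Tech
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2019
Funding country:
United States
Period:
2019-05-01 to 2022-04-30
Status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1850412&HistoricalAwards=false
Keywords: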
CRII SHF Towards Construction Model

Project Abstract

Source code is written using a combination of human languages, such as English, and programming languages. Developers use a combination of the rules for human languages and programming languages to understand code. The act of trying to understand code is referred to as program comprehension; it is an activity that precedes all other programming-related activities a developer might undertake when coding. For example, before fixing a bug, a developer needs to understand the code where the bug is present; to add a new software feature, a developer must understand the code which will support the new feature. If a piece of code is highly comprehensible, then developers will have an easier time maintaining, debugging, and adding to it. To support comprehension, research must attempt to formally model how human language describes program behavior. With such a model, source code could be optimized to be maximally understandable by automatically improving, or generating, human language to best describe it. This project aims to build such a model by combining information from natural language part of speech with a model of program behavior to assist, improve and measure comprehension.

This project aims to formally model how human language describes source code behavior. This will be achieved by combining a static-analysis-based taxonomy of identifier type categorizations with natural language techniques and identifier definition-use chains. The combination of these three activities allows the model to measure 1) how the type constrains the behavior of an identifier, 2) what role, in English, the words in an identifier correlate to, and 3) what function calls the identifier is used in. These will allow the model to understand how the English of an identifier relates to the usage (function calls) and behavior constraints (type constraints). The goal of this model is to formally measure the way human languages are used to describe source code behavior such that it could be used to train a machine to do the same. The completed model will increase the current understanding of how developers express program behavior through human languages and allow for this expression to be measurably optimized for increased comprehensibility. Additionally, the model will improve modern program comprehension techniques by allowing them to be more aware of how the underlying source code structure and rules influence the way human languages are used to describe program behavior.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal papers (9)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

An Open Dataset of Abbreviations and Expansions

DOI:
10.1109/icsme.2019.00041
Published:
2019
Venue:
International Conference on Software Maintenance and Evolution
Impact factor:
0
Authors:
Newman, Christian;Decker, Michael John;AlSuhaibani, Reem S;Peruma, Anthony;Kaushik, Dishant;Hill, Emily
Corresponding author:
Hill, Emily

IDEAL: An Open-Source Identifier Name Appraisal Tool

DOI:
10.1109/icsme52107.2021.00064
Published:
2021
Venue:
IEEE International Conference on Software Maintenance and Evolution (ICSME)
Impact factor:
0
Authors:
Peruma, Anthony;Arnaoudova, Venera;Newman, Christian D.
Corresponding author:
Newman, Christian D.

On the Generation, Structure, and Semantics of Grammar Patterns in Source Code Identifiers

DOI:
10.1016/j.jss.2020.110740
Published:
2020-07
Venue:
J. Syst. Softw.
Impact factor:
0
Authors:
Christian D. Newman;Reem S. Alsuhaibani;M. J. Decker;Anthony S Peruma;D. Kaushik;Mohamed Wiem Mkaouer
Corresponding author:
Christian D. Newman;Reem S. Alsuhaibani;M. J. Decker;Anthony S Peruma;D. Kaushik;Mohamed Wiem Mkaouer

Modeling the Relationship Between Identifier Name and Behavior
