
SENLP - Software Engineering knowledge of NLP models


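Basic information

Grant number:
524228075
Principal investigator:
Professor Dr. Steffen Herbold
Amount:
--
Host institution:
Lehrstuhl für AI Engineering
Host institution country:
Germany
Project category:
Research Grants
Fiscal year:
Funding country:
Germany
Duration:
Project status:
Ongoing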

Source:
https://gepris.dfg.de/gepris/projekt/524228075?language=en
Keywords:
SENLP Software Engineering knowledge NLP

Project abstract

The transformer architecture has changed the field of Natural Language Processing (NLP) and paved the way for models such as BERT and GPT. These models have in common that they use transfer learning in the form of pre-training to learn a general representation of language, which can then be fine-tuned or prompted to perform various downstream tasks. While these models achieve remarkable results on a variety of NLP tasks, it is often unclear why they perform well on specific tasks and how well they work in other domains, such as Software Engineering (SE). In our prior work, we examined the impact of domain-specific pre-training on NLP tasks within the SE domain. We found that for polysemous words like "bug" (insect vs. defect) or "root" (plant vs. user), domain-specific pre-training did help the models capture the SE meaning, and that this also led to better performance on domain-specific downstream tasks. In this project, we want to deepen our understanding of the capability of NLP models to capture concepts from the SE domain, with a focus on SE definitions and commonsense knowledge. We will use the analogy of NLP models as students to understand how they would perform in SE exams. For example, we will test whether the NLP models contain accurate SE definitions and terminology: can they spot the correct definition of a term in a multiple-choice test, can they generate accurate definitions given a prompt, can they recognize whether two definitions are synonymous, and can they differentiate between similar concepts that differ in small but important ways and, given a prompt, even explain those differences? A known limitation of large language models in the general domain is that they always answer, even if the input is based on wrong assumptions.
We will investigate whether we observe similar behavior in the SE domain, e.g., by looking at how models react to prompts asking them which tools can be used to execute automated manual tests or what the best object-oriented design patterns for Haskell are. Through our work, we not only try to identify whether we get nonsense responses, but also whether we can find methods to infer that generated responses are nonsense, as is possible in the general domain. Additionally, we study the above aspects for different types of models: smaller models with an encoder-only transformer architecture (e.g., BERT), larger encoder-only models (e.g., RoBERTa), models with variations of the transformer architecture that allow for longer contexts (e.g., Big Bird), GPT-style decoder-only models (e.g., GPT-NeoX), and encoder-decoder models (e.g., T5). We will consider both models with SE-specific pre-training and models trained on general-domain data. Since some general-domain models were already pre-trained on a corpus that included SE data, this also allows us to understand whether SE knowledge is sufficiently captured when it makes up only a small part of a very large data set.
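To make the "models as students in an SE exam" idea concrete, here is a minimal sketch of how a multiple-choice definition question could be graded. It is not the project's method: the names `DefinitionQuestion`, `answer`, and `exam_accuracy` are hypothetical, and the `score` function is an assumed placeholder for whatever a real model provides (for example, the likelihood a model assigns to an option given the term). The toy scorer in the usage example is purely illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical scoring interface: higher score = model prefers this option.
Scorer = Callable[[str, str], float]


@dataclass
class DefinitionQuestion:
    """One exam question: pick the correct definition of `term`."""
    term: str
    options: Sequence[str]
    correct_index: int


def answer(question: DefinitionQuestion, score: Scorer) -> int:
    """Return the index of the option with the highest score (first one wins ties)."""
    scores = [score(question.term, option) for option in question.options]
    return max(range(len(scores)), key=scores.__getitem__)


def exam_accuracy(questions: Sequence[DefinitionQuestion], score: Scorer) -> float:
    """Fraction of questions for which the scorer picks the correct option."""
    if not questions:
        raise ValueError("exam must contain at least one question")
    correct = sum(answer(q, score) == q.correct_index for q in questions)
    return correct / len(questions)


if __name__ == "__main__":
    questions = [
        DefinitionQuestion(
            term="bug",
            options=["a small insect", "a defect in software that causes incorrect behavior"],
            correct_index=1,
        ),
    ]
    # Toy scorer standing in for a real model: prefers options that mention "software".
    toy_score: Scorer = lambda term, option: float("software" in option)
    print(exam_accuracy(questions, toy_score))  # prints 1.0
```

Keeping the scorer as a plain function separates the exam logic from the model, so the same questions could be graded against encoder-only, decoder-only, or encoder-decoder models once each is wrapped in a suitable scoring function.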