EAGER: Building Idiomaticity into Natural Language Processing
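Basic Information
- Award number: 2230817
- Principal investigator:
- Amount: $150,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2022
- Funding country: United States
- Duration: 2022-08-15 to 2024-07-31
- Status: Completed
- Source:
- Keywords: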
Project Abstract
Idiomatic expressions are an essential component of everyday language use and the hallmark of native language ability. Consider the phrase "throw away": proficient speakers can effortlessly understand that the phrase takes on a figurative meaning in “Britain threw away all the achievements of the last decade.” and a literal sense in “He threw away his cigarette and buried his head in his arms.” This EArly-concept Grant for Exploratory Research (EAGER) will build a high-quality dataset to help computers understand the differences between the figurative and literal senses of these expressions in general English text. The main novelty of this project lies in collecting a large class of idiomatic expressions, together with sentences containing them, so that computers can learn the inherent variability across a variety of idiomatic phrases. Collecting many sentences containing phrases that have both figurative and literal meanings will help computers better understand the nuances with which these expressions are used in everyday conversation and writing. Beyond understanding these expressions, the collected examples will help computers use them as native speakers do when automatically generating text, and even suggest appropriate expressions in specific contexts.
This EAGER project is inherently interdisciplinary, spanning linguistics and computation, and will investigate novel idiomaticity-aware paradigms for natural language processing. It has two research aims: (1) creating a high-quality dataset of phrasal verbs annotated with their context-specific senses and their literal/figurative equivalent forms, and (2) testing the performance of state-of-the-art idiomaticity-aware algorithms.
Because idiomatic expressions vary widely in form and structure, this exploratory project's focus on phrasal verbs (also known as verb-particle constructions) will permit the study of a very frequent class of idiomatic expressions that are syntactically different from those in currently available datasets. The primary risk of this project stems from its exploratory goal of creating large corpora with sufficient coverage for language model training. Given the prevalence of phrasal verbs in natural language, a dataset of English phrasal verbs will add variety to the existing datasets of idiomatic expressions. Moreover, their figurative/literal ambiguity in context (apart from their polysemy) will permit a diverse look at the non-compositionality that characterizes idiomatic expressions. Thus, the dataset will serve as a training and test bed for algorithms that detect, interpret, and generate a broad class of idiomatic expressions. This effort will lead to new natural language processing algorithms for the accurate interpretation and generation of idiomatic expressions, moving machines toward more human-like language processing ability.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by Suma Bhat
The Relation Among Gender, Language, and Posting Type in Online Chemistry Course Discussion Forums
- Published: 2024
- Impact factor: 0
- Authors: Genevieve M. Henricks; Michelle Perry; Suma Bhat
- Corresponding author: Suma Bhat
A Social Network Analysis of Online Engagement for College Students Traditionally Underrepresented in STEM
- DOI: 10.1145/3448139.3448159
- Published: 2021
- Impact factor: 0
- Authors: Destiny Williams; R. F. Azevedo; Amos Jeng; Vyom Thakkar; Suma Bhat; Nigel Bosch; M. Perry
- Corresponding author: M. Perry
Study Partners Matter: Impacts on Inclusion and Outcomes
- DOI: 10.18260/1-2--37777
- Published: 2016
- Impact factor: 0
- Authors: Neha Prabhu; Michelle Perry; Renato Azevedo; Lawrence Angrave; Suma Bhat
- Corresponding author: Suma Bhat
No Context Needed: Contextual Quandary In Idiomatic Reasoning With Pre-Trained Language Models
- Published: 2024
- Impact factor: 0
- Authors: K. Cheng; Suma Bhat
- Corresponding author: Suma Bhat
Comparative evaluation of automated scoring of syntactic competence of non-native speakers
- DOI: 10.1016/j.chb.2017.01.060
- Published: 2017-11-01
- Authors: Klaus Zechner; Su-Youn Yoon; Suma Bhat; Chee Wee Leong
- Corresponding author: Chee Wee Leong
Other Grants by Suma Bhat
EAGER: Collaborative: BystanderBots: Automated Bystander Intervention for Cyberbullying Mitigation
- Award number: 1720268
- Fiscal year: 2017
- Amount: $150,000
- Award type: Standard Grant
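Similar NSFC (National Natural Science Foundation of China) Grants
Research on the Regulatory Mechanism of Directed Starch Modification by High-Quality BE Mutant Enzymes Constructed from Amylopectin Building Blocks
- Award number: 31771933
- Approval year: 2017
- Amount: CNY 600,000
- Award type: General Program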
Similar Overseas Grants
NSF Engines Development Award: Building a sustainable plastics innovation ecosystem in the Midwest (MN, IL)
- Award number: 2315247
- Fiscal year: 2024
- Amount: $150,000
- Award type: Cooperative Agreement
CyberCorps Scholarship for Service: Building Research-minded Cyber Leaders
- Award number: 2336409
- Fiscal year: 2024
- Amount: $150,000
- Award type: Continuing Grant
RII Track-4:NSF: An Integrated Urban Meteorological and Building Stock Modeling Framework to Enhance City-level Building Energy Use Predictions
- Award number: 2327435
- Fiscal year: 2024
- Amount: $150,000
- Award type: Standard Grant
Building Synthetic Biofilm Consortia for Polyfluorinated Chemicals Biodegradation
- Award number: 2343831
- Fiscal year: 2024
- Amount: $150,000
- Award type: Standard Grant
RII Track-1: Interface of Change: Building Collaborations to Assess Harvested and Farmed Marine Species Prioritized by Gulf of Alaska Communities Facing Environmental Shifts
- Award number: 2344553
- Fiscal year: 2024
- Amount: $150,000
- Award type: Cooperative Agreement
Building Credentialed Media Technology Pathways for Priority Populations from High School through Community Colleges to Industry (MTP3)
- Award number: 2400610
- Fiscal year: 2024
- Amount: $150,000
- Award type: Standard Grant
Postdoctoral Fellowship: CREST-PRP: Exploring the Impact of Heat-Waves and Nutrients on Bloom-Forming and Habitat-Building Seaweeds Along the South Florida Coast
- Award number: 2401066
- Fiscal year: 2024
- Amount: $150,000
- Award type: Standard Grant
Conference: CRA-E Workshop: Supporting career building, student research experiences, and advancement of teaching track faculty
- Award number: 2421010
- Fiscal year: 2024
- Amount: $150,000
- Award type: Standard Grant
Planning: FIRE-PLAN: Building Wildland Fire Science Capacity in Alaska Through The University of Alaska Fairbanks Rural Campuses
- Award number: 2333423
- Fiscal year: 2024
- Amount: $150,000
- Award type: Standard Grant
Capacity Assessment, Tracking, & Enhancement through Network Analysis: Developing a Tool to Inform Capacity Building Efforts in Complex STEM Education Systems
- Award number: 2315532
- Fiscal year: 2024
- Amount: $150,000
- Award type: Standard Grant