EAGER: Building Language Technologies by Machine Reading Grammars
Basic Information
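- Award number: 2327143
- Principal investigator:
- Amount: $99,300
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-06-15 to 2024-05-31
- Status: Completed
- Source:
- Keywords: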
Project Abstract
Recent years have seen incredible advances in natural language processing (NLP) technologies, which now make it possible to perform numerous tasks through, with, or on language data. However, this progress has been limited to the handful of languages for which abundant data are available, because the neural models behind these recent improvements are particularly data hungry. This work suggests that we should move away from the current data-inefficient learning paradigm and instead also model languages by relying on the way humans describe them: the grammar of each language. Put simply, we aim to incorporate the grammars of languages, as written by linguists and treated as symbolic knowledge bases, into the process of training neural language models. Specifically, this work will focus on the first step towards this goal, namely extracting the necessary information from grammar descriptions and other linguistic documents. We will explore several alternative modeling approaches, starting with retrieval-based models. We will additionally attack the problem through a machine-reading and question-answering framework. Ultimately, the success of these methods will enable the creation of linguistically informed models, which will in turn facilitate the creation of technologies especially for under-served language communities. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outputs
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other Publications by Antonios Anastasopoulos
PROBER: A System for Real-time Propaganda Behavior Analytics on Social Media and Web Data Streams
- DOI:
- Year: 2022
- Journal:
- Impact factor: 0
- Authors: Yasas Senarath; Antonios Anastasopoulos; Tonya Thornton; Hemant Purohit
- Corresponding author: Hemant Purohit
Noisy Parallel Data Alignment
- DOI: 10.48550/arxiv.2301.09685
- Year: 2023
- Journal:
- Impact factor: 0
- Authors: Ruoyu Xie; Antonios Anastasopoulos
- Corresponding author: Antonios Anastasopoulos
Flagging Comprehensibility Issues in Hindi Text with Question Answering
- DOI:
- Year: 2021
- Journal:
- Impact factor: 0
- Authors: Antonios Anastasopoulos; A. Cattelan; Yi Dou; Marcello Federico; Christian Federman; Dmitriy Genzel; Francisco Guzmán; Junjie Hu; Sheila Castilho; Stephen Doherty; F. Gaspari; J. Devlin; Ming; Kenton Lee; Natasha Dhawan; I. Subbiah; Benjamin Thompson; Zachary Hildner; Areeba; Eric Prommer; Christian T Sinclair
- Corresponding author: Christian T Sinclair
To token or not to token: A Comparative Study of Text Representations for Cross-Lingual Transfer
- DOI: 10.48550/arxiv.2310.08078
- Year: 2023
- Journal:
- Impact factor: 0
- Authors: Md Mushfiqur Rahman; Fardin Ahsan Sakib; Fahim Faisal; Antonios Anastasopoulos
- Corresponding author: Antonios Anastasopoulos
Language and Speech Technology for Central Kurdish Varieties
- DOI:
- Year: 2024
- Journal:
- Impact factor: 0
- Authors: Sina Ahmadi; Daban Q. Jaff; Md Mahfuz Ibn Alam; Antonios Anastasopoulos
- Corresponding author: Antonios Anastasopoulos
Other Grants Held by Antonios Anastasopoulos
CCRI: Planning-C: Facilitating Language Technologies for Crisis Response (LT4CR)
- Award number: 2234895
- Fiscal year: 2023
- Funding amount: $99,300
- Project type: Standard Grant
Collaborative Research: Language Documentation with an Artificial Intelligence (AI) Helper
- Award number: 2109578
- Fiscal year: 2021
- Funding amount: $99,300
- Project type: Standard Grant
Collaborative Research: RI: Small: NL(V)P: Natural Language (Variety) Processing
- Award number: 2125466
- Fiscal year: 2021
- Funding amount: $99,300
- Project type: Standard Grant
Similar NSFC (National Natural Science Foundation of China) Grants
Regulatory mechanisms of directed starch modification by high-quality BE mutant enzymes constructed from amylopectin building blocks
- Award number: 31771933
- Year approved: 2017
- Funding amount: CNY 600,000
- Project type: General Program
Similar Overseas Grants
Digital building blocks of elementary school foreign language reading motivation
- Award number: 23K25344
- Fiscal year: 2024
- Funding amount: $99,300
- Project type: Grant-in-Aid for Scientific Research (B)
Building AI-Powered Responsible Workforce by Integrating Large Language Models into Computer Science Curriculum
- Award number: 2336061
- Fiscal year: 2024
- Funding amount: $99,300
- Project type: Standard Grant
Structure Building in Language Production
- Award number: 2234229
- Fiscal year: 2023
- Funding amount: $99,300
- Project type: Standard Grant
CAREER: Building a Model of Instructional Congruence through Exploring the Role of Language in Introductory Undergraduate Engineering Courses
- Award number: 2237543
- Fiscal year: 2023
- Funding amount: $99,300
- Project type: Continuing Grant
CAREER: Building Next-Generation Language Models Based on Retrieval
- Award number: 2239290
- Fiscal year: 2023
- Funding amount: $99,300
- Project type: Continuing Grant
Building STEM Skills by Integrating Data Literacy and Text Analytics in English Language Arts
- Award number: 2241483
- Fiscal year: 2023
- Funding amount: $99,300
- Project type: Standard Grant
Digital building blocks of elementary school foreign language reading motivation
- Award number: 23H00647
- Fiscal year: 2023
- Funding amount: $99,300
- Project type: Grant-in-Aid for Scientific Research (B)
Building a Medical Language Resource Toward Secondary Use of Radiology Reports
- Award number: 23K19977
- Fiscal year: 2023
- Funding amount: $99,300
- Project type: Grant-in-Aid for Research Activity Start-up
Building joint models of language and the 3D world
- Award number: RGPIN-2020-07196
- Fiscal year: 2022
- Funding amount: $99,300
- Project type: Discovery Grants Program - Individual
Building a neurolinguistic corpus of naturalistic conversation to investigate second language grammar
- Award number: 2203723
- Fiscal year: 2022
- Funding amount: $99,300
- Project type: Fellowship Award