RI: Medium: Broad-Coverage Semantic Parsing: Linguistic Representation Learning from Crowd-Scale Data
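Basic Information
- Award number: 1562364
- Principal investigator: Noah Smith
- Amount: $1.006 million
- Institution:
- Institution country: United States
- Project category: Continuing Grant
- Fiscal year: 2016
- Funding country: United States
- Period: 2016-09-01 to 2021-08-31
- Status: Completed
- Source:
- Keywords:
Project Abstract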
Automated understanding of text is a capability that will advance a wide range of language technologies, including information extraction, question answering, opinion analysis, and translation between languages. Such technologies have been in demand in the intelligence and defense communities for many years, and they now underlie many commercially available information-management tools. This project develops robust algorithms that understand natural language expressions by mapping them to formal representations of their meaning, a technique known as semantic parsing. For semantic parsing to be employed in technologies like those listed above, it must overcome the fundamental challenge of broad coverage: the ability to handle any text input in multiple languages. This project meets this challenge by creating new methods for gathering large repositories of semantically annotated data at greatly reduced cost; these data are then used to train much more accurate broad-coverage parsing models. The results of this project include open-source implementations, high-quality annotated corpora on an unprecedented scale, and reusable distributed semantic representations for use by the community of natural language processing researchers and practitioners. The goal of broad-coverage semantic parsing can only be achieved by simultaneously focusing on new, large-scale sources of data with semantically meaningful annotations and on new learning algorithms for inducing models with the representational capacity to make full use of such data. For scalable data collection, this project introduces new techniques that rely on two key complementary insights: (1) any reader who understands a text can answer questions about it, and (2) questions can be constructed whose answers probe any aspect of semantics that needs to be recovered.
These observations make it possible to design new data collection techniques that reduce the burden of semantic annotation by having annotators provide simple questions and answers about texts. This QA-style annotation can be done for any text in any language and requires only native speakers, bypassing the significant effort that currently goes into defining detailed annotation standards. It also makes it possible to gather new datasets on a much larger scale, and for more diverse text types, than ever before. In addition, the project develops new representation learning techniques that tie together a wide range of semantic annotation styles, including the new crowdsourced ones, in a multitask learning setup. Continuous representations (e.g., of word types) provide a powerful way to share statistical strength across a large vocabulary, many of whose elements are rarely observed. While past work has emphasized learning word embeddings, this project employs a shared continuous space ("framespace") that can capture abstract frames and roles used in predicate-argument (and logical) semantics. The usefulness of these representations depends on the tasks they are trained to perform, and training on multiple related tasks can benefit all of them by sharing statistical strength across task-specific representations, across elements of the semantic lexicon, and even across languages.
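The QA-style annotation idea can be illustrated with a minimal sketch. It assumes, loosely in the style of QA-driven semantic role labeling, that each crowd-sourced question targets one predicate in a tokenized sentence and that each answer is a contiguous token span. The class names, the `WH_TO_ROLE` mapping, and the example sentence are hypothetical illustrations, not the project's actual data format or released code.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical coarse mapping from a question's wh-word to a role label
# (an illustrative assumption, not an annotation standard).
WH_TO_ROLE = {
    "who": "CORE",
    "what": "CORE",
    "when": "TIME",
    "where": "LOCATION",
    "why": "CAUSE",
    "how": "MANNER",
}


@dataclass(frozen=True)
class QAPair:
    """A question about one predicate, answered by a span of sentence tokens."""
    question: str
    answer_start: int  # first token of the answer (inclusive)
    answer_end: int    # one past the last token of the answer (exclusive)


@dataclass
class PredicateAnnotation:
    """All QA pairs collected for a single predicate in a tokenized sentence."""
    tokens: list[str]
    predicate_index: int
    qa_pairs: list[QAPair] = field(default_factory=list)

    def to_arguments(self) -> list[tuple[str, str, str]]:
        """Convert QA pairs into (predicate, role, argument text) triples."""
        predicate = self.tokens[self.predicate_index]
        triples = []
        for qa in self.qa_pairs:
            # Reject spans that are empty or fall outside the sentence.
            if not 0 <= qa.answer_start < qa.answer_end <= len(self.tokens):
                raise ValueError(f"answer span out of range: {qa}")
            # The question's first word decides the coarse role label.
            wh_word = qa.question.split()[0].lower()
            role = WH_TO_ROLE.get(wh_word, "OTHER")
            answer = " ".join(self.tokens[qa.answer_start:qa.answer_end])
            triples.append((predicate, role, answer))
        return triples


if __name__ == "__main__":
    ann = PredicateAnnotation(
        tokens=["The", "committee", "approved", "the", "budget", "on", "Monday"],
        predicate_index=2,
        qa_pairs=[
            QAPair("Who approved something?", 0, 2),
            QAPair("What was approved?", 3, 5),
            QAPair("When was something approved?", 6, 7),
        ],
    )
    for triple in ann.to_arguments():
        print(triple)
```

Using the wh-word as a coarse role label is only one possible design choice; keeping the full question text as the role, or aligning answers to an existing frame inventory, are equally plausible alternatives.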
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by Noah Smith
Buying health: assessing the impact of a consumer-side vegetable subsidy on purchasing, consumption and waste
- Published: 2015
- Impact factor: 3.2
- Authors: Noah Smith
- Corresponding author: Noah Smith
Implications for cumulative and prolonged clinical improvement induced by cross-linked hyaluronic acid: An in vivo biochemical/microscopic study in humans.
- DOI: 10.1111/exd.14998
- Published: 2024
- Impact factor: 3.6
- Authors: Frank Wang; T. Do; Noah Smith; J. Orringer; Sewon Kang; John J Voorhees; Gary J. Fisher
- Corresponding author: Gary J. Fisher
THE NORTH ATLANTIC TREATY ORGANIZATION AND UNITED STATES RELATIONSHIP: A STUDY OF ITS DEVELOPMENT AND POSSIBLE FUTURE
- Published: 2015
- Impact factor: 0
- Authors: Noah Smith
- Corresponding author: Noah Smith
Constructions of locally recoverable codes with large availability
- DOI: 10.1007/s10623-025-01624-w
- Published: 2025-04-05
- Impact factor: 1.200
- Authors: Giacomo Micheli; Vincenzo Pallozzi Lavorante; Abhi Shukul; Noah Smith
- Corresponding author: Noah Smith
Biopsy of Suspected Melanoma
- DOI: 10.1007/978-3-319-46029-1_10-1
- Published: 2018
- Impact factor: 0
- Authors: Noah Smith; T. Johnson; J. Kelly; A. Sober; C. Bichakjian
- Corresponding author: C. Bichakjian
Other Grants by Noah Smith
NSF-BSF: RI: Small: Efficient Transformers via Formal and Empirical Analysis
- Award number: 2113530
- Fiscal year: 2021
- Amount: $1.006 million
- Project category: Standard Grant
RI/SES: Conference Proposal: Doctoral Consortium on Text as Data
- Award number: 1830158
- Fiscal year: 2018
- Amount: $1.006 million
- Project category: Standard Grant
NSF-BSF: RI: Small: Collaborative Research: Modeling Crosslinguistic Influences Between Language Varieties
- Award number: 1813153
- Fiscal year: 2018
- Amount: $1.006 million
- Project category: Continuing Grant
Workshop: Support for a workshop on scientific research applications of natural language technologies
- Award number: 1433108
- Fiscal year: 2014
- Amount: $1.006 million
- Project category: Standard Grant
BIGDATA: Small: DA: Big Multilinguality for Data-Driven Lexical Semantics
- Award number: 1251131
- Fiscal year: 2013
- Amount: $1.006 million
- Project category: Standard Grant
EAGER: PARTIAL: An Exploratory Study on Practical Approaches for Robust NLP Tools with Integrated Annotation Languages
- Award number: 1352440
- Fiscal year: 2013
- Amount: $1.006 million
- Project category: Standard Grant
SoCS: Collaborative Research: Data-Driven, Computational Models for Discovery and Analysis of Framing
- Award number: 1211277
- Fiscal year: 2012
- Amount: $1.006 million
- Project category: Standard Grant
CAREER: Flexible Learning for Natural Language Processing
- Award number: 1054319
- Fiscal year: 2011
- Amount: $1.006 million
- Project category: Continuing Grant
RI-Small: Probabilistic Models for Structure Discovery in Text
- Award number: 0915187
- Fiscal year: 2009
- Amount: $1.006 million
- Project category: Continuing Grant
SGER: Scaling up unsupervised grammar induction
- Award number: 0836431
- Fiscal year: 2008
- Amount: $1.006 million
- Project category: Standard Grant
Similar Overseas Grants
RII Track-4:@NASA: Bluer and Hotter: From Ultraviolet to X-ray Diagnostics of the Circumgalactic Medium
- Award number: 2327438
- Fiscal year: 2024
- Amount: $1.006 million
- Project category: Standard Grant
Collaborative Research: Topological Defects and Dynamic Motion of Symmetry-breaking Tadpole Particles in Liquid Crystal Medium
- Award number: 2344489
- Fiscal year: 2024
- Amount: $1.006 million
- Project category: Standard Grant
Collaborative Research: AF: Medium: The Communication Cost of Distributed Computation
- Award number: 2402836
- Fiscal year: 2024
- Amount: $1.006 million
- Project category: Continuing Grant
Collaborative Research: AF: Medium: Foundations of Oblivious Reconfigurable Networks
- Award number: 2402851
- Fiscal year: 2024
- Amount: $1.006 million
- Project category: Continuing Grant
Collaborative Research: CIF: Medium: Snapshot Computational Imaging with Metaoptics
- Award number: 2403122
- Fiscal year: 2024
- Amount: $1.006 million
- Project category: Standard Grant
Collaborative Research: SHF: Medium: Differentiable Hardware Synthesis
- Award number: 2403134
- Fiscal year: 2024
- Amount: $1.006 million
- Project category: Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Training Users, Developers, and Instructors at the Chemistry/Physics/Materials Science Interface
- Award number: 2321102
- Fiscal year: 2024
- Amount: $1.006 million
- Project category: Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Transforming the Molecular Science Research Workforce through Integration of Programming in University Curricula
- Award number: 2321045
- Fiscal year: 2024
- Amount: $1.006 million
- Project category: Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Training Users, Developers, and Instructors at the Chemistry/Physics/Materials Science Interface
- Award number: 2321103
- Fiscal year: 2024
- Amount: $1.006 million
- Project category: Standard Grant
Collaborative Research: CPS: Medium: Automating Complex Therapeutic Loops with Conflicts in Medical Cyber-Physical Systems
- Award number: 2322534
- Fiscal year: 2024
- Amount: $1.006 million
- Project category: Standard Grant