CAREER: Towards Open World Event Knowledge Extraction with Weak Supervision
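Basic information
- Award number: 2238940
- Principal investigator:
- Amount: $593,500
- Host institution:
- Host institution country: United States
- Project category: Continuing Grant
- Fiscal year: 2023
- Funding country: United States
- Start and end dates: 2023-08-15 to 2028-07-31
- Project status: Ongoing
- Source:
- Keywords:
Project abstract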
Understanding events, such as who did what to whom, when, and where, is one of the fundamental human activities for learning about the changing world. The answers to these questions underpin the key information conveyed in the overwhelming majority, if not all, of language-based communication. However, the current research paradigm suffers from several shortcomings in extracting event knowledge in open world scenarios. In these scenarios, knowledge extraction from data is limited to a few large domains (e.g., news or biomedical) or common languages (e.g., English, Spanish, and Chinese) because of the heavy reliance on human effort to contextualize data. This effort includes creating large-scale manual annotations or defining schematic templates for a few target event types. This project aims to lay the foundation and establish new paradigms for open world event knowledge extraction by developing new and more efficient algorithms that extend the extraction capability to a wide range of scenarios while requiring minimal human effort. This foundation should provide extensive coverage of different event types and be easily adapted to emerging scenarios. The success of this project will directly benefit users of intelligent information access systems. For applications that analyze emerging and trending topics and events, such as natural disasters, national elections, protests, and disease outbreaks, success of the proposed research will not only provide humans with an accurate, abstractive summary of each topic and easy access to it, but also allow analysts to better discover the participants of the events and the causes, effects, and temporal order among them, and help uncover more insights.
The technical aims of the project are divided into three thrusts. Thrust 1 develops schema-guided event extraction approaches. This is done by leveraging knowledge from the complex target event schema, such as the event type structures (i.e., type name and argument roles), the hierarchy, and the temporal, causal, and part-whole relations among the event types, which provide valuable guidance, especially when few or no annotations are available. While event annotations for most domains and scenarios do not exist and are extremely expensive and time-consuming to obtain, large-scale unlabeled in-domain data are usually accessible. Thus, Thrust 2 will further develop a suite of novel and more efficient self-training strategies to make use of the large-scale unlabeled data through self-supervision. In practice, most domains and scenarios, such as natural disasters or disease outbreaks, do not even have an event type schema available. Manually defining an event schema with high coverage is extremely challenging and time-consuming, as it requires background knowledge in both linguistics and the target domain, and humans need to manually examine a large amount of in-domain data to determine the salient event types. Considering these challenges, Thrust 3 further explores novel solutions to automatically deduce the target event schema, including event types, the roles of their participants, and their relations, from raw text and to extract event mentions accordingly.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Lifu Huang
RPI BLENDER TAC-KBP2017 13 Languages EDL System
- DOI:
- Publication year: 2017
- Journal:
- Impact factor: 0.5
- Authors: Boliang Zhang; Xiaoman Pan; Ying Lin; Tongtao Zhang; Kevin Blissett; Samia Kazemi; Spencer Whitehead; Lifu Huang; Heng Ji
- Corresponding author: Heng Ji
Towards Automatic Curation of Antibiotic Resistance Genes via Statement Extraction from Scientific Papers: A Benchmark Dataset and Models
- DOI:
- Publication year: 2022
- Journal:
- Impact factor: 0
- Authors: Sidhant Chandak; Liqing Zhang; Connor L. Brown; Lifu Huang
- Corresponding author: Lifu Huang
APrompt: Attention Prompt Tuning for Efficient Adaptation of Pre-trained Language Models
- DOI: 10.18653/v1/2023.emnlp-main.567
- Publication year: 2023
- Journal:
- Impact factor: 0
- Authors: Qifan Wang; Yuning Mao; Jingang Wang; Hanchao Yu; Shaoliang Nie; Sinong Wang; Fuli Feng; Lifu Huang; Xiaojun Quan; Zenglin Xu; Dongfang Liu
- Corresponding author: Dongfang Liu
Generating A Crowdsourced Conversation Dataset to Combat Cybergrooming
- DOI: 10.48550/arxiv.2405.13154
- Publication year: 2024
- Journal:
- Impact factor: 0
- Authors: Xinyi Zhang; Pamela J. Wisniewski; Jin; Lifu Huang; Sang Won Lee
- Corresponding author: Sang Won Lee
ELISA System Description for LoReHLT 2017
- DOI:
- Publication year: 2017
- Journal:
- Impact factor: 0
- Authors: Leon Cheung; Thamme Gowda; U. Hermjakob; N. Liu; Jonathan May; Alexandra Mayn; Nima Pourdamghani; Michael Pust; Kevin Knight; Nikolaos Malandrakis; Pavlos Papadopoulos; Anil Ramakrishna; Karan Singla; Victor R. Martinez; Colin Vaz; Dogan Can; Shrikanth S. Narayanan; Kenton Murray; Toan Q. Nguyen; David Chiang; Xiaoman Pan; Boliang Zhang; Ying Lin; Di Lu; Lifu Huang; Kevin Blissett; Tongtao Zhang; O. Glembek; M. Baskar; Santosh Kesiraju; L. Burget; Karel Beneš; I. Szoke; Karel Veselý; Camille Goudeseune; Mark H. Johnson; Leda Sari; Wenda Chen; Angli Liu
- Corresponding author: Angli Liu
Similar overseas grants
NSF Workshop: Towards an Open Source Model for Data and Metadata Standards
- Award number: 2334483
- Fiscal year: 2023
- Funding amount: $593,500
- Project category: Standard Grant
Conference: Pushing Towards Open-Source AI
- Award number: 2335774
- Fiscal year: 2023
- Funding amount: $593,500
- Project category: Standard Grant
Collaborative Research: GEO OSE Track 1: Transforming Volcanology towards Open Science in the Cloud with VICTOR
- Award number: 2324749
- Fiscal year: 2023
- Funding amount: $593,500
- Project category: Standard Grant
Collaborative Research: GEO OSE Track 1: Transforming Volcanology towards Open Science in the Cloud with VICTOR
- Award number: 2324748
- Fiscal year: 2023
- Funding amount: $593,500
- Project category: Standard Grant
An Open Innovation Ecosystem for exploitation of materials for building envelopes towards zero energy buildings (Exploit4InnoMat)
- Award number: 10048971
- Fiscal year: 2023
- Funding amount: $593,500
- Project category: EU-Funded
AI-powered eVolution towards opEn and secuRe edGe architEctures
- Award number: 10071211
- Fiscal year: 2023
- Funding amount: $593,500
- Project category: EU-Funded
VERGE: AI-powered eVolution towards opEn and secuRe edGe architEctures
- Award number: 10061781
- Fiscal year: 2023
- Funding amount: $593,500
- Project category: EU-Funded
Collaborative Research: GEO OSE Track 1: Transforming Volcanology towards Open Science in the Cloud with VICTOR
- Award number: 2324747
- Fiscal year: 2023
- Funding amount: $593,500
- Project category: Standard Grant
Towards Open-world Semi-supervised learning
- Award number: 2766068
- Fiscal year: 2022
- Funding amount: $593,500
- Project category: Studentship
The theory and practice of 'trans-imperial history': towards an open-ended framework of research
- Award number: 22H00690
- Fiscal year: 2022
- Funding amount: $593,500
- Project category: Grant-in-Aid for Scientific Research (B)