CAREER: Learning to Extract Consistent Event Graphs from Long and Complex Documents

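Basic Information

Award number:
2340435
Principal investigator:
Xinya Du
Amount:
approx. $561,200
Host institution:
University of Texas at Dallas
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2024
Funding country:
United States
Period:
2024-05-01 to 2029-04-30
Status:
Ongoing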

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2340435&HistoricalAwards=false
Keywords:
CAREER Learning Extract Consistent Event

Project Abstract

Documents about real-world events are published daily. The large number of such documents makes it very hard for people to read and absorb them all, a phenomenon known as “information overload”. Applying computer algorithms that can automatically extract events is a promising solution because they can transform large amounts of text into smaller summaries in the form of structured event knowledge graphs that reveal the relationships between the people, places, and times in the events. Current deep learning-based event extraction techniques mainly focus on extracting event knowledge at the level of individual sentences and are unable to extract a knowledge graph spanning multiple sentences with sufficient accuracy or efficiency. For example, existing techniques would struggle with events described in a long document having multiple sections. Moreover, these extraction techniques do not capture accurate information regarding real-life events because they typically include nuanced attributes such as causes and effects. The research goal of this CAREER award is to build information extraction (IE) methods with natural language processing methods, using the latest deep learning-based techniques, to construct an event knowledge graph for storing knowledge and improving the ability of people to track rapidly evolving event information. In the short term, the project will improve the quality and comprehensiveness of event knowledge graphs. In the long run, the project will entirely transform people's experiences and habits in acquiring event knowledge from various sources. The system to be developed through this award will better support numerous event-oriented tasks that people need to perform, such as future event prediction, event factuality verification, and risk event prevention, all of which have profound impacts on society. Moreover, our work would make fundamental contributions to a wide range of interdisciplinary applications such as statutory reasoning based on legal documents, prediction of disease outbreaks, and biomedical document understanding, all of which currently rely on extremely slow and high-cost methods.

The general technical goal of this project is to address the knowledge gap of event extraction from long and complex documents (as compared to the traditional sentence-level extraction) and to do so in an efficient manner. The general goal is divided into three sub-research goals. First, to extract the entirety of event attributes, which is not possible for current models trained on a dataset with a predefined schema, the project introduces a new question-answer generation paradigm that enables a novel representation of events from clusters of documents discussing the same events. The project will leverage document hierarchy information for extracting events, which enforces the validity and broad coverage of event information. Motivated by the fact that current event knowledge construction is inefficient and is impaired by pairwise event-event relation predictions, the second research goal is to develop novel techniques enabling the construction of the event knowledge graph. For this purpose, the investigators propose interleaving targeted retrieval and joint modeling of event arguments and entity-entity relations. This not only enables efficient updating of graphs, but also ensures its global consistency. Finally, the third goal is to adapt to individual information-seeking needs, which is not considered by current methods. The project will study schema induction strategies and schema matching algorithms for adapting the event knowledge graph to user preferences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Xinya Du

Duality of immune recognition by tomato and virulence activity of Ralstonia solanacearum exo-polygalacturonase PehC

DOI:
Published:
2023
Journal:
The Plant Cell
Impact factor:
Authors:
Jingjing Ke; Wanting Zhu; Ying Yuan; Xinya Du; Ai Xu; Dan Zhang; Sen Cao; Wei Chen; Yang Lin; Jiatao Xie; Jiasen Cheng; Yanping Fu; Daohong Jiang; Xiao Yu; Bo Li
Corresponding author:
Bo Li