CAREER: Learning to Extract Consistent Event Graphs from Long and Complex Documents
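Basic Information
- Award number: 2340435
- Principal investigator:
- Amount: $561,200
- Host institution:
- Host institution country: United States
- Project category: Continuing Grant
- Fiscal year: 2024
- Funding country: United States
- Project period: 2024-05-01 to 2029-04-30
- Project status: Active
- Source:
- Keywords:
Project Abstract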
Documents about real-world events are published daily. The large number of such documents makes it very hard for people to read and absorb them all, a phenomenon known as "information overload". Applying computer algorithms that can automatically extract events is a promising solution because they can transform large amounts of text into smaller summaries in the form of structured event knowledge graphs that reveal the relationships between the people, places, and times in the events. Current deep learning-based event extraction techniques mainly focus on extracting event knowledge at the level of individual sentences and are unable to extract a knowledge graph spanning multiple sentences with sufficient accuracy or efficiency. For example, existing techniques would struggle with events described in a long document having multiple sections. Moreover, these extraction techniques do not capture accurate information regarding real-life events because such events typically include nuanced attributes such as causes and effects. The research goal of this CAREER award is to build information extraction (IE) methods, drawing on natural language processing and the latest deep learning-based techniques, that construct an event knowledge graph for storing knowledge and improving the ability of people to track rapidly evolving event information. In the short term, the project will improve the quality and comprehensiveness of event knowledge graphs. In the long run, the project will entirely transform people's experiences and habits in acquiring event knowledge from various sources. The system to be developed through this award will better support numerous event-oriented tasks that people need to perform, such as future event prediction, event factuality verification, and risk event prevention, all of which have profound impacts on society. Moreover, our work would make fundamental contributions to a wide range of interdisciplinary applications such as statutory reasoning based on legal documents, prediction of disease outbreaks, and biomedical document understanding, all of which currently rely on extremely slow and high-cost methods.
The general technical goal of this project is to address the knowledge gap of event extraction from long and complex documents (as compared to traditional sentence-level extraction) and to do so in an efficient manner. The general goal is divided into three research sub-goals. First, to extract the entirety of event attributes, which is not possible for current models trained on a dataset with a predefined schema, the project introduces a new question-answer generation paradigm that enables a novel representation of events from clusters of documents discussing the same events. The project will leverage document hierarchy information for extracting events, to enforce the validity and broad coverage of event information. Second, motivated by the fact that current event knowledge construction is inefficient and is impaired by pairwise event-event relation predictions, the project will develop novel techniques enabling the construction of the event knowledge graph. For this purpose, the investigators propose interleaving targeted retrieval and joint modeling of event arguments and entity-entity relations. This not only enables efficient updating of graphs, but also ensures their global consistency. Finally, the third goal is to adapt to individual information-seeking needs, which current methods do not consider. The project will study schema induction strategies and schema matching algorithms for adapting the event knowledge graph to user preferences.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
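To make the idea of an event knowledge graph and its "global consistency" more concrete, here is a minimal sketch in Python. It is an illustration only and not the project's method: the `Event` and `EventGraph` classes, the `BEFORE` relation, and the choice to treat "consistent" as "the temporal edges contain no cycle" are all assumptions introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical event node: a trigger word plus role -> filler arguments."""
    event_id: str
    trigger: str
    arguments: dict[str, str] = field(default_factory=dict)


@dataclass
class EventGraph:
    """Hypothetical event graph: events plus directed BEFORE edges."""
    events: dict[str, Event] = field(default_factory=dict)
    before: dict[str, set[str]] = field(default_factory=dict)

    def add_event(self, event: Event) -> None:
        self.events[event.event_id] = event
        self.before.setdefault(event.event_id, set())

    def add_before(self, earlier: str, later: str) -> None:
        # Both endpoints must already be registered as events.
        if earlier not in self.events or later not in self.events:
            raise KeyError("unknown event id")
        self.before[earlier].add(later)

    def is_consistent(self) -> bool:
        """Assumed notion of consistency: BEFORE edges form no cycle."""
        WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
        state = {eid: WHITE for eid in self.events}

        def visit(node: str) -> bool:
            state[node] = GRAY
            for nxt in self.before[node]:
                if state[nxt] == GRAY:  # back edge: a temporal contradiction
                    return False
                if state[nxt] == WHITE and not visit(nxt):
                    return False
            state[node] = BLACK
            return True

        return all(visit(e) for e in self.events if state[e] == WHITE)


graph = EventGraph()
graph.add_event(Event("e1", "outbreak", {"place": "city A"}))
graph.add_event(Event("e2", "quarantine", {"place": "city A"}))
graph.add_event(Event("e3", "reopening", {"place": "city A"}))
graph.add_before("e1", "e2")
graph.add_before("e2", "e3")
consistent_before = graph.is_consistent()

# A pairwise prediction that contradicts the chain makes the graph inconsistent.
graph.add_before("e3", "e1")
consistent_after = graph.is_consistent()
```

The sketch shows why independent pairwise relation predictions can be a problem: each edge may look plausible on its own, while the set of edges as a whole contradicts itself. A check over the whole graph catches what a per-pair check cannot.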
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Xinya Du
Duality of immune recognition by tomato and virulence activity of Ralstonia solanacearum exo-polygalacturonase PehC
- DOI:
- Year: 2023
- Journal:
- Impact factor:
- Authors: Jingjing Ke; Wanting Zhu; Ying Yuan; Xinya Du; Ai Xu; Dan Zhang; Sen Cao; Wei Chen; Yang Lin; Jiatao Xie; Jiasen Cheng; Yanping Fu; Daohong Jiang; Xiao Yu; Bo Li
- Corresponding author: Bo Li
Measuring industrial operational efficiency and factor analysis: A dynamic series-parallel recycling DEA model.
- DOI: 10.1016/j.scitotenv.2022.158084
- Year: 2022
- Journal:
- Impact factor: 0
- Authors: Lina Zhang; Xinya Du; Yung‐ho Chiu; Q. Pang; Xiao Wang; Qianwen Yu
- Corresponding author: Qianwen Yu
VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing
- DOI:
- Year: 2024
- Journal:
- Impact factor: 0
- Authors: Jing Gu; Yuwei Fang; Ivan Skorokhodov; Peter Wonka; Xinya Du; Sergey Tulyakov; Xin Eric Wang
- Corresponding author: Xin Eric Wang
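Similar NSFC Grants
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Award number:
- Approval year: 2024
- Funding amount:
- Project category: Collaborative Innovation Research Team
Understanding structural evolution of galaxies with machine learning
- Award number: n/a
- Approval year: 2022
- Funding amount: CNY 100,000
- Project category: Provincial/Municipal Project
Constrained dynamic multi-objective Q-learning evolutionary allocation of human-machine hybrid crowdsensing tasks for coal mine safety
- Award number:
- Approval year: 2022
- Funding amount: CNY 300,000
- Project category: Young Scientists Fund
Short-horizon online Q-learning cooperative control mechanisms for intelligent munition formations, accounting for leader failure
- Award number: 62003314
- Approval year: 2020
- Funding amount: CNY 240,000
- Project category: Young Scientists Fund
Research on e-learning resource recommendation methods integrating contextual tensor factorization
- Award number: 61902016
- Approval year: 2019
- Funding amount: CNY 240,000
- Project category: Young Scientists Fund
Research on Spiking-Transfer Learning methods with temporal transfer capability
- Award number: 61806040
- Approval year: 2018
- Funding amount: CNY 200,000
- Project category: Young Scientists Fund
Research on deep-learning-based dynamic identification techniques for glacier monitoring in the Sanjiangyuan region
- Award number: 51769027
- Approval year: 2017
- Funding amount: CNY 380,000
- Project category: Regional Science Fund
Research on Spiking-Deep Learning methods with temporal processing capability
- Award number: 61573081
- Approval year: 2015
- Funding amount: CNY 640,000
- Project category: General Program
Automatic generation and optimization of large-scale personalized e-learning process models based on directed hypergraphs
- Award number: 61572533
- Approval year: 2015
- Funding amount: CNY 660,000
- Project category: General Program
Research on learner affect compensation methods in e-learning
- Award number: 61402392
- Approval year: 2014
- Funding amount: CNY 260,000
- Project category: Young Scientists Fund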
Similar Overseas Grants
Machine learning models to extract accurate material transport
- Award number: 2887164
- Fiscal year: 2023
- Funding amount: $561,200
- Project category: Studentship
Applying Machine Learning to Extract Parameters from Radiographs and Ultrasound Scans to Predict the Progression of Scoliosis
- Award number: 548082-2020
- Fiscal year: 2022
- Funding amount: $561,200
- Project category: Alexander Graham Bell Canada Graduate Scholarships - Doctoral
Applying Machine Learning to Extract Parameters from Radiographs and Ultrasound Scans to Predict the Progression of Scoliosis
- Award number: 548082-2020
- Fiscal year: 2021
- Funding amount: $561,200
- Project category: Postgraduate Scholarships - Doctoral
Applying Machine Learning to Extract Parameters from Radiographs and Ultrasound Scans to Predict the Progression of Scoliosis
- Award number: 548082-2020
- Fiscal year: 2020
- Funding amount: $561,200
- Project category: Postgraduate Scholarships - Doctoral
Deep Learning Approaches to Extract Information from Web data
- Award number: 539690-2019
- Fiscal year: 2019
- Funding amount: $561,200
- Project category: University Undergraduate Student Research Awards
Deep Learning Approaches to Extract Information from Web data
- Award number: 525323-2018
- Fiscal year: 2018
- Funding amount: $561,200
- Project category: University Undergraduate Student Research Awards
Molecular images and machine learning to extract placental function from maternal cfDNA
- Award number: 10359690
- Fiscal year: 2018
- Funding amount: $561,200
- Project category:
Applying machine learning to extract galaxy properties
- Award number: 2075907
- Fiscal year: 2018
- Funding amount: $561,200
- Project category: Studentship
Deep Learning Approaches to Extract Information from Web data
- Award number: 511876-2017
- Fiscal year: 2017
- Funding amount: $561,200
- Project category: University Undergraduate Student Research Awards
Deep Learning Approaches to Extract Information from Web data
- Award number: 496024-2016
- Fiscal year: 2016
- Funding amount: $561,200
- Project category: University Undergraduate Student Research Awards