CAREER: Achieving Quality Information Extraction from Scientific Documents with Heterogeneous Weak Supervisions

Basic Information

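  • Award Number:
    2237831
  • Principal Investigator:
    Qi Li
  • Amount:
    $499,900
  • Institution Country:
    United States
  • Award Type:
    Standard Grant
  • Fiscal Year:
    2023
  • Funding Country:
    United States
  • Duration:
    2023-07-01 to 2028-06-30
  • Status:
    Ongoing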

Project Abstract

The volume and breadth of the scientific literature are growing at an astonishing pace, making it challenging for researchers to keep up. Information extraction (IE) systems that can automatically extract structured information from this unstructured text are in high demand. The benefits of automated IE are manifold: among others, it makes scientific documents easier to search and organize, improves curators' efficiency, and reduces curation costs. Although supervised deep learning-based IE methods achieve curation-level performance in some applications, they require large training datasets with accurate annotations to do so. The goal of this project is to develop an adaptable and flexible information extraction framework that learns from existing resources instead of relying on costly and time-consuming expert annotations, and that closes the performance gap in real applications by addressing extraction-quality concerns and the unique requirements of IE tasks in the scientific literature. Success in this project will benefit many domains by providing mechanisms for processing massive unlabeled textual datasets, speeding up literature understanding and the curation process, and promoting new scientific discoveries. The investigator will engage in departmental Broadening Participation in Computing (BPC) activities and create educational materials based on results from this project for outreach programs to local K-12 schools and communities.

This project is focused on three complementary research thrusts, each of which addresses one key obstacle to information extraction from scientific documents: 1) advancing IE models to work with heterogeneous forms of supervision, such as distant supervision and indirect supervision, while taking advantage of all existing resources; 2) developing new semi-open information extraction tasks to extract detailed context and uncertainties at the document level; and 3) developing a novel learn-from-mistakes paradigm that integrates first-order logic rules and new annotations from domain users to refine the IE models and their results. The proposed research will address a variety of problems drawn from different information extraction settings, which will lead to new principles, methods, and technologies for machine learning, data mining, and natural language processing. The research thrusts will be applied to extract information from STEM textbooks to construct concept networks for educational purposes.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Publications by Qi Li

A stability study of carbonyl compounds in Tedlar bags by a fabricated MEMS microreactor approach
  • Published:
    2021
  • Impact Factor:
    0
  • Authors:
    Qi Li; Xiao; Kai; Haifeng He; Nan Jiang
  • Corresponding Author:
    Nan Jiang
Experimental study on mechanical vibration massage for treatment of brachial plexus injury in rats.
The low-frequency sound power measuring technique for an underwater source in a nonanechoic tank
  • DOI:
    10.1088/1361-6501/aa9f6e
  • Published:
    2018
  • Impact Factor:
    2.4
  • Authors:
    Yi-Ming Zhang; Rui Tang; Qi Li; Da-Jing Shang
  • Corresponding Author:
    Da-Jing Shang
RETRACTED ARTICLE: Methamphetamine causes acute toxicity in the retina of Balb/c mice
A data‐driven adversarial examples recognition framework via adversarial feature genomes

Other Grants by Qi Li

AccelNet-Design: A Global Network of Networks of Integrated Urban Services (GNNIUS) for Healthy and Smart Cities
  • Award Number:
    2301858
  • Fiscal Year:
    2023
  • Amount:
    $499,900
  • Award Type:
    Standard Grant
CAREER: Multi-Scalar Transport and Similarity in the Urban Boundary Layer
  • Award Number:
    2143664
  • Fiscal Year:
    2022
  • Amount:
    $499,900
  • Award Type:
    Continuing Grant
III: Small: Collaborative Research: Algorithms, systems, and theories for exploiting data dependencies in crowdsourcing
  • Award Number:
    2007941
  • Fiscal Year:
    2020
  • Amount:
    $499,900
  • Award Type:
    Standard Grant
Collaborative Research: Geoengineering of Urban Green Infrastructure to Improve Outdoor Livability
  • Award Number:
    2028842
  • Fiscal Year:
    2020
  • Amount:
    $499,900
  • Award Type:
    Standard Grant
Collaborative Research: CAS-MNP--Precursors of Long-Distance Aerial Transport of Microplastics from Urban Environments
  • Award Number:
    2028644
  • Fiscal Year:
    2020
  • Amount:
    $499,900
  • Award Type:
    Standard Grant
Design and Characterization of Two-Dimensional Electron Gas with Strong Spin-Orbit Coupling Based on Transition Metal Oxides
  • Award Number:
    1905833
  • Fiscal Year:
    2019
  • Amount:
    $499,900
  • Award Type:
    Standard Grant
Multiferroic Tunnel Junction with Active Dual Layer Barrier
  • Award Number:
    1411166
  • Fiscal Year:
    2014
  • Amount:
    $499,900
  • Award Type:
    Standard Grant
Interfacial Electromagnetic Coupling in Multiferroic Tunnel Junctions
  • Award Number:
    1207474
  • Fiscal Year:
    2012
  • Amount:
    $499,900
  • Award Type:
    Continuing Grant
III: Small: An Automatic Framework for Processing Drosophila Embryonic Images
  • Award Number:
    1016668
  • Fiscal Year:
    2010
  • Amount:
    $499,900
  • Award Type:
    Standard Grant
Study of Multiferroic Tunnel Junctions
  • Award Number:
    0907604
  • Fiscal Year:
    2009
  • Amount:
    $499,900
  • Award Type:
    Standard Grant

Similar Overseas Grants

The UK's Clean Air Hospital Framework: The Role of Hospitals as Anchor Institutions in Achieving Improved Air Quality Within the Communities They Serv
  • Award Number:
    2743070
  • Fiscal Year:
    2022
  • Amount:
    $499,900
  • Award Type:
    Studentship
Effectiveness of patient decision aids and their elements for achieving quality health decisions: systematic review with network meta-analysis to inform and update the international standards
  • Award Number:
    451493
  • Fiscal Year:
    2021
  • Amount:
    $499,900
  • Award Type:
    Operating Grants
Water quality, irrigation and on-farm controls for achieving global food safety and nutritional security
  • Award Number:
    10170602
  • Fiscal Year:
    2020
  • Amount:
    $499,900
Water quality, irrigation and on-farm controls for achieving global food safety and nutritional security
  • Award Number:
    10470001
  • Fiscal Year:
    2020
  • Amount:
    $499,900
Achieving Excellence in Biopsychosocial Cancer Pain Management through a Comprehensive Quality Education Program
  • Award Number:
    10011775
  • Fiscal Year:
    2018
  • Amount:
    $499,900
Achieving Excellence in Biopsychosocial Cancer Pain Management through a Comprehensive Quality Education Program
  • Award Number:
    10461096
  • Fiscal Year:
    2018
  • Amount:
    $499,900
Achieving Excellence in Biopsychosocial Cancer Pain Management through a Comprehensive Quality Education Program
  • Award Number:
    10251052
  • Fiscal Year:
    2018
  • Amount:
    $499,900
Achieving quality control during veneer drying by using big data statistics
  • Award Number:
    507369-2016
  • Fiscal Year:
    2016
  • Amount:
    $499,900
  • Award Type:
    Engage Grants Program
Achieving Better Care, Quality and Value Through System Design that Enables Health Care Providers to Work to Their Optimal Scopes of Practice
  • Award Number:
    308616
  • Fiscal Year:
    2014
  • Amount:
    $499,900
Achieving Better Care, Quality and Value Through System Design that Enables Health Care Providers to Work to Their Optimal Scopes of Practice
  • Award Number:
    308599
  • Fiscal Year:
    2013
  • Amount:
    $499,900