CAREER: Knowledge Extraction and Discovery from Massive Text Corpora via Extremely Weak Supervision
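Basic Information
- Award number: 2239440
- Principal investigator:
- Amount: $600,000
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2023
- Funding country: United States
- Period: 2023-07-01 to 2028-06-30
- Project status: Ongoing
- Source:
- Keywords:
Project Abstract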
Automated knowledge extraction and discovery methods can address the diverse needs of different users (e.g., governments for decision making and scientists for literature summary). A fundamental open problem is how much user effort automated methods require to obtain useful knowledge. This project aims to minimize that effort with a newly proposed paradigm, extremely weak supervision. The paradigm includes only brief natural-language user input to define the task (e.g., a list of topics when classifying news articles; location names when classifying events), guidance similar to the task-specific guidelines that might be provided to human annotators. By using brief natural-language input instead of labor-intensive annotated training samples, this new paradigm will help democratize knowledge extraction and discovery, and extend its application beyond rich companies to ordinary, relatively untrained users with a broad range of needs (e.g., domain scientists and small business owners). Project outcomes will be disseminated via top conferences and scholarly publications and integrated into new courses. This project will also support a diverse set of graduate, undergraduate, and high school students.
This project focuses on four fundamental, interconnected knowledge extraction and discovery tasks: text classification, phrase mining, named entity recognition, and relation extraction. Following the extremely weak supervision paradigm, this project will develop a series of novel methods, including (1) an unsupervised phrase tagging method for both multi-gram and unigram (emerging) phrases; (2) a text classification method that can take only the most popular (e.g., top-50%) class names as input to discover novel classes (i.e., new classes are not explicitly defined by the user) and build a classifier for all the classes; (3) a named entity recognition method that can take a few popular entity types and mentions of interest to recognize (emerging) entity mentions of the same or similar types; and (4) a relation extraction method that can take a few popular relation types and tuples of interest to discover relations of similar semantics and extract relevant tuples. All these methods, by design, will be agnostic to domains and languages and require only the availability of pre-trained neural language models in a particular domain and language.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by Jingbo Shang
Towards Zero-shot Relation Extraction in Web Mining: A Multimodal Approach with Relative XML Path
- DOI: 10.48550/arxiv.2305.13805
- Year: 2023
- Journal:
- Impact factor: 0
- Authors: Zilong Wang; Jingbo Shang
- Corresponding author: Jingbo Shang
Less than One-shot: Named Entity Recognition via Extremely Weak Supervision
- DOI:
- Year: 2023
- Journal:
- Impact factor: 0
- Authors: Letian Peng; Zihan Wang; Jingbo Shang
- Corresponding author: Jingbo Shang
Involvement of poly(ADP-ribose) polymerase-1 in development of spinal cord injury in Chinese individuals: a Chinese clinical study
- DOI:
- Year: 2017
- Journal:
- Impact factor: 0
- Authors: Qingyang Meng; Guang; Renbo Li; Jing; Wei Zhou; Hong; Bo Chen; Li Jiang; Jingbo Shang
- Corresponding author: Jingbo Shang
AI-native Memory: A Pathway from LLMs Towards AGI
- DOI:
- Year: 2024
- Journal:
- Impact factor: 0
- Authors: Jingbo Shang; Zai Zheng; Xiang Ying; Felix Tao; Mindverse Team
- Corresponding author: Mindverse Team
CubeNet: Multi-Facet Hierarchical Heterogeneous Network Construction, Analysis, and Mining
- DOI:
- Year: 2019
- Journal:
- Impact factor: 0
- Authors: Carl Yang; Dai Teng; Siyang Liu; Sayantan Basu; Jieyu Zhang; Jiaming Shen; Chao Zhang; Jingbo Shang; Lance M. Kaplan; Timothy Harratty; Jiawei Han
- Corresponding author: Jiawei Han
Other Grants by Jingbo Shang
NSF Convergence Accelerator Track D: Towards Intelligent Sharing and Search for AI Models and Datasets
- Award number: 2040727
- Fiscal year: 2020
- Amount: $600,000
- Project type: Standard Grant
Similar Overseas Grants
CAREER: Towards Open World Event Knowledge Extraction with Weak Supervision
- Award number: 2238940
- Fiscal year: 2023
- Amount: $600,000
- Project type: Continuing Grant
Research on clinical knowledge extraction infrastructure using real-world data derived from electronic medical records
- Award number: 23K17001
- Fiscal year: 2023
- Amount: $600,000
- Project type: Grant-in-Aid for Early-Career Scientists
Collaborative Research: CISE-MSI: DP: IIS: Event Detection and Knowledge Extraction via Learning and Causality Analysis for Resilience Emergency Response
- Award number: 2219615
- Fiscal year: 2023
- Amount: $600,000
- Project type: Standard Grant
Collaborative Research: CISE-MSI: DP: IIS: Event Detection and Knowledge Extraction via Learning and Causality Analysis for Resilience Emergency Response
- Award number: 2219614
- Fiscal year: 2023
- Amount: $600,000
- Project type: Standard Grant
A new approach for traffic data management and modeling that combines storage efficiency and immediate knowledge extraction
- Award number: 23K17800
- Fiscal year: 2023
- Amount: $600,000
- Project type: Grant-in-Aid for Challenging Research (Exploratory)
Information extraction accumulating graph-formed knowledge with deep learning
- Award number: 22KJ2983
- Fiscal year: 2023
- Amount: $600,000
- Project type: Grant-in-Aid for JSPS Fellows
Hippocampal cortical interactions and the extraction of knowledge from episodic memory
- Award number: RGPIN-2017-03857
- Fiscal year: 2022
- Amount: $600,000
- Project type: Discovery Grants Program - Individual
Accelerating medicine development timelines through new approaches in knowledge extraction from diverse biological data sets
- Award number: MR/W003996/1
- Fiscal year: 2021
- Amount: $600,000
- Project type: Research Grant
A real-time system for data streaming and knowledge extraction on mobile devices
- Award number: DDG-2019-05756
- Fiscal year: 2021
- Amount: $600,000
- Project type: Discovery Development Grant
Revamping Real Estate investments with a Neural Network pipeline for image recognition and knowledge extraction from floor plans and planning applications
- Award number: 10004707
- Fiscal year: 2021
- Amount: $600,000
- Project type: Collaborative R&D