CAREER: Knowledge Extraction and Discovery from Massive Text Corpora via Extremely Weak Supervision

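Basic Information

Award number:
2239440
Principal investigator:
Jingbo Shang
Amount:
$600,000
Host institution:
University of California-San Diego
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2023
Funding country:
United States
Period:
2023-07-01 to 2028-06-30
Project status:
Ongoing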

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2239440&HistoricalAwards=false
Keywords:
CAREER Knowledge Extraction Discovery Massive

Project Abstract

Automated knowledge extraction and discovery methods can address the diverse needs of different users (e.g., governments for decision making and scientists for literature summary). A fundamental open problem is how much user effort automated methods require to obtain useful knowledge. This project aims to minimize such required user effort with a newly proposed paradigm, extremely weak supervision. It includes only brief natural-language user input to define the task (e.g., a list of topics when classifying news articles; location names when classifying events), guidance similar to the task-specific guidelines that might be provided to human annotators. By using brief natural-language input instead of labor-intensive annotated training samples, this new paradigm will help democratize knowledge extraction and discovery, and extend its application beyond rich companies to ordinary, relatively untrained users with a broad range of needs (e.g., domain scientists and small business owners). Project outcomes will be disseminated via top conferences and scholarly publications and integrated into new courses. This project will also support a diverse set of graduate, undergraduate, and high school students.

This project focuses on four fundamental, interconnected knowledge extraction and discovery tasks: text classification, phrase mining, named entity recognition, and relation extraction. Following the extremely weak supervision paradigm, this project will develop a series of novel methods, including (1) an unsupervised phrase tagging method for both multi-gram and unigram (emerging) phrases; (2) a text classification method that can take only the most popular (e.g., top-50%) class names as input to discover novel classes (i.e., classes not explicitly defined by the user) and build a classifier for all the classes; (3) a named entity recognition method that can take a few popular entity types and mentions of interest to recognize (emerging) entity mentions of the same or similar types; and (4) a relation extraction method that can take a few popular relation types and tuples of interest to discover relations of similar semantics and extract relevant tuples. All these methods, by design, will be agnostic to domains and languages and require only the availability of pre-trained neural language models in a particular domain and language.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other Publications by Jingbo Shang

Towards Zero-shot Relation Extraction in Web Mining: A Multimodal Approach with Relative XML Path

DOI:
10.48550/arxiv.2305.13805
Publication year:
2023
Journal:
ArXiv
Impact factor:
0
Authors:
Zilong Wang;Jingbo Shang
Corresponding author:
Jingbo Shang

Less than One-shot: Named Entity Recognition via Extremely Weak Supervision

DOI:
Publication year:
2023
Journal:
Conference on Empirical Methods in Natural Language Processing
Impact factor:
0
Authors:
Letian Peng;Zihan Wang;Jingbo Shang
Corresponding author:
Jingbo Shang

Involvement of poly(ADP-ribose) polymerase-1 in development of spinal cord injury in Chinese individuals: a Chinese clinical study

DOI:
Publication year:
2017
Journal:
Drug Design, Development and Therapy
Impact factor:
0
Authors:
Qingyang Meng;Guang;Renbo Li;Jing;Wei Zhou;Hong;Bo Chen;Li Jiang;Jingbo Shang
Corresponding author:
Jingbo Shang

AI-native Memory: A Pathway from LLMs Towards AGI

DOI:
Publication year:
2024
Journal:
Impact factor:
0
Authors:
Jingbo Shang;Zai Zheng;Xiang Ying;Felix Tao;Mindverse Team
Corresponding author:
Mindverse Team

CubeNet: Multi-Facet Hierarchical Heterogeneous Network Construction, Analysis, and Mining

DOI:
Publication year:
2019
Journal:
arXiv.org
Impact factor:
0
Authors:
Carl Yang;Dai Teng;Siyang Liu;Sayantan Basu;Jieyu Zhang;Jiaming Shen;Chao Zhang;Jingbo Shang;Lance M. Kaplan;Timothy Harratty;Jiawei Han
Corresponding author:
Jiawei Han