Low-Resource Text Understanding
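Basic information
- Grant number: RGPIN-2021-03115
- Principal investigator:
- Amount: $24,800
- Host institution:
- Host institution country: Canada
- Project type: Discovery Grants Program - Individual
- Fiscal year: 2021
- Funding country: Canada
- Period: 2021-01-01 to 2022-12-31
- Status: Completed
- Source:
- Keywords:
Project abstract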
Natural language understanding (NLU) aims to read text written in natural language, determine the meaning of each element (e.g., words, sentences, paragraphs), and make inferences from that text. It is critical to applications such as question answering (QA) and dialogue systems. State-of-the-art methods for NLU applications (e.g., BERT, GPT-3) are mostly based on deep neural networks. However, these methods have three major drawbacks: i) They are data-hungry and computationally intensive. Training them requires large amounts of data and computational resources, and training datasets for question answering are especially difficult and time-consuming to build. ii) They often fail on simple questions because they lack commonsense knowledge. iii) They are not transferable and lack explainability.
The goal of this research program is to create a low-resource, knowledge-empowered, and logic-enhanced NLU system and apply it to question answering. It will help reduce the human effort required for dataset construction and make QA systems more knowledge-aware, explainable, and transferable. These properties are especially critical for medical question answering, where training datasets are extremely hard to construct, substantial domain knowledge is required to understand medical text, and explaining model predictions to humans is essential because an incorrect result in health care can be costly and dangerous. The techniques developed in this research program are also essential to NLU applications other than QA.
Our recent papers, published at WWW 2020 and SIGMOD 2020, propose efficient approaches for automatic question generation and web-scale ontology construction from unlabeled text and web user search logs. We will extend our previous research and further explore the following directions: i) Dataset construction. We plan to automatically generate large-scale, high-quality question-answer pairs from unlabeled text and knowledge graphs in a fully controlled manner to reduce human effort. We will also evaluate data quality and learn an efficient data curriculum to improve training efficiency. ii) Knowledge expansion. Instead of constructing a knowledge graph from scratch, we focus on ontology expansion: extending an existing ontology with newly discovered concepts or entities to capture emerging knowledge and keep the ontology dynamically updated. The generated ontology can serve as an input for both question generation and question answering. iii) Logic enhancement. We will design neural-symbolic QA models and integrate them with external knowledge to improve both the logical reasoning ability and the transferability of QA systems.
This program will contribute to Canada's lead in artificial intelligence, improve the understanding of natural language, and have a significant impact on real-world applications such as AI for health care.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Liu, Bang
The m7G Modification Level and Immune Infiltration Characteristics in Patients with COVID-19.
- DOI: 10.2147/jmdh.s385050
- Published: 2022
- Journal:
- Impact factor: 3.3
- Authors: Lu, Lingling; Zheng, Jiaolong; Liu, Bang; Wu, Haicong; Huang, Jiaofeng; Wu, Liqing; Li, Dongliang
- Corresponding author: Li, Dongliang
Development of a colloidal gold immunochromatographic strip assay for simple and fast detection of human α-lactalbumin in genetically modified cow milk
- DOI: 10.3168/jds.2015-9919
- Published: 2016-03-01
- Journal:
- Impact factor: 3.5
- Authors: Tao, Chenyu; Zhang, Qingde; Liu, Bang
- Corresponding author: Liu, Bang
Molecular characterization, chromosomal localization, expression profile and association analysis with carcass traits of the porcine dickkopf homolog1 gene
- DOI: 10.1007/s11033-010-0313-x
- Published: 2011-03
- Journal:
- Impact factor: 2.8
- Authors: Liu, Chuxin; Zhai, Shanli; Gao, Hui; Liu, Bang
- Corresponding author: Liu, Bang
A dual signal amplification strategy combining thermally initiated SI-RAFT polymerization and DNA-templated silver nanoparticles for electrochemical determination of DNA
- DOI: 10.1007/s00604-019-3912-9
- Published: 2020-01-01
- Journal:
- Impact factor: 5.7
- Authors: Liu, Bang; Sun, Haobo; Zhang, Xueji
- Corresponding author: Zhang, Xueji
Species Identification of Fox-, Mink-, Dog-, and Rabbit-Derived Ingredients by Multiplex PCR and Real-Time PCR Assay
- DOI: 10.1007/s12010-017-2621-2
- Published: 2018-05-01
- Journal:
- Impact factor: 3
- Authors: Wu, Qingqing; Xiang, Shengnan; Liu, Bang
- Corresponding author: Liu, Bang
Other grants by Liu, Bang
Low-Resource Text Understanding
- Grant number: RGPIN-2021-03115
- Fiscal year: 2022
- Amount: $24,800
- Project type: Discovery Grants Program - Individual
Low-Resource Text Understanding
- Grant number: DGECR-2021-00316
- Fiscal year: 2021
- Amount: $24,800
- Project type: Discovery Launch Supplement
Similar international grants
Collaborative Research: LTREB: The importance of resource availability, acquisition, and mobilization to the evolution of life history trade-offs in a variable environment.
- Grant number: 2338394
- Fiscal year: 2024
- Amount: $24,800
- Project type: Continuing Grant
Postdoctoral Fellowship: STEMEdIPRF: Resource Use as a Mediator of Sociodemographic Disparities in Student Success
- Grant number: 2327314
- Fiscal year: 2024
- Amount: $24,800
- Project type: Standard Grant
AUC-GRANTED: Advancing Transformation of the Research Enterprise through Shared Resource Support Model for Collective Impact and Synergistic Effect.
- Grant number: 2341110
- Fiscal year: 2024
- Amount: $24,800
- Project type: Cooperative Agreement
GOALI: Understanding granulation using microbial resource management for the broader application of granular technology
- Grant number: 2227366
- Fiscal year: 2024
- Amount: $24,800
- Project type: Standard Grant
Pulmonary rehabilitation delivered in low resource settings for people with chronic respiratory disease: a 3-arm assessor-blind implementation trial
- Grant number: MR/Y004809/1
- Fiscal year: 2024
- Amount: $24,800
- Project type: Research Grant
The Whitehall II study: A core resource for ageing research
- Grant number: MR/Y014154/1
- Fiscal year: 2024
- Amount: $24,800
- Project type: Research Grant
CAREER: Leveraging Data Science & Policy to Promote Sustainable Development Via Resource Recovery
- Grant number: 2339025
- Fiscal year: 2024
- Amount: $24,800
- Project type: Continuing Grant
CAREER: Integrated and end-to-end machine learning pipeline for edge-enabled IoT systems: a resource-aware and QoS-aware perspective
- Grant number: 2340075
- Fiscal year: 2024
- Amount: $24,800
- Project type: Continuing Grant
RaMP: Woods to Water (W2W) for Training the Next Generation of Ecologists and Natural Resource Managers
- Grant number: 2319669
- Fiscal year: 2024
- Amount: $24,800
- Project type: Standard Grant
Developing Intercultural Competence through Empathy: Construction of a Video Interview Resource of Japanese Immigrants
- Grant number: 24K16149
- Fiscal year: 2024
- Amount: $24,800
- Project type: Grant-in-Aid for Early-Career Scientists