Unsupervised Neural Text Generation by Stochastic Searching
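Basic Information
- Grant number: RGPIN-2020-04465
- Principal investigator:
- Amount: $21,100
- Host institution:
- Host institution country: Canada
- Project category: Discovery Grants Program - Individual
- Fiscal year: 2020
- Funding country: Canada
- Period: 2020-01-01 to 2021-12-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract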
Natural language generation (NLG) is an important field of artificial intelligence. NLG aims to synthesize natural language text (e.g., sentences) in a variety of tasks, including text summarization, paraphrase generation, and dialogue systems. State-of-the-art NLG systems are based on deep neural networks, which typically synthesize a sentence by predicting one word at a time in an autoregressive fashion. Such approaches significantly outperform traditional rule/template-based NLG in terms of expressiveness and naturalness.
However, existing neural NLG has two major drawbacks: 1) Neural networks are usually data-hungry. For example, millions of pairs of parallel sentences are required to train a neural translation system. 2) These methods suffer from the "error accumulation" problem, i.e., the quality of text could drop drastically as the generation proceeds, due to the autoregressive nature (e.g., left-to-right generation).
The long-term goal of this proposed research is to investigate unsupervised approaches to text generation. In our previous studies, we tackled this problem by sampling from the probabilistic continuous latent space of a variational autoencoder. More recently, we proposed a novel Metropolis-Hastings (MH) sampler that directly samples a sentence from the discrete word space. In this way, text generation would be more data-efficient, and could be easily adapted to various real-world applications.
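The idea of sampling a sentence directly from the discrete word space can be illustrated with a minimal Metropolis-Hastings sketch. Everything specific here is an assumption made for illustration and does not come from the proposal: the toy vocabulary `VOCAB`, the bigram table `GOOD_BIGRAMS` standing in for a language model, the scoring function `log_score`, and the single-word replacement proposal.

```python
import math
import random

# Toy vocabulary and bigram table: hypothetical stand-ins for a real language model.
VOCAB = ["the", "cat", "dog", "sat", "ran", "on", "mat", "fast"]
GOOD_BIGRAMS = {
    ("the", "cat"), ("the", "dog"), ("the", "mat"), ("cat", "sat"),
    ("dog", "ran"), ("sat", "on"), ("on", "the"), ("ran", "fast"),
}


def log_score(sentence):
    """Unnormalized log-probability: +1 for every plausible adjacent word pair."""
    return sum(1.0 for pair in zip(sentence, sentence[1:]) if pair in GOOD_BIGRAMS)


def mh_step(sentence, rng):
    """One Metropolis-Hastings step using a symmetric word-replacement proposal."""
    proposal = list(sentence)
    proposal[rng.randrange(len(proposal))] = rng.choice(VOCAB)
    # The proposal is symmetric (uniform position, uniform word), so the
    # acceptance probability reduces to min(1, pi(proposal) / pi(sentence)).
    accept_prob = math.exp(min(0.0, log_score(proposal) - log_score(sentence)))
    return proposal if rng.random() < accept_prob else list(sentence)


def run_chain(init, steps, seed=0):
    """Run the chain from a non-empty sentence; return every visited state."""
    rng = random.Random(seed)
    state = list(init)
    samples = []
    for _ in range(steps):
        state = mh_step(state, rng)
        samples.append(tuple(state))
    return samples


if __name__ == "__main__":
    samples = run_chain(["mat", "mat", "mat", "mat"], steps=2000)
    # Show the highest-scoring sentence the chain visited.
    print(max(samples, key=log_score))
```

Because the chain only needs an unnormalized score for whole sentences, no parallel training pairs are involved; a real system would replace `log_score` with a learned language model and task-specific constraints.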
Based on our previous work, this proposed research program would systematically explore unsupervised text generation by stochastic search, with the following short-term goals:
1) Development of stochastic search algorithms. Beyond our MH sampler, I plan to explore stochastic search algorithms, such as simulated annealing and genetic algorithms, because most NLG tasks are better formulated as discrete optimization problems than as sampling problems. Here, we would also design search operations (e.g., word/phrase editing) suitable for text generation. These operations perform edits in a distributed way over the entire sentence, so our approach does not suffer from the "error accumulation" problem.
2) Applications of unsupervised text generation. Our search framework provides a flexible way of generating text, because we can easily manipulate the search objective function and because such an approach does not require parallel data for training. I plan to address a few important generation tasks in NLP, including text summarization, sentence simplification, and style-transfer text generation.
3) Combining search and learning for text generation. I would like to integrate our search algorithms into a learnable model. On the one hand, a parametric learning model could not only smooth the manually defined search objective, but also improve inference efficiency for sentence generation. On the other hand, the search procedure could also help train a learning machine, especially in the reinforcement learning setting.
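The first goal, treating generation as discrete optimization with local edit operations, can be sketched with simulated annealing. This is a toy illustration under stated assumptions, not the proposed system: the vocabulary, the bigram table, the `objective` (fluency plus keyword coverage minus a length penalty), the replace/insert/delete edits in `propose`, and the geometric cooling schedule are all hypothetical choices.

```python
import math
import random

# Toy vocabulary and bigram table: hypothetical stand-ins for a real language model.
VOCAB = ["the", "cat", "dog", "sat", "ran", "on", "mat", "fast"]
GOOD_BIGRAMS = {
    ("the", "cat"), ("the", "dog"), ("the", "mat"), ("cat", "sat"),
    ("dog", "ran"), ("sat", "on"), ("on", "the"), ("ran", "fast"),
}
MAX_LEN = 8  # assumed upper bound on sentence length


def objective(sentence, keywords):
    """Manually defined search objective (an assumption for this sketch)."""
    fluency = sum(1.0 for pair in zip(sentence, sentence[1:]) if pair in GOOD_BIGRAMS)
    coverage = sum(2.0 for word in keywords if word in sentence)
    return fluency + coverage - 0.1 * len(sentence)


def propose(sentence, rng):
    """Apply one local edit at a random position: replace, insert, or delete."""
    candidate = list(sentence)
    op = rng.choice(["replace", "insert", "delete"])
    if op == "insert" and len(candidate) < MAX_LEN:
        candidate.insert(rng.randrange(len(candidate) + 1), rng.choice(VOCAB))
    elif op == "delete" and len(candidate) > 1:
        del candidate[rng.randrange(len(candidate))]
    else:
        # Fall back to replacement when insert/delete would violate the bounds.
        candidate[rng.randrange(len(candidate))] = rng.choice(VOCAB)
    return candidate


def simulated_annealing(init, keywords, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Maximize `objective` from a non-empty initial sentence; return (best, score)."""
    rng = random.Random(seed)
    current = list(init)
    current_score = objective(current, keywords)
    best, best_score = list(current), current_score
    for i in range(steps):
        # Geometric cooling from t_start toward t_end.
        temperature = t_start * (t_end / t_start) ** (i / steps)
        candidate = propose(current, rng)
        candidate_score = objective(candidate, keywords)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse candidates with prob exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return best, best_score


if __name__ == "__main__":
    sentence, score = simulated_annealing(["fast"], {"cat", "mat"})
    print(" ".join(sentence), score)
```

Each edit can land anywhere in the sentence, so an early mistake can be revised later instead of being locked in as in left-to-right decoding. Changing `objective` (for example, rewarding brevity for summarization) retargets the same search to a different task without parallel training data.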
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Mou, Lili
Finding decision jumps in text classification
- DOI: 10.1016/j.neucom.2019.08.082
- Publication date: 2020-01-02
- Journal:
- Impact factor: 6
- Authors: Liu, Xianggen; Mou, Lili; Song, Sen
- Corresponding author: Song, Sen
Other grants by Mou, Lili
Unsupervised Neural Text Generation by Stochastic Searching
- Grant number: RGPIN-2020-04465
- Fiscal year: 2022
- Funding amount: $21,100
- Project category: Discovery Grants Program - Individual
Unsupervised Neural Text Generation by Stochastic Searching
- Grant number: RGPIN-2020-04465
- Fiscal year: 2021
- Funding amount: $21,100
- Project category: Discovery Grants Program - Individual
Unsupervised Neural Text Generation by Stochastic Searching
- Grant number: DGECR-2020-00267
- Fiscal year: 2020
- Funding amount: $21,100
- Project category: Discovery Launch Supplement
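Similar NSFC (National Natural Science Foundation of China) grants
Research on diverse, high-fidelity techniques for Neural Process models
- Grant number: 62306326
- Approval year: 2023
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund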
Similar overseas grants
Unsupervised Neural Text Generation by Stochastic Searching
- Grant number: RGPIN-2020-04465
- Fiscal year: 2022
- Funding amount: $21,100
- Project category: Discovery Grants Program - Individual
Generating Personalized Synthetic Speech for Progressive Dysarthria Using Severity-Appropriate Adaptation Strategies for Neural Text-to-Speech and Voice Conversion
- Grant number: 10525903
- Fiscal year: 2022
- Funding amount: $21,100
- Project category:
Targeted Neural Text Summarization of Electronic Medical Records to Improve Imaging Diagnostics
- Grant number: 10696220
- Fiscal year: 2022
- Funding amount: $21,100
- Project category:
Knowledge based neural question generation from text
- Grant number: 560815-2020
- Fiscal year: 2022
- Funding amount: $21,100
- Project category: Alliance Grants
Generating Personalized Synthetic Speech for Progressive Dysarthria Using Severity-Appropriate Adaptation Strategies for Neural Text-to-Speech and Voice Conversion
- Grant number: 10656540
- Fiscal year: 2022
- Funding amount: $21,100
- Project category:
Targeted Neural Text Summarization of Electronic Medical Records to Improve Imaging Diagnostics
- Grant number: 10443224
- Fiscal year: 2022
- Funding amount: $21,100
- Project category:
Unsupervised Neural Text Generation by Stochastic Searching
- Grant number: RGPIN-2020-04465
- Fiscal year: 2021
- Funding amount: $21,100
- Project category: Discovery Grants Program - Individual
Knowledge based neural question generation from text
- Grant number: 560815-2020
- Fiscal year: 2021
- Funding amount: $21,100
- Project category: Alliance Grants
CAREER: Knowledge-Rich Neural Text Comprehension and Reasoning
- Grant number: 2044660
- Fiscal year: 2021
- Funding amount: $21,100
- Project category: Continuing Grant
Unsupervised Neural Text Generation by Stochastic Searching
- Grant number: DGECR-2020-00267
- Fiscal year: 2020
- Funding amount: $21,100
- Project category: Discovery Launch Supplement