Developing inquisitive, model-based agents for reinforcement learning
Basic Information
- Grant number: RGPIN-2019-06079
- Principal investigator:
- Amount: $20,400
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2020
- Funding country: Canada
- Duration: 2020-01-01 to 2021-12-31
- Status: Completed
- Source:
- Keywords:
Project Summary
Natural agents, like animals, learn from a lifetime of experience. Most artificial learning systems do not. Newborns begin life with a frenzy of learning: attempting to master their muscle twitches and make sense of their visual inputs. This knowledge is continuously reused and refined throughout life. Our current Artificial Intelligence (AI) systems are well suited to problems with a clear cause-and-effect relationship between the system's decisions and the utility of those decisions. Swimming into a shark will cause a loss of life; shooting an alien ship will increase the score. However, in problems where the consequences of a decision are significantly delayed, this mapping is much harder to learn. The most challenging and largely unsolved AI benchmark problems feature such delayed consequences. It is common practice for state-of-the-art systems to train for the equivalent of 30 days on each Atari game, and still perform well below human level in games that feature delayed consequences.
One way to deal with the problem of delayed consequences is for the AI to construct its own understanding of how the world works, usually called a model of the world. A model encodes the regularities of the world. For example, a model might encode: (1) when I am lined up with a shark and I decide to fire a torpedo, the shark will disappear, and (2) if I am standing on a platform and I decide to jump down, I will end up on the ground. Given access to a model of this form, an AI can mentally simulate the future situations that would result from behaving in particular ways, without actually interacting with the world. Just as a human can imagine where they might end up if they took a new path down to the river, an AI with a model can imagine the outcome of an alternative course of action without physically taking it, avoiding unnecessary exploration unless it decides that exploration is valuable. Model-based mental simulation can dramatically improve the efficiency of learning.
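The idea of learning a model and then "mentally" rolling it forward can be sketched in a few lines. This is an illustrative, Dyna-style tabular sketch, not the grant's actual method; the three-state chain environment and all names here are hypothetical.

```python
# Minimal sketch of model-based "mental simulation" (assumptions: a tiny
# deterministic 3-state chain; a tabular model that stores the last
# observed outcome of each (state, action) pair, Dyna-style).

class TabularModel:
    """Remembers the last observed outcome of each (state, action) pair."""
    def __init__(self):
        self.transitions = {}  # (state, action) -> (reward, next_state)

    def update(self, state, action, reward, next_state):
        self.transitions[(state, action)] = (reward, next_state)

    def simulate(self, state, action):
        # "Imagine" the outcome without touching the real world.
        return self.transitions.get((state, action))

def env_step(s, a):
    """Toy chain: action +1/-1 moves along states 0..2; state 2 pays +1."""
    s2 = max(0, min(2, s + a))
    return (1.0 if s2 == 2 else 0.0), s2

# Learn the model from a handful of real interactions.
model = TabularModel()
s = 0
for a in [1, 1, -1, 1]:
    r, s2 = env_step(s, a)
    model.update(s, a, r, s2)
    s = s2

# Now evaluate a candidate action sequence purely in imagination.
imagined_return, sim_state = 0.0, 0
for a in [1, 1]:
    r, sim_state = model.simulate(sim_state, a)
    imagined_return += r

print(imagined_return)  # → 1.0: the model predicts reaching the goal
```

Once trained, the imagined rollout costs no real-world interaction, which is the efficiency gain the paragraph above describes.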
The remaining question is how the system decides how best to make use of mental simulation. People often decide to try things they have never done before. We choose to engage in activities that are mentally and physically challenging, but not beyond our abilities. Humans are motivated by novelty, curiosity, and knowledge seeking, and bored by things we already know about. Combining this idea with a model could allow an AI to simulate different ways of behaving, preferring those that reduce uncertainty and yield new knowledge. With a model, the AI can generate its own internal feedback to focus its mental simulations. The objective of this research program is twofold: (1) to design new approaches for representing and learning models of the world, and (2) to integrate mechanisms that guide mental simulation (planning) and exploration toward uncertainty reduction and knowledge acquisition.
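One simple way this internal feedback is often realized is a count-based curiosity bonus: state-action pairs tried less often receive a larger intrinsic reward, drawing simulated rollouts toward the unknown. The sketch below is illustrative only (the function names and the "platform"/"jump" example are hypothetical), not the program's proposed mechanism.

```python
import math

# Illustrative count-based curiosity bonus: novel (state, action) pairs get
# a large bonus that decays as 1/sqrt(visits + 1). During planning, this
# bonus can be added to simulated rewards so imagined behavior prefers
# actions that would reduce uncertainty.

visit_counts = {}  # (state, action) -> number of real visits

def record_visit(state, action):
    visit_counts[(state, action)] = visit_counts.get((state, action), 0) + 1

def curiosity_bonus(state, action, scale=1.0):
    n = visit_counts.get((state, action), 0)
    return scale / math.sqrt(n + 1)  # high for novel pairs, decays with visits

# A well-practiced action loses its novelty appeal...
for _ in range(99):
    record_visit("platform", "jump")

familiar = curiosity_bonus("platform", "jump")  # 1/sqrt(100) = 0.1
novel = curiosity_bonus("platform", "fire")     # 1/sqrt(1)   = 1.0

# ...so a planner ranking imagined actions by (reward + bonus) favors the
# never-tried action whenever the predicted external rewards are equal.
print(novel > familiar)  # → True
```

The design choice here mirrors the boredom/novelty contrast in the paragraph above: the bonus is generated entirely from the agent's own statistics, with no external supervision.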
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by White, Adam
Multi-timescale nexting in a reinforcement learning robot
- DOI: 10.1177/1059712313511648
- Published: 2014-04-01
- Journal:
- Impact factor: 1.6
- Authors: Modayil, Joseph; White, Adam; Sutton, Richard S.
- Corresponding author: Sutton, Richard S.
Questioning Anglocentrism in plural policing studies: Private security regulation in Belgium and the United Kingdom
- DOI: 10.1177/14773708211014853
- Published: 2021-05-12
- Journal:
- Impact factor: 1.9
- Authors: Leloup, Pieter; White, Adam
- Corresponding author: White, Adam
A Qualitative Exploration of Parents' Perceptions of Risk in Youth Contact Rugby.
- DOI: 10.3390/bs12120510
- Published: 2022-12-14
- Journal:
- Impact factor: 2.6
- Authors: Anderson, Eric; White, Adam; Hardwicke, Jack
- Corresponding author: Hardwicke, Jack
From eye-blinks to state construction: Diagnostic benchmarks for online representation learning.
- DOI: 10.1177/10597123221085039
- Published: 2023-03
- Journal:
- Impact factor: 1.6
- Authors: Rafiee, Banafsheh; Abbas, Zaheer; Ghiassian, Sina; Kumaraswamy, Raksha; Sutton, Richard S.; Ludvig, Elliot A.; White, Adam
- Corresponding author: White, Adam
Teachers' stories: physical education teachers' constructions and experiences of masculinity within secondary school physical education
- DOI: 10.1080/13573322.2015.1112779
- Published: 2017-01-01
- Journal:
- Impact factor: 2.9
- Authors: White, Adam; Hobson, Michael
- Corresponding author: Hobson, Michael
Other Grants by White, Adam
Developing inquisitive, model-based agents for reinforcement learning
- Grant number: RGPIN-2019-06079
- Fiscal year: 2022
- Amount: $20,400
- Program: Discovery Grants Program - Individual
Developing inquisitive, model-based agents for reinforcement learning
- Grant number: RGPIN-2019-06079
- Fiscal year: 2021
- Amount: $20,400
- Program: Discovery Grants Program - Individual
Developing inquisitive, model-based agents for reinforcement learning
- Grant number: RGPIN-2019-06079
- Fiscal year: 2019
- Amount: $20,400
- Program: Discovery Grants Program - Individual
Developing inquisitive, model-based agents for reinforcement learning
- Grant number: DGECR-2019-00479
- Fiscal year: 2019
- Amount: $20,400
- Program: Discovery Launch Supplement
Leveraging spectrally encoded beads for multiplexed nucleic acid detection
- Grant number: 503082-2017
- Fiscal year: 2018
- Amount: $20,400
- Program: Postdoctoral Fellowships
Leveraging spectrally encoded beads for multiplexed nucleic acid detection
- Grant number: 503082-2017
- Fiscal year: 2017
- Amount: $20,400
- Program: Postdoctoral Fellowships
Particle Size Analysis in Marine Sediments
- Grant number: 516368-2017
- Fiscal year: 2017
- Amount: $20,400
- Program: University Undergraduate Student Research Awards
Particle Size Analysis in Marine Sediments
- Grant number: 505971-2016
- Fiscal year: 2016
- Amount: $20,400
- Program: University Undergraduate Student Research Awards
Single cell gene expression analysis by microfluidic digital PCR
- Grant number: 427647-2012
- Fiscal year: 2013
- Amount: $20,400
- Program: Alexander Graham Bell Canada Graduate Scholarships - Doctoral
Single cell gene expression analysis by microfluidic digital PCR
- Grant number: 427647-2012
- Fiscal year: 2012
- Amount: $20,400
- Program: Alexander Graham Bell Canada Graduate Scholarships - Doctoral
Similar International Grants
Developing inquisitive, model-based agents for reinforcement learning
- Grant number: RGPIN-2019-06079
- Fiscal year: 2022
- Amount: $20,400
- Program: Discovery Grants Program - Individual
Developing inquisitive, model-based agents for reinforcement learning
- Grant number: RGPIN-2019-06079
- Fiscal year: 2021
- Amount: $20,400
- Program: Discovery Grants Program - Individual
Model Theory and proof theory of probabilistic logic in propositional and modal team semantics
- Grant number: 19F19797
- Fiscal year: 2019
- Amount: $20,400
- Program: Grant-in-Aid for JSPS Fellows
Developing inquisitive, model-based agents for reinforcement learning
- Grant number: RGPIN-2019-06079
- Fiscal year: 2019
- Amount: $20,400
- Program: Discovery Grants Program - Individual
Proof-Theoretic Study of Doxastic and Epistemic Updates via Questions
- Grant number: 19K12113
- Fiscal year: 2019
- Amount: $20,400
- Program: Grant-in-Aid for Scientific Research (C)
Developing inquisitive, model-based agents for reinforcement learning
- Grant number: DGECR-2019-00479
- Fiscal year: 2019
- Amount: $20,400
- Program: Discovery Launch Supplement
Proof-theoretic study of multi-agent interaction via many-dimensional and many-sorted logics
- Grant number: 15K21025
- Fiscal year: 2015
- Amount: $20,400
- Program: Grant-in-Aid for Young Scientists (B)
Empirical studies on development of inquisitive learning program in the field of astronomy to promote science literacy
- Grant number: 15K04407
- Fiscal year: 2015
- Amount: $20,400
- Program: Grant-in-Aid for Scientific Research (C)
Optics and Photonics Training for Inquisitive eXperimentalists (OPTIX)
- Grant number: 1505919
- Fiscal year: 2015
- Amount: $20,400
- Program: Continuing Grant
The semantics and syntax of modal indefinites in Japanese
- Grant number: 26370450
- Fiscal year: 2014
- Amount: $20,400
- Program: Grant-in-Aid for Scientific Research (C)