
Bayesian Deep Reinforcement Learning


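Basic information

Grant number:
2243850
Principal investigator:
Amount:
--
Host institution:
University of Oxford
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2019
Funding country:
United Kingdom
Start and end dates:
2019 to (no data)
Project status:
Closed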

Source:
https://gtr.ukri.org/projects?ref=studentship-2243850
Keywords:
Bayesian Deep Reinforcement Learning

Project abstract

Brief description of the context of the research including potential impact

Deep reinforcement learning has become ubiquitous for learning control policies in challenging environments such as robotic control, Go-playing and autonomous driving. However, standard approaches are often sample inefficient, unstable, and use ad-hoc tricks that are not theoretically well justified. This project looks at deriving principled new objectives and algorithms for deep reinforcement learning through a Bayesian lens, and new interpretations of existing reinforcement learning algorithms.

Aims and Objectives

- Extending existing methods in Bayesian Reinforcement Learning via meta-learning in the Bayes Adaptive MDP Framework to more complex environments.
- Studying the connection between deep reinforcement learning and probabilistic inference and using this to derive a principled objective for reinforcement learning. We aim to bring theoretical justification for existing objectives such as mean-squared Bellman error, which may not appropriately reflect the geometry of the function space.
- Scaling related methodologies such as Bayesian optimisation to novel settings.

Novelty of the research methodology

- Probabilistic treatment of reinforcement learning
- Novel and newly proposed frameworks for reinforcement learning such as BAMDPs and RL as inference

Alignment to EPSRC's strategies and research areas (which EPSRC research area the project relates to)

Further information on the areas can be found on http://www.epsrc.ac.uk/research/ourportfolio/researchareas/

- Artificial intelligence technologies
- Robotics
- Statistics and applied probability

Any companies or collaborators involved

None
