
Data- and model-based Reinforcement Learning for Performance, Requirements, and Multi-Agent setups


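Basic information

Grant number:
2242815
Principal investigator:
Amount:
--
Host institution:
University of Oxford
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2019
Funding country:
United Kingdom
Start and end dates:
2019 to (no data)
Project status:
Completed

Source:
https://gtr.ukri.org/projects?ref=studentship-2242815
Keywords:
Data model based Reinforcement Learning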

Project abstract

Brief description of the context of the research including potential impact: Despite many recent successes in the field of AI, AI systems can still only solve a narrow set of tasks in a restricted environment. Reinforcement learning (RL) is a machine learning technique that holds promise for achieving generality because almost all real-world cognitive tasks can be cast as a reinforcement learning problem. In such a problem, an agent is coupled with an environment and receives reward according to the action it takes in each situation. The agent must decide on a policy of actions to maximise its expected cumulative future reward.

Two key shortcomings limiting the applications of current RL systems are reward misspecification and inefficient sampling. Reward misspecification refers to the fact that it is difficult for a user to codify exactly what they want in an objective function. This can result in negative side effects or 'reward hacking', where an agent learns to exploit a loophole in the objective function to gain reward for undesired behaviours. Inefficient sampling refers to the fact that RL agents must currently acquire vast amounts of experience before reaching any degree of competence at a task.

Inverse Reinforcement Learning (IRL) and Active Learning try to address these shortcomings. IRL seeks to determine the objective function given observations of optimal behaviour. Several approaches to IRL have recently been put forward, including Maximum Entropy IRL, Cooperative IRL and Bayesian IRL. The idea behind Active Learning is that if one prioritises training on the data, trajectories, or samples that would result in the greatest learning effect, then one can significantly increase the sample efficiency of learning systems (including RL agents or IRL algorithms). By addressing shortcomings in existing RL systems, I will be advancing and expediting the project of creating safe and scalable RL systems to tackle real-world problems and benefit humanity.
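As an illustration of the reinforcement learning problem described above (an agent choosing a policy to maximise its expected cumulative future reward), the following is a minimal sketch rather than anything taken from the project itself. The four-state chain environment, the discount factor `GAMMA`, and the helper names `step`, `action_value` and `value_iteration` are all assumptions made for this example. Value iteration is used here only because it is a standard way to compute such a policy when the environment model is known.

```python
from typing import Dict, List, Tuple

N_STATES = 4                  # hypothetical chain of states 0..3; state 3 is the goal
ACTIONS = ("left", "right")
GAMMA = 0.9                   # discount factor: an assumption, not from the abstract


def step(state: int, action: str) -> Tuple[int, float]:
    """Deterministic toy environment: return (next_state, reward)."""
    if state == N_STATES - 1:
        return state, 0.0     # the goal state is absorbing and pays nothing further
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def action_value(state: int, action: str, values: List[float]) -> float:
    """Immediate reward plus discounted value of the state that follows."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]


def value_iteration(tol: float = 1e-8) -> Tuple[List[float], Dict[int, str]]:
    """Compute state values and a greedy policy for the toy chain."""
    values = [0.0] * N_STATES
    while True:
        # Synchronous Bellman optimality backup over all states.
        new_values = [
            max(action_value(s, a, values) for a in ACTIONS)
            for s in range(N_STATES)
        ]
        delta = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    # The policy picks, in each non-terminal state, the action with the
    # highest expected cumulative (discounted) future reward.
    policy = {
        s: max(ACTIONS, key=lambda a: action_value(s, a, values))
        for s in range(N_STATES - 1)
    }
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    print(values)
    print(policy)
```

In this toy setting the computed policy moves right in every non-terminal state, and the state values shrink by a factor of `GAMMA` for each extra step away from the goal. Reward misspecification would correspond to the reward in `step` failing to capture what the user actually wants, in which case the same procedure would optimise the wrong objective.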
Aims and Objectives:
- Develop novel approaches to combat reward misspecification and sampling inefficiencies.
- Extend existing frameworks to multi-agent settings.

Novelty of the research methodology: AI safety is a nascent field which aims to address potential near-, medium-, and long-term risks of AI technologies. Current AI concerns include social media, algorithmic bias, security, and privacy, and as the applications of AI become more powerful and pervasive, it is clear that research progress should be seen through a safety lens. With an eye on safety, we hope to improve upon existing RL approaches and extend existing frameworks to multi-agent settings.

Alignment to EPSRC's strategies and research areas:
- Artificial Intelligence technologies
- Statistics and applied probability
- Theoretical Computer Science

Companies or collaborators involved: None
