AI and robotics - applying the maximum entropy framework to real-world robotics tasks

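Basic information

Grant number:
2745856
Principal investigator:
Amount:
--
Host institution:
University of Oxford
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2022
Funding country:
United Kingdom
Start and end dates:
2022 to (no data)
Project status:
Not concluded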

Source:
https://gtr.ukri.org/projects?ref=studentship-2745856
Keywords:
AI robotics applying maximum entropy

Project abstract

Context and impact

In recent years, robotics has shown increasing promise in moving out of laboratories and into real-world tasks. Areas such as car manufacturing, which require simple and repetitive motions, have felt the impact of robotics for years, but the current challenge is to extend this reach into large, dynamic environments that involve interaction between humans and robots. The economic impact of intelligent autonomous systems, once deployed at scale, will be vast, allowing the work of one person to be leveraged many times over and creating efficiency gains of orders of magnitude. One of the key components of these systems is control, which is where my research sits.

Aims and objectives

The control tasks relevant to real-world robotics broadly fall into two categories: locomotion, my focus, and manipulation. While locomotion over flat ground is fairly straightforward, it becomes much more difficult over rough terrain, requiring extra sensory modes such as vision to anticipate obstacles and act accordingly. One of the principal aims of my research is to increase the effective range of operation of state-of-the-art locomotion controllers. More complex forms of movement such as climbing and jumping, performed robustly, are currently out of the reach of modern robotics systems. Extending these capabilities would greatly enhance the domain of autonomy of these systems, furthering real-world deployment.

Novelty of the research methodology

One of the key technologies powering state-of-the-art robotics is deep reinforcement learning (RL). This is the technique I will primarily focus on in my research, and I aim to extend its capabilities by addressing the following questions.

The first of these is how we can make training reinforcement learning systems more stable and effective. RL algorithms commonly fall into locally optimal solutions that either partially solve a problem or 'game' the reward function in an unhelpful way, like an agent that doesn't move, avoiding penalties for collapsing but sidestepping the task we want it to accomplish. Additionally, the search for useful actions often involves highly unstable behaviour which, when deployed on real-world systems, can easily result in damage to the hardware or, more importantly, harm to nearby people. If we wish to one day have intelligent autonomous systems that can adapt to real-time changes in their environment, these problems must be solved. From my research, a promising solution to both of these issues is the maximum entropy framework (MaxEnt).

Maximum entropy algorithms jointly optimise the agent's reward function and the 'entropy', which can be thought of as a measure of randomness, of the distribution of actions it takes. The benefits of this approach are as follows. Firstly, it is a simple and robust solution to the exploration-exploitation trade-off. This is one of the key dilemmas in designing an RL system and can be thought of as balancing the use of solutions we know are currently effective against exploring other options that might be even more effective. Secondly, MaxEnt has a unique advantage over other methods in that it allows for multimodal solutions, meaning that if there are multiple equally valid ways of solving a problem, the agent can retain all of them, giving us greater flexibility.

Alignment to EPSRC's strategies and research areas

I believe this project falls squarely under the UKRI's AI and robotics theme, as robust control policies are one of the core challenges that need to be solved for our systems, and thus the UK, to be resilient and effective. Additionally, my research aligns with many of the strategic priorities, such as artificial intelligence and frontiers in engineering and technology, and possibly even more distant areas such as transforming health and healthcare, through the addition of robotic manipulators in surgical procedures.
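
The maximum entropy idea described in the abstract can be illustrated with a minimal sketch. It is not the project's method: it assumes a single-state problem with three hypothetical action values (`q_values`) and an assumed entropy weight (`temperature`). For that simplified case, the policy that maximises expected value plus weighted entropy is a softmax over the action values, and the sketch compares it with a greedy policy that commits to one action.

```python
import math


def softmax_policy(q_values, temperature):
    """Softmax (Boltzmann) policy over action values for a single state."""
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(q_values)
    weights = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def maxent_objective(q_values, probs, temperature):
    """Expected action value plus temperature-weighted policy entropy."""
    expected_value = sum(p * q for p, q in zip(probs, q_values))
    return expected_value + temperature * entropy(probs)


# Hypothetical action values: the first two actions are equally good.
q_values = [1.0, 1.0, 0.2]
temperature = 0.5  # assumed entropy weight, chosen only for illustration

greedy = [1.0, 0.0, 0.0]  # commits to one of the two equally good actions
soft = softmax_policy(q_values, temperature)

print("softmax policy:", soft)
print("greedy objective:", maxent_objective(q_values, greedy, temperature))
print("softmax objective:", maxent_objective(q_values, soft, temperature))
```

In this sketch the softmax policy gives the two equally good actions the same probability, which is the multimodal behaviour the abstract refers to, while still assigning some probability to the weaker action for exploration. Lowering `temperature` moves the policy towards greedy behaviour.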
