Sim-to-Real Deep Reinforcement Learning for legged robot locomotion with vision-based high dimensional data

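Basic information

Grant number:
1950742
Principal investigator:
Amount:
--
Host institution:
University of Bristol
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2017
Funding country:
United Kingdom
Start and end dates:
2017 to (no data)
Project status:
Completed

Source:
https://gtr.ukri.org/projects?ref=studentship-1950742
Keywords: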
Sim Real Deep Reinforcement Learning

Project abstract

This PhD will explore methods that allow legged robots to improve and adapt their gaits to various terrains. For the MSc, the physics simulator was pre-programmed with the environment terrain and robot dimensions, and parameters such as friction coefficients and weight distributions were roughly estimated. For the PhD, however, the robot will build a model of itself in a 3D environment using vision and depth sensing combined with orientation sensing and robot babbling. This will allow the robot to adjust the parameters of the simulated environment, which may increase adaptability and reduce the reality gap.

The aim is to contribute a novel method that allows any type of legged robot to manoeuvre from high-dimensional input data. The method aims to address the adaptability problems of explicitly programmed algorithms whilst also addressing the reality gap issues of PPO (Proximal Policy Optimization) reinforcement learning. The input data will come from RGBD, joint parameter, orientation and tactile sensing. Note that PPO performs exceptionally well with high-dimensional inputs, which will be required in order to identify the complexities of the real world.

Aims and objectives:
1. Build a legged robot capable of sensing its environment (i.e. with RGBD, orientation and tactile sensors).
2. Self-model agent: allow the robot to perform robot babbling and use the orientation, tactile and joint position sensors to self-model the agent.
3. Model environment: allow the robot to scan the room with RGBD sensors to model its terrain and identify its target location (e.g. a ball in a room).
4. Train the simulated robot with reinforcement learning in the 3D world to determine a policy to reach the target location.
5. Deploy the trained policy on the physical robot.
6. Measure the 'reality gap' between the robot's performance in simulation and in the physical world.
7. Adapt robot babbling and environment modelling accordingly.
8. Produce a TRL 4 (Technology Readiness Level 4) quadruped robot that can traverse previously unseen terrains and scenarios toward a goal.
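
The idea of adjusting simulator parameters to shrink the reality gap can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration and not the project's method: the "simulator" is a toy model of a block sliding to rest under friction, the "real" measurements are generated analytically, the reality gap is taken to be the mean absolute error in stopping distance, and the names `simulate_slide`, `reality_gap` and `fit_friction` are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def simulate_slide(mu, v0, dt=0.001):
    """Toy simulator: distance (m) a block launched at speed v0 slides before stopping."""
    x, v = 0.0, v0
    while v > 0.0:
        v = max(v - mu * G * dt, 0.0)  # kinetic friction decelerates the block
        x += v * dt
    return x


def reality_gap(mu, v0s, observed):
    """Assumed gap metric: mean absolute error between simulated and observed distances."""
    errors = [abs(simulate_slide(mu, v0) - d) for v0, d in zip(v0s, observed)]
    return sum(errors) / len(errors)


def fit_friction(v0s, observed, candidates):
    """Grid search: return the candidate friction coefficient with the smallest gap."""
    return min(candidates, key=lambda mu: reality_gap(mu, v0s, observed))


# Stand-in "real" measurements, generated analytically with a hidden mu of 0.4.
TRUE_MU = 0.4
v0s = [0.5, 1.0, 1.5]
observed = [v0 ** 2 / (2 * TRUE_MU * G) for v0 in v0s]

rough_mu = 0.6  # hand-estimated starting value, as in a pre-programmed simulator
candidates = [round(0.1 + 0.05 * i, 2) for i in range(19)]  # 0.10, 0.15, ..., 1.00
fitted_mu = fit_friction(v0s, observed, candidates)

print(f"gap with rough estimate: {reality_gap(rough_mu, v0s, observed):.4f} m")
print(f"gap after fitting mu={fitted_mu}: {reality_gap(fitted_mu, v0s, observed):.4f} m")
```

On a real robot the observations would come from sensors and the simulator would be a full physics engine, so a grid search over one parameter would likely give way to a more capable system-identification method. The loop of measuring the gap, adjusting parameters and measuring again stays the same.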
