Efficient Robotic Reinforcement Learning via Off-Policy and Meta-Learning


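Basic Information

  • Grant number:
    2285275
  • Principal investigator:
  • Amount:
    --
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project type:
    Studentship
  • Fiscal year:
    2019
  • Funding country:
    United Kingdom
  • Duration:
    2019 to (no data)
  • Project status:
    Completed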

Project Abstract

This project falls within the EPSRC Artificial Intelligence and Robotics research areas.

Research Context

Deep reinforcement learning methods are increasingly studied for robotics because they promise more flexible control policies with less manual engineering. In traditional robotics, control policies are specified by highly specialized experts, separately for each task; learning algorithms, by contrast, can acquire general behaviors from their own experience, much as many biological organisms do. Deep learning models have been shown to generalize well when trained on diverse datasets, and the key to their success lies in their ability to learn millions of parameters from large amounts of training data. One of the major limitations of real-world robotic learning is that we cannot afford to collect datasets large enough for "ImageNet-scale" generalization within a single experiment.

Potential Impact

While robots are becoming increasingly affordable to average consumers, the set of tasks they can carry out is limited by the difficulty of designing robust control policies. Robots could reduce the human burden in many everyday tasks, such as cooking at home, elderly care in assisted-living communities, surgery in hospitals, or rescue operations in dangerous disaster zones. Algorithms that can learn and generalize efficiently are crucial to bringing useful low-cost robots to wider audiences.

Objectives and Research Methodology

For reinforcement learning algorithms to evolve into practical methods for complex real-world tasks, we must design novel algorithms that get around the issue of data scarcity. One possible way is to make better use of existing historical data. Towards this goal, we propose to investigate how to (1) better utilize off-policy data, that is, the data collected outside of the specific robot experiment, and (2) meta-learn policies that can adapt to new tasks quickly.

There is an abundance of previously collected robotic data available, which already provides a large and diverse body of experience for robot learning. With the ability to incorporate this experience into reinforcement learning, policies could generalize across different objects, environments, scenes, and possibly even across different robots. For example, we could use the RoboNet dataset to improve the training of a single-task or multi-task reinforcement learning agent. We would define one or more tasks manually and relabel all the RoboNet data with these task rewards. We would then run an off-policy reinforcement learning algorithm, such as Soft Actor-Critic, on the large set of RoboNet data together with a modest amount of new data for each task. This should allow policies to generalize better and learn faster.

Meta-reinforcement learning algorithms allow agents to adapt rapidly to new tasks by exploiting the structural similarities of previously collected experience. Existing meta-learning algorithms operate mainly in a setting where all the experience is accessible to the learner in a single batch. In the real world, however, agents typically encounter tasks sequentially, which is why the current meta-learning formulation should be extended to support streaming experience. Another interesting direction would be to formulate versions of meta-reinforcement learning in which the MDPs do not necessarily share the same state and action spaces. This would require developing new model architectures that can read in heterogeneous state spaces and output heterogeneous actions.
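The RoboNet example in the abstract (relabel previously collected data with a task reward, then train off-policy on it together with a modest amount of new data) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: `Transition`, `relabel`, `make_reach_reward`, and `sample_mixed_batch` are hypothetical names, the distance-based reward is an invented stand-in for a manually defined task, nothing here uses the actual RoboNet data format, and the Soft Actor-Critic update that would consume each batch is omitted.

```python
import math
import random
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

Vector = Tuple[float, ...]
RewardFn = Callable[[Vector, Vector, Vector], float]


@dataclass(frozen=True)
class Transition:
    """One (s, a, s') step; the reward is filled in per task (hypothetical type)."""
    state: Vector
    action: Vector
    next_state: Vector
    reward: float = 0.0


def make_reach_reward(goal: Vector) -> RewardFn:
    """Assumed example task: reward is the negative distance of s' to a goal."""
    def reward_fn(state: Vector, action: Vector, next_state: Vector) -> float:
        return -math.dist(next_state, goal)
    return reward_fn


def relabel(transitions: List[Transition], reward_fn: RewardFn) -> List[Transition]:
    """Return copies of the transitions with rewards recomputed for one task."""
    return [
        replace(t, reward=reward_fn(t.state, t.action, t.next_state))
        for t in transitions
    ]


def sample_mixed_batch(
    prior_data: List[Transition],
    new_data: List[Transition],
    batch_size: int,
    new_fraction: float,
    rng: random.Random,
) -> List[Transition]:
    """Sample with replacement, mixing relabeled prior data with new task data."""
    n_new = int(batch_size * new_fraction) if new_data else 0
    new_part = rng.choices(new_data, k=n_new) if n_new else []
    prior_part = rng.choices(prior_data, k=batch_size - n_new)
    batch = new_part + prior_part
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for previously collected data, stored without task rewards.
    prior = [
        Transition((0.0, 0.0), (1.0, 0.0), (1.0, 0.0)),
        Transition((1.0, 0.0), (0.0, 1.0), (1.0, 1.0)),
    ]
    relabeled = relabel(prior, make_reach_reward(goal=(1.0, 1.0)))
    # Stand-in for the modest amount of data collected on the new task.
    new = [Transition((0.5, 0.5), (0.5, 0.5), (1.0, 1.0), reward=0.0)]
    batch = sample_mixed_batch(relabeled, new, batch_size=8, new_fraction=0.25, rng=rng)
    print(len(batch), [t.reward for t in batch])
```

Relabeling is possible because an off-policy learner does not require the data to come from the policy being trained; only the reward has to match the task at hand. The fixed `new_fraction` is one simple design choice for keeping scarce task-specific data from being drowned out by the much larger prior dataset.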

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Literature

Hitoshi Yoshiji et al.: "Mechanism by which TIMP-1 promotes fibrosis in transgenic mice" Saishin Igaku (最新医学). 55. 1781-1787 (2000)
  • DOI:
  • Publication date:
  • Journal:
  • Impact factor:
    0
  • Authors:
  • Corresponding author:
LiDAR Implementations for Autonomous Vehicle Applications
  • DOI:
  • Publication date:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
  • Corresponding author:
Biomolecular Engineering / Marine Biotechnology Laboratory
  • DOI:
  • Publication date:
  • Journal:
  • Impact factor:
    0
  • Authors:
  • Corresponding author:
Hitoshi Yoshiji et al.: "Illustrated Medicine & Science Series: Molecular Medicine of Blood Vessels" Yodosha (ed. 渋谷正史). 125 (2000)
  • DOI:
  • Publication date:
  • Journal:
  • Impact factor:
    0
  • Authors:
  • Corresponding author:
Yoshiyama, M., Takeuchi, K., Kim, S., Hanatani, A., Omura, T., Toda, I., Akioka, K., Teragaki, M., Iwao, H. and Yoshikawa, J.: "Effect of manidipine hydrochloride, a calcium antagonist, on isoproterenol-induced left ventricular hypertrophy" Jpn Circ J. 62(1). 47-52 (1998)
  • DOI:
  • Publication date:
  • Journal:
  • Impact factor:
    0
  • Authors:
  • Corresponding author:

Other Grants

An implantable biosensor microsystem for real-time measurement of circulating biomarkers
  • Grant number:
    2901954
  • Fiscal year:
    2028
  • Funding amount:
    --
  • Project type:
    Studentship
Exploiting the polysaccharide breakdown capacity of the human gut microbiome to develop environmentally sustainable dishwashing solutions
  • Grant number:
    2896097
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
A Robot that Swims Through Granular Materials
  • Grant number:
    2780268
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Likelihood and impact of severe space weather events on the resilience of nuclear power and safeguards monitoring.
  • Grant number:
    2908918
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Proton, alpha and gamma irradiation assisted stress corrosion cracking: understanding the fuel-stainless steel interface
  • Grant number:
    2908693
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Field Assisted Sintering of Nuclear Fuel Simulants
  • Grant number:
    2908917
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Assessment of new fatigue capable titanium alloys for aerospace applications
  • Grant number:
    2879438
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Developing a 3D printed skin model using a Dextran - Collagen hydrogel to analyse the cellular and epigenetic effects of interleukin-17 inhibitors in
  • Grant number:
    2890513
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
CDT year 1 so TBC in Oct 2024
  • Grant number:
    2879865
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Understanding the interplay between the gut microbiome, behavior and urbanisation in wild birds
  • Grant number:
    2876993
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship

Similar NSFC Grants

High-precision force-reflected bilateral teleoperation of multi-DOF hydraulic robotic manipulators
  • Grant number:
    52111530069
  • Approval year:
    2021
  • Funding amount:
    CNY 100,000
  • Project type:
    International (Regional) Cooperation and Exchange Program

Similar Overseas Grants

Phase 2 - Effective and Integrated Chemical Free Robotic Milking
  • Grant number:
    10093094
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project type:
    Collaborative R&D
CAREER: Designing Autonomous Battery-Free Robotic Sensors
  • Grant number:
    2338736
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project type:
    Continuing Grant
Revolutionary Rotors: A Robotic Flywheel Assembly Line
  • Grant number:
    10098069
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project type:
    Collaborative R&D
Dynamically Adaptive Prosthetic Limbs Enabled by Autonomous Soft Robotic Interfaces
  • Grant number:
    10095028
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project type:
    Collaborative R&D
RII Track-4: NSF: Enabling Synergistic Multi-Robot Cooperation for Mobile Manipulation Beyond Individual Robotic Capabilities
  • Grant number:
    2327313
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project type:
    Standard Grant
RII Track-4: NSF: Planetary Robotic Construction on the Moon and Mars Using 3D Printed Waterless Concrete
  • Grant number:
    2327469
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project type:
    Standard Grant
MRI-guided Robotic-assisted Neurosurgery at 0.55T
  • Grant number:
    2904728
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project type:
    Studentship
NRI/Collaborative Research: Robotic Disassembly of High-Precision Electronic Devices
  • Grant number:
    2422640
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project type:
    Standard Grant
Collaborative Research: Interaction-aware Planning and Control for Robotic Navigation in the Crowd
  • Grant number:
    2423131
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project type:
    Standard Grant
Researching the feasibility of enhancing the productivity of daffodil harvesting through the use of a daffodil collection robotic platform (Daffy)
  • Grant number:
    10107691
  • Fiscal year:
    2024
  • Funding amount:
    --
  • Project type:
    Launchpad