TBCMulti-Agent Reinforcement Learning for Assistive Robots

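Basic information

Grant number:
2901369
Principal investigator:
Amount:
--
Host institution:
University of Edinburgh
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2023
Funding country:
United Kingdom
Duration:
2023 to (no data)
Project status:
Not yet concluded

Source: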
https://gtr.ukri.org/projects?ref=studentship-2901369
Keywords:
TBCMulti Agent Reinforcement Learning Assistive

Project abstract

This project applies reinforcement learning to robotic control in order to help disabled people with everyday assistive tasks such as washing, dressing, eating and drinking. It uses the Assistive Gym environment, which simulates these tasks in settings that are as physically realistic as possible (Erickson et al., 2019). The problem is particularly challenging because it combines a large action space with long sequences of actions. The project also emphasises human-robot interaction: the robot needs to learn policies that satisfy human preferences and anticipate the cooperative behaviour of humans.

Currently, the project is replicating and extending baseline algorithms from existing research. In particular, it will explore applying decision transformers to this environment, an architecture that has not yet been implemented there. This direction is motivated by the success of long short-term memory (LSTM) architectures, which are better at learning sequential dependencies in the environment (Glaese et al., 2022). Transformers are a promising alternative to LSTMs: they learn long-term dependencies more efficiently, which allows the robot to choose better actions early in the sequence and so improve the entire trajectory.

Future work will challenge restrictive assumptions prevalent in past research. Two assumptions will be addressed: that humans cooperate optimally with the robot, and that human preferences are given ex ante and do not change dynamically. To remove them, the project will implement reinforcement learning from human feedback, which accommodates a more complex and diverse set of preferences that need not be predefined. Techniques such as inverse reinforcement learning from human data can simulate sub-optimal human cooperation. These enhancements aim to produce algorithms that learn more capable policies for real-world applications.
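
The abstract does not describe how a decision transformer would be configured for this environment. As a rough illustration only: decision transformers are usually trained on trajectories in which each timestep is tagged with its return-to-go, the sum of rewards from that step to the end of the episode, so that the model can be conditioned on a target return. The sketch below computes that quantity in plain Python. The function name, the `gamma` parameter and the reward values are assumptions made for illustration and do not come from the project.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """For each timestep t, sum the rewards from t to the end of the episode.

    gamma is an optional discount factor (hypothetical parameter for this
    sketch); gamma=1.0 gives the plain undiscounted sum.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the total already computed for t+1.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


# Hypothetical per-step rewards for one short episode (illustrative values only).
episode_rewards = [0.0, 0.0, 1.0, 2.0]
print(returns_to_go(episode_rewards))  # [3.0, 3.0, 3.0, 2.0]
```

In this toy episode the early steps carry the full return of 3.0 even though their immediate reward is zero, which is one way to see how conditioning on return-to-go could encourage better actions early in a long sequence.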
