
MIMIc: Multimodal Imitation Learning in MultI-Agent Environments


Basic information

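Grant reference: EP/T000783/1
Principal investigator: Varuna De Silva
Amount: $329,900
Host institution: Loughborough University
Host country: United Kingdom
Project category: Research Grant
Financial year: 2019
Funding country: United Kingdom
Period: 2019 to (no data)
Project status: Completed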

Source: https://gtr.ukri.org/projects?ref=EP%2FT000783%2F1
Keywords: MIMIc Multimodal Imitation Learning MultI

Project abstract

In the UK, we are not allowed to drive a vehicle until we are 17. This is because driving is a complex, safety-critical activity that requires many advanced cognitive skills, such as recognising possible threats, anticipating the behaviour of other road users and reacting agilely to emerging situations. Think about a football player making decisions on the field. A good player senses opportunities by anticipating what other players will do, and selects an action that increases the odds of scoring. It takes humans a long time to develop these advanced cognitive skills and become experts at such complex real-world tasks. Artificial Intelligence has made significant progress during the last decade, demonstrated by breakthroughs in cancer detection, computers beating 'Go' masters and intelligent robotics. However, if AI is to live up to its science-fiction promise of assisting humanity, or even superseding human intelligence, it should at least be equipped with cognitive skills such as those humans possess. This project aims to develop groundbreaking algorithms that equip autonomous systems with the human-like cognitive skills required to thrive in real-world environments.

We are focused on applications that require an autonomous agent (e.g. a robot or a driverless car) to interact with multiple intelligent agents in the environment to accomplish a task; such settings are known as Multi-Agent Environments (MAEs). These applications require an agent to anticipate the behaviour of other agents and to select the most appropriate course of action. Equipping agents with such autonomous decision-making capability is known as policy learning. Compared with policy learning in single-agent domains (teaching a robot to walk or a computer to play a video game), recent progress in policy learning for MAEs has been quite modest.
This is due to several reasons: 1) the environment is dynamic, because it changes in response to the agents' actions; 2) multi-agent policy learning suffers from a theoretical limitation known as the curse of dimensionality (CoD); 3) utility functions that capture agents' objectives are difficult to define; and 4) there is a significant lack of adequate multi-agent datasets that allow meaningful research. This project proposes to undertake research into policy learning in MAEs by addressing the above limitations. Our unique approach is motivated by how humans thrive in similar settings. Firstly, we perceive the world through multiple senses (e.g. vision, audition, touch), which enables a rich perception of the world. Secondly, when acting in an MAE, humans do not pay attention to all stimuli but only to key stimuli; e.g. when a football player is attacking the ball, the player pays attention only to the teammates capable of effecting a goal and to the key defenders. Finally, the learning paradigm we employ, known as imitation learning, is an emerging methodology for learning by observing experts, which is itself a productive approach that humans use to learn new skills. Accordingly, we propose to learn realistic policies in MAEs through imitation learning, leveraging multimodal data fusion and selective-attention modelling. Multimodal data fusion allows us to capture the high-dimensional context of the real world, and selective-attention modelling helps allay the CoD. An elite football club facilitating this ambitious research project has provided us with a unique multimodal multi-agent dataset and access to state-of-the-art data-capture facilities.

The project outputs will be subjectively validated as a tool to answer "what-if" questions related to game play in football, assisting coaching staff to visualise speculative game strategies, and as a computational benchmark to quantify the cognitive skills of football players.
The planned impact activities will ensure that the project leaves a legacy in AI development, benefiting UK PLC through significant contributions to multiple high-growth areas such as driverless vehicles, video gaming and assistive robots.
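The abstract states that selective-attention modelling helps allay the curse of dimensionality in multi-agent policy learning. The sketch below illustrates only the general idea and is not the project's model. It assumes scaled dot-product attention with an optional hard top-k selection, and the function `selective_attention`, its `top_k` parameter and the toy feature vectors are hypothetical. An agent scores every other agent against its own query vector, keeps only the most relevant ones, and pools them into a context vector whose length does not depend on how many agents are present.

```python
import math
from typing import List, Optional, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def selective_attention(
    query: Sequence[float],
    agent_feats: Sequence[Sequence[float]],
    top_k: Optional[int] = None,
) -> Tuple[List[float], List[float]]:
    """Pool other agents' features into one fixed-size context vector.

    Hypothetical illustration: each agent is scored by a scaled dot product
    with the ego agent's query; if top_k is given, only the k highest-scoring
    agents receive non-zero weight (hard "selective" attention).
    Returns (context, weights), where len(context) == len(query).
    """
    if not agent_feats:
        raise ValueError("need at least one other agent")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")
    d = len(query)
    scores = [sum(q * f for q, f in zip(query, feat)) / math.sqrt(d)
              for feat in agent_feats]
    # Indices of attended agents, most relevant first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = order if top_k is None else order[:top_k]
    kept_weights = softmax([scores[i] for i in keep])
    # Ignored agents keep a weight of exactly zero.
    weights = [0.0] * len(agent_feats)
    for i, w in zip(keep, kept_weights):
        weights[i] = w
    # Weighted sum of features: same length as the query, whatever the agent count.
    context = [sum(w * feat[j] for w, feat in zip(weights, agent_feats))
               for j in range(d)]
    return context, weights
```

For example, `selective_attention([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0], [0.0, -2.0]], top_k=1)` puts all weight on the first agent and returns the context `[2.0, 0.0]`. Because the pooled context has a fixed size, a downstream policy learned by imitation from expert demonstrations would see the same input dimension whether it attends over three players or twenty.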
