Open-Ended Discovery of Skill Hierarchies in Artificial Intelligence

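Basic information

  • Grant number:
    2278914
  • Principal investigator:
  • Amount:
    --
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project category:
    Studentship
  • Fiscal year:
    2019
  • Funding country:
    United Kingdom
  • Duration:
    2019 to (no data)
  • Project status:
    Completed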

Project abstract

People solve complex tasks every day by decomposing them into smaller sub-tasks. For instance, the task of making a cup of tea can be decomposed into the sub-tasks of boiling the kettle, adding sugar, adding a tea-bag, grasping the cup, and so on. These sub-problems can themselves be decomposed into even smaller sub-problems, all the way down to the individual muscle movements involved, forming a hierarchy of skills useful for solving the problem.

Of course, learning how to make a cup of tea at the scale of muscle movements would be an unreasonably large computational undertaking. Much of our problem-solving ability is attributed to our ability to discover and plan using hierarchically organised higher-level behaviours. Planning and learning a sequence of a few high-level behaviours is clearly less computationally expensive than planning and learning a sequence of perhaps millions of primitive actions.

Two key open research questions are how such useful skills should be characterised, and how an artificially intelligent agent should go about autonomously discovering them. It is these questions that we hope, at least in part, to address during this research project.

We frame this research within the well-developed framework of Reinforcement Learning (RL), which concerns itself broadly with how artificially intelligent agents should learn optimal behavioural policies through interaction with their environments. Many RL methods, even those considered state-of-the-art, operate using primitive actions: they are still making cups of tea at the scale of muscle movements, as it were. The branch of RL which considers higher-level behaviours taken over varying numbers of timesteps is known as Hierarchical Reinforcement Learning (HRL), in reference to how skills can be organised hierarchically.

Explicitly, the main objective of this research project is to develop an HRL algorithm, or set of HRL algorithms, which endows artificially intelligent agents with the ability to discover a hierarchy of useful high-level behaviours through interaction with their environment.

There are several desirable properties that the algorithm(s) developed over the course of this project should possess. Firstly, the algorithms should be developmental, with higher-level, more complex skills being constructed hierarchically from lower-level ones as time goes on. Secondly, the algorithms should be domain-independent to ensure their applicability to many types of problem. Thirdly, the algorithms should ideally perform well in tasks which are currently considered difficult (e.g. "hard exploration" problems such as the game of Montezuma's Revenge).

These desirable properties stem partly from various shortcomings in current HRL methods. For instance, many existing HRL methods are applicable only in discrete domains or domains with small state-spaces, or otherwise would not scale well to larger domains or those with continuous state-spaces. This limits their applicability to many of the complex problems that are ultimately of interest to RL.

The benefits of developing such algorithms would be wide-ranging, allowing reinforcement learning to be applied to larger, more complex problems to which current methods simply do not scale well.
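The idea of a higher-level behaviour that runs over a varying number of timesteps can be illustrated with a small sketch. The code below is not the project's algorithm: the 1-D corridor, the `Option` class, and the helpers `go_to`, `run_option`, and `run_plan` are all hypothetical names chosen for illustration, and the skills are hand-written rather than discovered. It only shows that a plan of a few high-level behaviours can stand in for a much longer sequence of primitive actions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = int  # position in a hypothetical 1-D corridor


@dataclass
class Option:
    """A temporally extended behaviour: a policy plus a termination test."""
    name: str
    policy: Callable[[State], int]       # maps a state to a primitive action (+1 or -1)
    terminates: Callable[[State], bool]  # True once the behaviour should stop


def step(state: State, action: int) -> State:
    """Primitive transition: move one cell left (-1) or right (+1)."""
    return state + action


def run_option(state: State, option: Option, max_steps: int = 1000) -> Tuple[State, int]:
    """Run an option until it terminates; return the final state and primitive steps used."""
    steps = 0
    while not option.terminates(state) and steps < max_steps:
        state = step(state, option.policy(state))
        steps += 1
    return state, steps


def go_to(target: State) -> Option:
    """Hand-written skill (an assumption for this sketch): walk to a target cell."""
    return Option(
        name=f"go_to({target})",
        policy=lambda s: 1 if s < target else -1,
        terminates=lambda s: s == target,
    )


def run_plan(start: State, plan: List[Option]) -> Tuple[State, int]:
    """Execute a sequence of options and count the primitive actions they expand into."""
    state, total = start, 0
    for option in plan:
        state, used = run_option(state, option)
        total += used
    return state, total


# Three high-level decisions replace forty primitive ones.
plan = [go_to(10), go_to(25), go_to(40)]
final_state, primitive_steps = run_plan(0, plan)
print(len(plan), primitive_steps, final_state)
```

Here the planner chooses among three behaviours rather than forty individual moves. The open questions in the abstract concern how an agent could discover such behaviours on its own instead of having them supplied by hand as they are here.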

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Similar overseas grants

EAGER: Co-Designing a Cognitive Teaching Assistant to Support Evidence-Based Instruction in Open-Ended Learning Environments
  • Grant number:
    2327708
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project category:
    Standard Grant
Hybrid Human Artificial Collective Intelligence in Open-Ended Decision Making
  • Grant number:
    10037991
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project category:
    EU-Funded
The theory and practice of 'trans-imperial history': towards an open-ended framework of research
  • Grant number:
    22H00690
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project category:
    Grant-in-Aid for Scientific Research (B)
Enabling rich, open-ended human-robot interaction through robust, advanced multimodal perceptual capabilities for high-level reasoning
  • Grant number:
    RGPIN-2019-06047
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project category:
    Discovery Grants Program - Individual
Towards Open-ended Reinforcement Learning using Synthetic Environment Generation
  • Grant number:
    2711309
  • Fiscal year:
    2022
  • Funding amount:
    --
  • Project category:
    Studentship
Supporting open-ended play for wellbeing in adulthood through interactive installations using generative systems
  • Grant number:
    2598279
  • Fiscal year:
    2021
  • Funding amount:
    --
  • Project category:
    Studentship
Enabling rich, open-ended human-robot interaction through robust, advanced multimodal perceptual capabilities for high-level reasoning
  • Grant number:
    RGPIN-2019-06047
  • Fiscal year:
    2021
  • Funding amount:
    --
  • Project category:
    Discovery Grants Program - Individual
Enabling rich, open-ended human-robot interaction through robust, advanced multimodal perceptual capabilities for high-level reasoning
  • Grant number:
    RGPIN-2019-06047
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project category:
    Discovery Grants Program - Individual
Beam power upgrade by single-ended rf acceleration cavity
  • Grant number:
    20H00166
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project category:
    Grant-in-Aid for Scientific Research (A)
Experimental Investigation of Open-Ended Pipe Piles Subjected to Axial and Lateral Loads
  • Grant number:
    2028672
  • Fiscal year:
    2020
  • Funding amount:
    --
  • Project category:
    Standard Grant