Scaling Unsupervised Environment Design
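Basic Information
- Grant number: 2888076
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: United Kingdom
- Project type: Studentship
- Fiscal year: 2023
- Funding country: United Kingdom
- Duration: 2023 to (no data)
- Project status: Ongoing
- Source:
- Keywords: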
Project Abstract
Reinforcement learning (RL) is a subfield of machine learning in which an agent (e.g. an autonomous vehicle) learns by acting in an environment (e.g. a real road or a simulation of one). RL has made great progress on complex games such as Atari titles, Go and StarCraft, but it has not yet been successfully applied to many real-world problems. The root cause is the inability of RL agents to generalise to unseen scenarios. Specifically, an agent trained in simulation does not transfer well when deployed in the real world, because simulations are inevitably inaccurate. Training directly in the real world is often impractical, given the large volume of training data needed and the potential dangers.
Recent pioneering work has demonstrated significant empirical gains in generalisation by training a teacher that learns to propose high-quality scenarios (e.g. road layouts) for the agent to train on. This mirrors results from supervised learning that have shown the importance of data quality for generalisation. A limitation of this work is that the teacher must learn from a sparse and noisy signal. This results in low sample efficiency and demands large computational resources, so the approach has only been successfully applied to very simple problems.
To reduce signal noise, I have proposed methods that encourage the teacher to maintain a diverse set of scenarios, using metrics based on approximated surprise, ease of discrimination, and distance in a learned latent space. Furthermore, I propose a novel data augmentation method in which scenarios are decomposed into a set of 'sub-scenarios', expanding the training data at minimal computational cost. Finally, the current state-of-the-art method trains the teacher by applying random perturbations.
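The abstract does not say how distance in a learned latent space would be turned into a diversity signal. One plausible instantiation is sketched below under stated assumptions. It scores a candidate scenario by the mean distance to its k nearest neighbours in the teacher's scenario buffer. Once the buffer is full, the least novel stored scenario is evicted when a more novel candidate arrives.

The encoder that maps a scenario to a latent vector is assumed and not shown. `DiverseScenarioBuffer`, `knn_novelty` and the k-nearest-neighbour scoring rule are hypothetical illustrations, not the project's method.

```python
import math
from typing import List, Sequence

# A latent embedding of one scenario (produced by an assumed, unshown encoder).
Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_novelty(z: Vector, others: List[Vector], k: int = 3) -> float:
    """Mean distance from z to its k nearest neighbours among `others`.

    Larger values mean the scenario is further from everything else the
    teacher keeps, i.e. it contributes more diversity.
    """
    if not others:
        return math.inf  # nothing to compare against: maximally novel
    nearest = sorted(euclidean(z, o) for o in others)[:k]
    return sum(nearest) / len(nearest)


class DiverseScenarioBuffer:
    """Hypothetical fixed-capacity buffer that favours latent-space diversity."""

    def __init__(self, capacity: int, k: int = 3):
        self.capacity = capacity
        self.k = k
        self.latents: List[Vector] = []

    def maybe_add(self, z: Vector) -> bool:
        """Add z; when full, replace the least novel member if z is more novel."""
        if len(self.latents) < self.capacity:
            self.latents.append(z)
            return True
        # Novelty of each stored scenario relative to the rest of the buffer.
        scores = [
            knn_novelty(m, self.latents[:i] + self.latents[i + 1:], self.k)
            for i, m in enumerate(self.latents)
        ]
        worst = min(range(len(scores)), key=scores.__getitem__)
        # Score the candidate against the buffer without the eviction target.
        rest = self.latents[:worst] + self.latents[worst + 1:]
        if knn_novelty(z, rest, self.k) > scores[worst]:
            self.latents[worst] = z
            return True
        return False
```

With `capacity=2, k=1`, adding `(0.0, 0.0)` and `(1.0, 0.0)` fills the buffer. A near-duplicate such as `(0.1, 0.0)` is then rejected, while a distant scenario such as `(10.0, 0.0)` replaces an existing entry.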
Instead of random perturbations, I suggest targeted perturbations. The agent's regret is continually approximated, and perturbations are applied where this approximated regret is lowest. Regret here is the difference between how well the agent performed at the task and how well an optimal agent would have performed. All these techniques aim to improve the efficiency of the overall process. They reduce the resources needed and open this powerful technique up to more complex domains, benefiting real-world applications such as autonomous driving. Although I use autonomous driving as a running example, the methods being developed will be generalisable to any RL problem and will be evaluated across a diverse range of environments. This project falls within the EPSRC Artificial intelligence technologies research area.
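Regret as defined in the abstract requires the return of an optimal agent, which is unknown in practice, so it has to be approximated. The sketch below uses positive value loss, the average of the positive generalised advantage estimates (GAE) over an episode. This is a regret proxy used in prior UED work (e.g. Prioritized Level Replay); the abstract does not state which approximation this project uses.

The sketch then reads "applying perturbations where this is lowest" as choosing, among hypothetical named parts of a scenario, the one with the smallest estimated regret. `positive_value_loss`, `pick_perturbation_site` and the per-site granularity are assumptions for illustration only.

```python
from typing import Dict, List, Sequence


def gae(rewards: Sequence[float], values: Sequence[float],
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalised advantage estimates for a single episode.

    `values` holds V(s_0), ..., V(s_T): one more entry than `rewards`, the
    last being the bootstrap value (0.0 if the episode terminated).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, then the exponentially weighted GAE recursion.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def positive_value_loss(rewards: Sequence[float], values: Sequence[float],
                        gamma: float = 0.99, lam: float = 0.95) -> float:
    """Regret proxy: mean of the positive parts of the GAE terms."""
    adv = gae(rewards, values, gamma, lam)
    if not adv:
        return 0.0
    return sum(max(a, 0.0) for a in adv) / len(adv)


def pick_perturbation_site(regret_by_site: Dict[str, float]) -> str:
    """Hypothetical targeting rule: perturb where approximated regret is lowest."""
    return min(regret_by_site, key=regret_by_site.__getitem__)
```

For example, `pick_perturbation_site({"junction": 0.4, "straight": 0.05, "roundabout": 0.9})` returns `"straight"`, the part of the scenario the agent appears to have mastered.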
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications
Internet-administered, low-intensity cognitive behavioral therapy for parents of children treated for cancer: A feasibility trial (ENGAGE).
- DOI: 10.1002/cam4.5377
- Published: 2023-03
- Journal:
- Impact factor: 4
- Authors:
- Corresponding author:
Differences in child and adolescent exposure to unhealthy food and beverage advertising on television in a self-regulatory environment.
- DOI: 10.1186/s12889-023-15027-w
- Published: 2023-03-23
- Journal:
- Impact factor: 4.5
- Authors:
- Corresponding author:
The association between rheumatoid arthritis and reduced estimated cardiorespiratory fitness is mediated by physical symptoms and negative emotions: a cross-sectional study.
- DOI: 10.1007/s10067-023-06584-x
- Published: 2023-07
- Journal:
- Impact factor: 3.4
- Authors:
- Corresponding author:
ElasticBLAST: accelerating sequence search via cloud computing.
- DOI: 10.1186/s12859-023-05245-9
- Published: 2023-03-26
- Journal:
- Impact factor: 3
- Authors:
- Corresponding author:
Amplified EQCM-D detection of extracellular vesicles using 2D gold nanostructured arrays fabricated by block copolymer self-assembly.
- DOI: 10.1039/d2nh00424k
- Published: 2023-03-27
- Journal:
- Impact factor: 9.7
- Authors:
- Corresponding author:
Other Grants
An implantable biosensor microsystem for real-time measurement of circulating biomarkers
- Grant number: 2901954
- Fiscal year: 2028
- Amount: --
- Project type: Studentship
Exploiting the polysaccharide breakdown capacity of the human gut microbiome to develop environmentally sustainable dishwashing solutions
- Grant number: 2896097
- Fiscal year: 2027
- Amount: --
- Project type: Studentship
A Robot that Swims Through Granular Materials
- Grant number: 2780268
- Fiscal year: 2027
- Amount: --
- Project type: Studentship
Likelihood and impact of severe space weather events on the resilience of nuclear power and safeguards monitoring.
- Grant number: 2908918
- Fiscal year: 2027
- Amount: --
- Project type: Studentship
Proton, alpha and gamma irradiation assisted stress corrosion cracking: understanding the fuel-stainless steel interface
- Grant number: 2908693
- Fiscal year: 2027
- Amount: --
- Project type: Studentship
Field Assisted Sintering of Nuclear Fuel Simulants
- Grant number: 2908917
- Fiscal year: 2027
- Amount: --
- Project type: Studentship
Assessment of new fatigue capable titanium alloys for aerospace applications
- Grant number: 2879438
- Fiscal year: 2027
- Amount: --
- Project type: Studentship
Developing a 3D printed skin model using a Dextran - Collagen hydrogel to analyse the cellular and epigenetic effects of interleukin-17 inhibitors in
- Grant number: 2890513
- Fiscal year: 2027
- Amount: --
- Project type: Studentship
Understanding the interplay between the gut microbiome, behavior and urbanisation in wild birds
- Grant number: 2876993
- Fiscal year: 2027
- Amount: --
- Project type: Studentship
Similar Overseas Grants
Data-driven phenotyping of central disorders of hypersomnolence with unsupervised clustering: toward more reliable diagnostic criteria
- Grant number: 481046
- Fiscal year: 2023
- Amount: --
- Project type:
CRCNS Research Proposal: A Unified Framework for Unsupervised Sparse-to-dense Brain Image Generation and Neural Circuit Reconstruction
- Grant number: 2309073
- Fiscal year: 2023
- Amount: --
- Project type: Continuing Grant
FRR: Collaborative Research: Unsupervised Active Learning for Aquatic Robot Perception and Control
- Grant number: 2237577
- Fiscal year: 2023
- Amount: --
- Project type: Standard Grant
CAREER: Principled Unsupervised Learning via Minimum Volume Polytopic Embedding
- Grant number: 2237640
- Fiscal year: 2023
- Amount: --
- Project type: Continuing Grant
Knockoff Feature Selection Techniques for Robust Inference in Supervised and Unsupervised Learning
- Grant number: 2310955
- Fiscal year: 2023
- Amount: --
- Project type: Standard Grant
Unsupervised machine learning classification of ADHD subtype by urinary levels of tryptophan and monoamine neurotransmitters
- Grant number: 23K12814
- Fiscal year: 2023
- Amount: --
- Project type: Grant-in-Aid for Early-Career Scientists
Study of Human Statistical Biases on Unsupervised Parsing and Language Modeling
- Grant number: 23KJ0565
- Fiscal year: 2023
- Amount: --
- Project type: Grant-in-Aid for JSPS Fellows
Unsupervised Annotation of Complex 3D BioMedical Data.
- Grant number: 2882348
- Fiscal year: 2023
- Amount: --
- Project type: Studentship
Using synthetic data and unsupervised learning methods for malware detection
- Grant number: 10076857
- Fiscal year: 2023
- Amount: --
- Project type: Collaborative R&D
Unsupervised Deep Photon-Counting Computed Tomography Reconstruction for Human Extremity Imaging
- Grant number: 10718303
- Fiscal year: 2023
- Amount: --
- Project type: