Reward Design for Safe Reinforcement Learning
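Basic information
- Grant number: 2872672
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: United Kingdom
- Project type: Studentship
- Fiscal year: 2023
- Funding country: United Kingdom
- Duration: 2023 to (no data)
- Project status: Ongoing
- Source:
- Keywords:
Project abstract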
In my DPhil, I intend to focus on the safe development of autonomous systems: algorithms that will be deployed in ways that change their environment and that must make sequences of decisions. One popular paradigm for creating decision-making agents is reinforcement learning (RL). Training an RL agent involves two stages: (1) designing the reward signal used to 'score' behaviour, and (2) using that reward signal to train a high-scoring agent. Much previous research has focussed on the challenge of training an agent to get a high reward. However, specifying a reward that captures exactly what designers want is extremely challenging, especially in complex, real-world environments. If the reward function is misspecified, competent optimisers can learn to behave in unpredictable and undesirable ways.
In recent years, reward learning has become a popular way to specify rewards in complicated environments. For example, ChatGPT uses a reward model trained on human labels. Such reward models capture the designers' intentions only approximately, and the models trained against them may learn to exploit errors in the reward model to obtain reward for undesirable actions. Better understanding how inaccuracies in ChatGPT's reward model influence its behaviour may be an important step towards avoiding unsafe or antisocial behaviour.
I want to further develop the theory of reward function design in order to create safe decision-making systems. My aims and objectives are as follows:
1. To develop the theory of how agents fail when their reward functions are misspecified. For example, we can study ways to softly optimise an imperfect reward function to avoid unsafe behaviour. Alternatively, we can try to derive bounds on the error in a model's performance in terms of the error in its reward model.
2. To develop the theory of how to design safer or more accurately specified reward functions. We can investigate whether some kinds of reward misspecification lead to more benign behaviour than others, or find ways to improve reward learning methods.
3. To investigate alternative training methods that sidestep the need for a reward function. One such method is cooperative inverse reinforcement learning, which asks agents to model their uncertainty about their goals and to ask questions when they are uncertain. Another might be training agents using goal-conditioning.
The novelty of this research direction is its focus on the design of the reward rather than on the training process, and on the safety rather than the competence of agents. Because RL has historically been applied in small or toy environments, the complexities of reward design were obscured by the challenge of learning to score a high reward. I instead aim to abstract away learning to score a high reward by asking: if agents were very competent at doing what we reward them for doing, how do we reward them for the right behaviours? I intend to develop previous work from the OxCAV group on reward theory, such as impact regularisation, reward gaming and Goodhart's Law. This project falls within the EPSRC Artificial Intelligence Technologies research area.
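Aim 1 mentions softly optimising an imperfect reward function. One well-known form of soft optimisation is the quantilizer: instead of taking the action that maximises the proxy reward, it samples from a trusted base distribution restricted to the top q fraction of its probability mass, ranked by proxy reward. The abstract does not say which soft-optimisation method the project will study, so treat this as an illustrative sketch only. Everything in the setup is a hypothetical assumption: 100 actions, a uniform base distribution, and a proxy reward that tracks the true reward except for one action it massively overrates (a stand-in for an exploitable reward-model error).

```python
# Toy sketch (hypothetical setup, not from the project): a proxy reward that
# tracks the true reward except for one action it massively overrates.
import numpy as np


def make_toy_rewards(n_actions=100, seed=0):
    """Return (true_reward, proxy_reward, exploit_index) for a toy bandit."""
    rng = np.random.default_rng(seed)
    true_reward = rng.uniform(0.0, 1.0, size=n_actions)
    # Proxy = true reward plus small noise (an "approximately accurate" model)...
    proxy_reward = true_reward + rng.normal(0.0, 0.05, size=n_actions)
    # ...except for the worst action, which the proxy scores far too highly.
    exploit = int(np.argmin(true_reward))
    proxy_reward[exploit] = 10.0
    return true_reward, proxy_reward, exploit


def quantilizer_distribution(proxy_reward, base_probs, q):
    """Action probabilities of a q-quantilizer.

    Keep the highest-proxy actions until they cover a fraction q of the base
    distribution's probability mass, then renormalise over that set.
    """
    order = np.argsort(-proxy_reward)  # highest proxy reward first
    probs = np.zeros_like(base_probs)
    mass = 0.0
    for a in order:
        if mass >= q:
            break
        take = min(base_probs[a], q - mass)  # may take partial mass at the cutoff
        probs[a] = take
        mass += take
    return probs / probs.sum()


true_r, proxy_r, exploit = make_toy_rewards()
base = np.full(len(true_r), 1.0 / len(true_r))  # uniform base distribution

# Hard optimisation: take the action with the highest proxy reward.
hard_choice = int(np.argmax(proxy_r))
print(f"argmax: action {hard_choice}, true reward {true_r[hard_choice]:.3f}")

# Soft optimisation: sample from the top-q fraction of the base distribution.
for q in (0.05, 0.1, 0.5):
    p = quantilizer_distribution(proxy_r, base, q)
    print(f"q={q}: expected true reward {p @ true_r:.3f}, "
          f"P(exploit)={p[exploit]:.3f} (cap {base[exploit] / q:.3f})")
```

In this toy the argmax lands on the overrated action, which has the lowest true reward of all, while each quantilizer keeps that action's probability at or below its base probability divided by q. Because every kept action's probability is at most its base probability scaled by 1/q, the expected value of any non-negative cost under the quantilizer is at most 1/q times its expected value under the base distribution; soft optimisation trades some proxy reward for robustness to errors the designer cannot see. The values of q, the noise level and the size of the error are all illustrative.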
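Aim 1 also mentions bounding the error in an agent's performance in terms of the error in its reward model. As a minimal worked example, and not a result of this project, assume a discounted MDP with discount factor $\gamma \in [0, 1)$, a learned reward $\hat R$ that is uniformly $\varepsilon$-close to the true reward $R$ (that is, $|\hat R(s,a) - R(s,a)| \le \varepsilon$ for every state-action pair), a policy $\hat\pi$ that is exactly optimal for $\hat R$, and a policy $\pi^*$ that is optimal for $R$. Writing $J_R(\pi) = \mathbb{E}_\pi\big[\sum_{t \ge 0} \gamma^t R(s_t, a_t)\big]$, the standard argument bounds the true-reward regret of optimising the learned reward:

```latex
\begin{aligned}
\bigl|J_{\hat R}(\pi) - J_R(\pi)\bigr|
  &= \Bigl|\mathbb{E}_\pi\Bigl[\sum_{t=0}^{\infty} \gamma^t \bigl(\hat R(s_t,a_t) - R(s_t,a_t)\bigr)\Bigr]\Bigr|
   \le \sum_{t=0}^{\infty} \gamma^t \varepsilon = \frac{\varepsilon}{1-\gamma}
   \quad \text{for every policy } \pi, \\
J_R(\pi^*) - J_R(\hat\pi)
  &= \underbrace{J_R(\pi^*) - J_{\hat R}(\pi^*)}_{\le\, \varepsilon/(1-\gamma)}
   + \underbrace{J_{\hat R}(\pi^*) - J_{\hat R}(\hat\pi)}_{\le\, 0 \text{ since } \hat\pi \text{ is optimal for } \hat R}
   + \underbrace{J_{\hat R}(\hat\pi) - J_R(\hat\pi)}_{\le\, \varepsilon/(1-\gamma)}
   \le \frac{2\varepsilon}{1-\gamma}.
\end{aligned}
```

The uniform-error assumption is the weak point: learned reward models are typically accurate only on the distribution they were trained on, and an optimiser can steer towards exactly the states where $|\hat R - R|$ is large, which is the failure mode the abstract describes.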
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other literature
Hitoshi Yoshiji et al.: "Fibrosis-promoting mechanism of TIMP-1 in transgenic mice", Saishin Igaku 55, 1781-1787 (2000)
- DOI:
- Publication date:
- Journal:
- Impact factor: 0
- Authors:
- Corresponding author:
LiDAR Implementations for Autonomous Vehicle Applications
- DOI:
- Publication date: 2021
- Journal:
- Impact factor: 0
- Authors:
- Corresponding author:
Hitoshi Yoshiji et al.: "Illustrated Medicine & Science Series: Molecular Medicine of Blood Vessels", Yodosha (ed. M. Shibuya), 125 (2000)
- DOI:
- Publication date:
- Journal:
- Impact factor: 0
- Authors:
- Corresponding author:
Yoshiyama, M., Takeuchi, K., Kim, S., Hanatani, A., Omura, T., Toda, I., Akioka, K., Teragaki, M., Iwao, H. and Yoshikawa, J.: "Effect of manidipine hydrochloride, a calcium antagonist, on isoproterenol-induced left ventricular hypertrophy", Jpn Circ J 62(1), 47-52 (1998)
- DOI:
- Publication date:
- Journal:
- Impact factor: 0
- Authors:
- Corresponding author:
Other grants
An implantable biosensor microsystem for real-time measurement of circulating biomarkers
- Grant number: 2901954
- Fiscal year: 2028
- Amount: --
- Project type: Studentship
Exploiting the polysaccharide breakdown capacity of the human gut microbiome to develop environmentally sustainable dishwashing solutions
- Grant number: 2896097
- Fiscal year: 2027
- Amount: --
- Project type: Studentship
A Robot that Swims Through Granular Materials
- Grant number: 2780268
- Fiscal year: 2027
- Amount: --
- Project type: Studentship
Likelihood and impact of severe space weather events on the resilience of nuclear power and safeguards monitoring
- Grant number: 2908918
- Fiscal year: 2027
- Amount: --
- Project type: Studentship
Proton, alpha and gamma irradiation assisted stress corrosion cracking: understanding the fuel-stainless steel interface
- Grant number: 2908693
- Fiscal year: 2027
- Amount: --
- Project type: Studentship
Field Assisted Sintering of Nuclear Fuel Simulants
- Grant number: 2908917
- Fiscal year: 2027
- Amount: --
- Project type: Studentship
Assessment of new fatigue capable titanium alloys for aerospace applications
- Grant number: 2879438
- Fiscal year: 2027
- Amount: --
- Project type: Studentship
Developing a 3D printed skin model using a Dextran - Collagen hydrogel to analyse the cellular and epigenetic effects of interleukin-17 inhibitors in
- Grant number: 2890513
- Fiscal year: 2027
- Amount: --
- Project type: Studentship
Understanding the interplay between the gut microbiome, behavior and urbanisation in wild birds
- Grant number: 2876993
- Fiscal year: 2027
- Amount: --
- Project type: Studentship
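Similar NSFC grants
Applications of AI in Market Design
- Grant number:
- Approval year: 2024
- Amount (10,000 CNY):
- Project type: Research Fund for International Young Scientists
Combinatorial biosynthesis of novel violacein derivatives based on a "Design-Build-Test" cycle strategy
- Grant number:
- Approval year: 2021
- Amount (10,000 CNY): 0.0
- Project type: Provincial/municipal project
Theoretical study of unitary designs under noise and constraints
- Grant number: 12147123
- Approval year: 2021
- Amount (10,000 CNY): 18
- Project type: Special fund project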
Similar overseas grants
M2DESCO - Computational Multimode Modelling Enabled Design of Safe & Sustainable Multi-Component High-Entropy Coatings
- Grant number: 10096988
- Fiscal year: 2024
- Amount: --
- Project type: EU-Funded
PINK - Provision of Integrated Computational Approaches for Addressing New Markets Goals for the Introduction of Safe-and-Sustainable-by-Design Chemicals and Materials
- Grant number: 10097944
- Fiscal year: 2024
- Amount: --
- Project type: EU-Funded
Safe and Sustainable by Design framework for the next generation of Chemicals and Materials
- Grant number: 10110559
- Fiscal year: 2024
- Amount: --
- Project type: EU-Funded
SiToLub - Simulation Tools For The Design Of Safe And Sustainable Lubricants
- Grant number: 10107545
- Fiscal year: 2024
- Amount: --
- Project type: EU-Funded
Computational Multi-Models Enabled Design of Safe & Sustainable Multi-Component High-Entropy Coatings (M2DESCO)
- Grant number: 10110861
- Fiscal year: 2024
- Amount: --
- Project type: EU-Funded
Simulation Tools for the design of safe and sustainable Lubricants
- Grant number: 10101483
- Fiscal year: 2024
- Amount: --
- Project type: EU-Funded
SUNRISE: Safe and sUstainable by desigN: IntegRated approaches for Impact aSsessment of advanced matErials
- Grant number: 10103630
- Fiscal year: 2024
- Amount: --
- Project type: EU-Funded
Protecting Pigs From Enzootic Pneumonia: Rational Design Of Safe Attenuated Vaccines
- Grant number: BB/X017540/1
- Fiscal year: 2023
- Amount: --
- Project type: Research Grant
Collaborative Research: SLES: Bridging offline design and online adaptation in safe learning-enabled systems
- Grant number: 2331880
- Fiscal year: 2023
- Amount: --
- Project type: Standard Grant
ENTRUST - ENsuring Secure and Safe CMD Design with Zero TRUST Principles
- Grant number: 10063996
- Fiscal year: 2023
- Amount: --
- Project type: EU-Funded