Theory and Algorithms for Relation between Stochastic Control and Reinforcement Learning
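Basic Information
- Grant No.: 2741077
- Principal Investigator:
- Amount: --
- Host Institution:
- Host Institution Country: United Kingdom
- Project Type: Studentship
- Fiscal Year: 2022
- Funding Country: United Kingdom
- Duration: 2022 to (no data)
- Project Status: Completed
- Source:
- Keywords: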
Project Abstract
Reinforcement learning (RL) is an extremely active subfield of machine learning. Its best-known recent successes on complex decision-making problems include playing perfect-information board games such as Go and driving autonomous vehicles. The basic idea is for an algorithm to gradually learn near-optimal strategies for solving a problem in real time. Learning proceeds by trial and error, progressively improving the control policy. It requires specifying an explicit objective (score) function on a black-box environment, which captures the environment's responses to the control actions. However, in many situations such trial-and-error exploration is prohibitively costly. The exploration-exploitation trade-off in RL is typically addressed by adding an entropy-regularisation term to the objective function being minimised.

The connection between entropy-regularised RL problems and relaxed continuous-time stochastic control has been established recently. In particular, the linear-quadratic (LQ) control problem admits elegant solutions and can be used to approximate more general problems. One objective of this project is to design and implement algorithms for solving such problems, whether LQ or general control problems, and to prove corresponding convergence theorems. The algorithms proposed by Szpruch et al. and Reisinger et al. are good examples and could serve as a starting point.

Stochastic processes with jumps are widely used in real-world problems. A second objective of this project is therefore to consider more complex controlled dynamics, namely processes with jumps, and to determine whether our algorithms can handle this harder setting. The martingale approach proposed by Hernandez-Hernandez et al. could be applied here. The research will focus mainly on the theory of the problem; once the theory is reasonably complete, we will develop corresponding algorithms and prove convergence theorems for them.
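As a hedged illustration of why the LQ case is attractive, the sketch below assumes a scalar, finite-horizon problem with dynamics dx = (A x + B u) dt + σ dW, running cost Q x² + R u², terminal cost G x_T², and an entropy weight τ on the relaxed (randomised) control, with a diffusion coefficient that does not depend on the control. Under these assumptions a standard calculation gives an optimal relaxed policy that is Gaussian: its mean is the classical Riccati feedback −B P(t) x / R and its variance is τ / (2R), so entropy regularisation adds exploration noise without changing the Riccati equation. The functions `solve_riccati`, `gaussian_policy` and `simulate` and all parameter names are hypothetical and for illustration only; this is not the algorithm of Szpruch et al., Reisinger et al., or this project.

```python
import math
import random


def solve_riccati(A, B, Q, R, G, T, n_steps=1000):
    """Integrate the scalar Riccati ODE backward in time with explicit Euler.

    Solves P'(t) = -(2*A*P + Q - (B**2 / R) * P**2), P(T) = G, and returns
    [P(t_0), ..., P(t_N)] on the uniform grid t_k = k * T / n_steps.
    """
    if R <= 0:
        raise ValueError("R must be positive")
    dt = T / n_steps
    P = [0.0] * (n_steps + 1)
    P[-1] = G
    for k in range(n_steps, 0, -1):
        p = P[k]
        dP = -(2 * A * p + Q - (B ** 2 / R) * p ** 2)
        P[k - 1] = p - dt * dP  # P(t - dt) ~ P(t) - dt * P'(t)
    return P


def gaussian_policy(p, x, B, R, tau):
    """Mean and variance of the assumed optimal relaxed (Gaussian) policy.

    The mean is the classical LQ feedback -B*P*x/R; the variance tau/(2R)
    comes from the entropy weight tau and does not depend on the state x.
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    return -B * p * x / R, tau / (2 * R)


def simulate(A, B, Q, R, G, sigma, tau, T, x0, n_steps=1000, seed=0):
    """Euler-Maruyama rollout that samples actions from the Gaussian policy.

    Returns the terminal state and the accumulated quadratic cost; the
    entropy term itself is not included in the returned cost.
    """
    rng = random.Random(seed)
    P = solve_riccati(A, B, Q, R, G, T, n_steps)
    dt = T / n_steps
    x, cost = x0, 0.0
    for k in range(n_steps):
        mean, var = gaussian_policy(P[k], x, B, R, tau)
        u = rng.gauss(mean, math.sqrt(var))  # exploratory (randomised) action
        cost += (Q * x ** 2 + R * u ** 2) * dt
        x += (A * x + B * u) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x, cost + G * x ** 2


if __name__ == "__main__":
    # tau = 0 and sigma = 0 reduce the rollout to classical deterministic LQ control.
    P = solve_riccati(A=0.0, B=1.0, Q=0.0, R=1.0, G=1.0, T=1.0)
    print(P[0])
    print(simulate(A=0.0, B=1.0, Q=0.0, R=1.0, G=1.0, sigma=0.0, tau=0.0, T=1.0, x0=1.0))
```

With A = 0, B = R = G = T = 1 and Q = 0, the exact Riccati solution is P(t) = 1 / (1 + T − t), so `P[0]` should be close to 0.5, and the deterministic rollout from x0 = 1 should incur a cost close to P(0) x0² = 0.5. Setting `tau > 0` leaves the feedback mean unchanged and only widens the action distribution.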
With this theory and these algorithms in place, we will proceed to the final objective of the project: applying the algorithms to various real-world problems, such as financial trading, autonomous driving and board games. Many applications of RL in finance have yet to be incorporated into the continuous-time relaxed stochastic control framework. It is envisaged that, during this PhD, theoretical advances will be applied to problems in quantitative finance (e.g. optimal execution of portfolio transactions and risk management of derivative securities), with emphasis on the real-world applicability of the algorithms.
Project Outputs
- Journal articles (0)
- Monographs (0)
- Research awards (0)
- Conference papers (0)
- Patents (0)
Other Publications
Hitoshi Yoshiji et al.: "Fibrosis-promoting mechanism of TIMP-1 in transgenic mice," Saishin Igaku 55, 1781-1787 (2000)
LiDAR Implementations for Autonomous Vehicle Applications
- Published: 2021
Hitoshi Yoshiji et al.: "Illustrated Medicine & Science Series: Molecular Medicine of Blood Vessels," Yodosha (ed. Masabumi Shibuya), 125 (2000)
Yoshiyama, M., Takeuchi, K., Kim, S., Hanatani, A., Omura, T., Toda, I., Akioka, K., Teragaki, M., Iwao, H. and Yoshikawa, J.: "Effect of manidipine hydrochloride, a calcium antagonist, on isoproterenol-induced left ventricular hypertrophy," Jpn Circ J. 62(1), 47-52 (1998)
Other Grants
An implantable biosensor microsystem for real-time measurement of circulating biomarkers
- Grant No.: 2901954
- Fiscal Year: 2028
- Funding Amount: --
- Project Type: Studentship
Exploiting the polysaccharide breakdown capacity of the human gut microbiome to develop environmentally sustainable dishwashing solutions
- Grant No.: 2896097
- Fiscal Year: 2027
- Funding Amount: --
- Project Type: Studentship
A Robot that Swims Through Granular Materials
- Grant No.: 2780268
- Fiscal Year: 2027
- Funding Amount: --
- Project Type: Studentship
Likelihood and impact of severe space weather events on the resilience of nuclear power and safeguards monitoring.
- Grant No.: 2908918
- Fiscal Year: 2027
- Funding Amount: --
- Project Type: Studentship
Proton, alpha and gamma irradiation assisted stress corrosion cracking: understanding the fuel-stainless steel interface
- Grant No.: 2908693
- Fiscal Year: 2027
- Funding Amount: --
- Project Type: Studentship
Field Assisted Sintering of Nuclear Fuel Simulants
- Grant No.: 2908917
- Fiscal Year: 2027
- Funding Amount: --
- Project Type: Studentship
Assessment of new fatigue capable titanium alloys for aerospace applications
- Grant No.: 2879438
- Fiscal Year: 2027
- Funding Amount: --
- Project Type: Studentship
Developing a 3D printed skin model using a Dextran - Collagen hydrogel to analyse the cellular and epigenetic effects of interleukin-17 inhibitors in
- Grant No.: 2890513
- Fiscal Year: 2027
- Funding Amount: --
- Project Type: Studentship
Understanding the interplay between the gut microbiome, behavior and urbanisation in wild birds
- Grant No.: 2876993
- Fiscal Year: 2027
- Funding Amount: --
- Project Type: Studentship
Similar Overseas Grants
DMS-EPSRC: Asymptotic Analysis of Online Training Algorithms in Machine Learning: Recurrent, Graphical, and Deep Neural Networks
- Grant No.: EP/Y029089/1
- Fiscal Year: 2024
- Funding Amount: --
- Project Type: Research Grant
CAREER: Blessing of Nonconvexity in Machine Learning - Landscape Analysis and Efficient Algorithms
- Grant No.: 2337776
- Fiscal Year: 2024
- Funding Amount: --
- Project Type: Continuing Grant
CAREER: From Dynamic Algorithms to Fast Optimization and Back
- Grant No.: 2338816
- Fiscal Year: 2024
- Funding Amount: --
- Project Type: Continuing Grant
CAREER: Structured Minimax Optimization: Theory, Algorithms, and Applications in Robust Learning
- Grant No.: 2338846
- Fiscal Year: 2024
- Funding Amount: --
- Project Type: Continuing Grant
CRII: SaTC: Reliable Hardware Architectures Against Side-Channel Attacks for Post-Quantum Cryptographic Algorithms
- Grant No.: 2348261
- Fiscal Year: 2024
- Funding Amount: --
- Project Type: Standard Grant
CRII: AF: The Impact of Knowledge on the Performance of Distributed Algorithms
- Grant No.: 2348346
- Fiscal Year: 2024
- Funding Amount: --
- Project Type: Standard Grant
CRII: CSR: From Bloom Filters to Noise Reduction Streaming Algorithms
- Grant No.: 2348457
- Fiscal Year: 2024
- Funding Amount: --
- Project Type: Standard Grant
EAGER: Search-Accelerated Markov Chain Monte Carlo Algorithms for Bayesian Neural Networks and Trillion-Dimensional Problems
- Grant No.: 2404989
- Fiscal Year: 2024
- Funding Amount: --
- Project Type: Standard Grant
CAREER: Efficient Algorithms for Modern Computer Architecture
- Grant No.: 2339310
- Fiscal Year: 2024
- Funding Amount: --
- Project Type: Continuing Grant
CAREER: Improving Real-world Performance of AI Biosignal Algorithms
- Grant No.: 2339669
- Fiscal Year: 2024
- Funding Amount: --
- Project Type: Continuing Grant