Theory and Algorithms for Relation between Stochastic Control and Reinforcement Learning

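Basic information

Grant number:
2741077
Principal investigator:
Amount:
--
Host institution:
University of Warwick
Host institution country:
United Kingdom
Project category:
Studentship
Fiscal year:
2022
Funding country:
United Kingdom
Start and end dates:
2022 to (no data)
Project status:
Completed

Source:
https://gtr.ukri.org/projects?ref=studentship-2741077
Keywords: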
Theory Algorithms Relation between Stochastic

Project abstract

Reinforcement learning (RL) is an extremely active subfield of machine learning. Its well-known real-world successes in complex decision-making problems in recent years include playing perfect-information board games such as Go and driving autonomous vehicles. The basic idea is for an algorithm to gradually learn near-optimal strategies for solving a problem in real time. Learning proceeds by trial and error, which is used to improve control policies. It requires specifying an explicit objective (score) function on a black-box environment, which incorporates the environment's responses to the control actions. However, such trial-and-error exploration is prohibitively costly in many situations. The exploration-exploitation trade-off in RL is typically managed by adding an entropy-regularisation term that is minimised as part of the objective function. The connection between entropy-regularised RL problems and relaxed continuous-time stochastic control has been established recently. In particular, the linear-quadratic (LQ) control problem has elegant solutions and can approximate more general problems. One objective is to design and implement algorithms for solving this problem, whether the LQ problem or the general control problem, and to provide corresponding convergence theorems. To this end, the algorithms proposed by Szpruch et al. and Reisinger et al. are good examples and could serve as a starting point. Stochastic processes with jumps are widely used in real-world problems. Thus, another objective of this project is to consider more complex controlled dynamics, i.e. processes with jumps, and to see whether our algorithms can handle this more complex problem. The martingale approach proposed by Hernandez-Hernandez et al. could be applied in this project. In this research, we will mainly study the theory of the problem. Once the theory is relatively complete, we will try to develop corresponding algorithms and provide convergence theorems for them.
With this theory and these algorithms, we will proceed to the last objective of this project: applying the algorithms to different real-world problems, such as financial trading, autonomous driving and board games. Many applications of RL in finance are yet to be incorporated into the continuous-time relaxed stochastic control framework. It is envisaged that, during this PhD, theoretical advances will be applied to problems in quantitative finance (e.g. optimal execution of portfolio transactions and the risk management of derivative securities), with emphasis on the real-world applicability of the algorithms.
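
As an illustration of the entropy-regularisation idea mentioned in the abstract, the following minimal sketch looks at a one-step toy problem, not the continuous-time setting of the project. It assumes a hypothetical scalar quadratic cost c(u) = a*u**2 + b*u and a regularisation weight called `temperature`; these names and values are illustrative and do not come from the project. For such a cost, the randomised policy that minimises expected cost plus `temperature` times the expected log-density is the Gibbs density proportional to exp(-c(u)/temperature), which here is Gaussian with mean -b/(2a) and variance temperature/(2a). The code checks this numerically on a grid.

```python
import numpy as np


def gibbs_policy(cost, actions, temperature):
    """Density on a uniform action grid that minimises
    E[cost] + temperature * E[log density] (the Gibbs / softmin policy)."""
    du = actions[1] - actions[0]
    logits = -cost / temperature
    logits -= logits.max()  # shift for numerical stability before exponentiating
    weights = np.exp(logits)
    return weights / (weights.sum() * du)  # normalise so the density integrates to 1


# Hypothetical toy parameters (assumptions, not taken from the project).
a, b, temperature = 2.0, -1.0, 0.5
actions = np.linspace(-6.0, 6.0, 4001)
cost = a * actions**2 + b * actions

density = gibbs_policy(cost, actions, temperature)

# Moments of the policy, approximated by Riemann sums on the grid.
du = actions[1] - actions[0]
mean = np.sum(actions * density) * du
variance = np.sum((actions - mean) ** 2 * density) * du

# Closed form for a quadratic cost: mean = -b / (2a), variance = temperature / (2a).
print(mean, -b / (2 * a))
print(variance, temperature / (2 * a))
```

A larger `temperature` widens the policy (more exploration), while letting it tend to zero concentrates the policy on the cost-minimising action -b/(2a).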
