Curse-of-dimensionality-free nonlinear optimal feedback control with deep neural networks. A compositionality-based approach via Hamilton-Jacobi-Bellman PDEs

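Basic information

Grant number: 463912816
Principal investigator: Professor Dr. Lars Grüne
Funding amount: --
Host institution: Lehrstuhl für Angewandte Mathematik
Host institution country: Germany
Project type: Priority Programmes
Fiscal year:
Funding country: Germany
Project period:
Project status: Ongoing

Source: https://gepris.dfg.de/gepris/projekt/463912816?language=en
Keywords: Curse dimensionality free nonlinear optimal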

Project abstract

Optimal feedback control is one of the areas in which methods from deep learning have an enormous impact. Deep Reinforcement Learning, one of the methods for obtaining optimal feedback laws and arguably one of the most successful algorithms in artificial intelligence, stands behind the spectacular performance of artificial intelligence in games such as Chess or Go, but also has manifold applications in science, technology and the economy. Mathematically, the core question behind this method is how best to represent optimal value functions, i.e., the functions that assign the optimal performance value to each state (also known as cost-to-go functions in reinforcement learning), via deep neural networks (DNNs). The optimal feedback law can then be computed from these functions. In continuous time, these optimal value functions are characterised by Hamilton-Jacobi-Bellman partial differential equations (HJB PDEs), which links the question to the solution of PDEs via DNNs. As the dimension of an HJB PDE is determined by the dimension of the state of the dynamics governing the optimal control problem, HJB equations naturally form a class of high-dimensional PDEs. They are thus prone to the well-known curse of dimensionality, i.e., to the fact that the numerical effort for their solution grows exponentially in the dimension. It is known that functions with certain beneficial structures, like compositional or separable functions, can be approximated by DNNs with suitable architecture while avoiding the curse of dimensionality. For HJB PDEs characterising Lyapunov functions, it was recently shown by the proposer of this project that small-gain conditions, i.e., particular conditions on the dynamics of the problem, establish the existence of separable subsolutions, which can be exploited for efficiently approximating them by DNNs via training algorithms with suitable loss functions.
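
To make the objects in this paragraph concrete, the following block writes out one standard textbook form of the HJB equation together with a separable structure and the resulting node counts. It is an illustrative sketch, not the formulation used in the project: the infinite-horizon discounted setting and the symbols f, \ell, \delta, U, z_i, d_i, s and N are assumptions introduced here.

```latex
% Assumed setting (not stated in the abstract): infinite-horizon discounted
% optimal control with state x in R^d, control values in U, discount rate
% \delta > 0, dynamics f and running cost \ell.
\begin{align*}
  \dot x(t) &= f(x(t),u(t)), \qquad x(0) = x_0 \in \mathbb{R}^d, \\
  V(x_0) &= \inf_{u(\cdot)} \int_0^\infty e^{-\delta t}\,
            \ell(x(t),u(t))\,dt
            && \text{(optimal value function)} \\
  \delta V(x) &= \inf_{u \in U} \bigl\{ \ell(x,u) + DV(x)\,f(x,u) \bigr\}
            && \text{(HJB PDE, posed on } \mathbb{R}^d\text{)} \\
  u^*(x) &\in \operatorname*{arg\,min}_{u \in U}
            \bigl\{ \ell(x,u) + DV(x)\,f(x,u) \bigr\}
            && \text{(feedback law obtained from } V\text{)}
\end{align*}

% Separable structure: V splits into s terms, each depending only on a
% low-dimensional subvector z_i of x with dimension d_i.
\[
  V(x) = \sum_{i=1}^{s} V_i(z_i), \qquad z_i \in \mathbb{R}^{d_i}.
\]

% Node counts for a tensor grid with N points per coordinate axis:
\[
  \underbrace{N^{d}}_{\text{full grid}}
  \quad \text{versus} \quad
  \underbrace{\sum_{i=1}^{s} N^{d_i}}_{\text{one grid per } V_i},
  \qquad \text{e.g. } d = 10,\ N = 10,\ d_i = 2,\ s = 5:\quad
  10^{10} \ \text{versus}\ 5 \cdot 10^{2} = 500 .
\]
```

The grid counts are only a rough proxy for numerical effort, but they show why a separable structure with bounded d_i turns exponential growth in d into growth that is linear in the number of terms s.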
These results pave the way for curse-of-dimensionality-free DNN-based approaches for general nonlinear HJB equations, which are the goal of this project. Besides small-gain theory, there exists a large toolbox of nonlinear feedback control design techniques that lead to compositional (sub)optimal value functions. These methods are mathematically sound and apply to many real-world problems, but they come with significant computational challenges when the resulting value functions or feedback laws are to be computed. In this project, we will exploit the structural insight provided by these methods for establishing the existence of compositional optimal value functions or approximations thereof, but circumvent their computational complexity by using appropriate training algorithms for DNNs instead. Proceeding this way, we will characterise optimal feedback control problems for which curse-of-dimensionality-free (approximate) solutions via DNNs are possible and provide efficient network architectures and training schemes for computing these solutions.
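
As a rough illustration of what a network architecture matched to a separable structure could look like, the sketch below builds one small subnetwork per low-dimensional block of state coordinates and sums their outputs. Everything in it is hypothetical and not taken from the project: the function names, the partition into consecutive blocks, the single hidden layer and the tanh activation are assumptions chosen for brevity, and no training scheme is shown.

```python
import math
import random


def make_subnet(in_dim, width, rng):
    """Random weights for a one-hidden-layer network from R^in_dim to R."""
    return {
        "W1": [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(width)],
        "b1": [0.0] * width,
        "w2": [rng.gauss(0.0, 1.0) for _ in range(width)],
    }


def subnet_forward(net, z):
    """Evaluate one subnetwork: w2 . tanh(W1 z + b1)."""
    hidden = [
        math.tanh(sum(w * zi for w, zi in zip(row, z)) + b)
        for row, b in zip(net["W1"], net["b1"])
    ]
    return sum(w * h for w, h in zip(net["w2"], hidden))


def separable_value(nets, blocks, x):
    """V(x) = sum_i V_i(z_i), where z_i collects the coordinates in blocks[i]."""
    return sum(
        subnet_forward(net, [x[j] for j in block])
        for net, block in zip(nets, blocks)
    )


def count_params(nets):
    """Total number of trainable weights and biases over all subnetworks."""
    return sum(
        len(net["W1"]) * len(net["W1"][0]) + len(net["b1"]) + len(net["w2"])
        for net in nets
    )


def build(state_dim, block_dim, width, seed=0):
    """Assumed partition: consecutive coordinate blocks of size block_dim."""
    rng = random.Random(seed)
    blocks = [
        list(range(i, min(i + block_dim, state_dim)))
        for i in range(0, state_dim, block_dim)
    ]
    nets = [make_subnet(len(block), width, rng) for block in blocks]
    return nets, blocks


if __name__ == "__main__":
    for d in (10, 20, 40):
        nets, blocks = build(state_dim=d, block_dim=2, width=8)
        print(d, len(blocks), count_params(nets))
```

With blocks of size 2 and width 8, each subnetwork has 32 parameters, so the total is 160 for a 10-dimensional state and 320 for a 20-dimensional one: the count grows linearly with the state dimension. In practice the right partition of the state is not known in advance, and identifying when such a structure exists is precisely the kind of question the abstract raises.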