Curse-of-dimensionality-free nonlinear optimal feedback control with deep neural networks. A compositionality-based approach via Hamilton-Jacobi-Bellman PDEs
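Basic information
- Grant number: 463912816
- Principal investigator:
- Funding amount: --
- Host institution:
- Host institution country: Germany
- Project category: Priority Programmes
- Fiscal year:
- Funding country: Germany
- Project period:
- Project status: Ongoing
- Source:
- Keywords:
Project abstract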
Optimal feedback control is one of the areas in which methods from deep learning have an enormous impact. Deep Reinforcement Learning, one of the methods for obtaining optimal feedback laws and arguably one of the most successful algorithms in artificial intelligence, stands behind the spectacular performance of artificial intelligence in games such as Chess or Go, but also has manifold applications in science, technology and economy. Mathematically, the core question behind this method is how to best represent optimal value functions, i.e., the functions that assign the optimal performance value to each state, also known as cost-to-go functions in reinforcement learning, via deep neural networks (DNNs). The optimal feedback law can then be computed from these functions. In continuous time, these optimal value functions are characterised by Hamilton-Jacobi-Bellman partial differential equations (HJB PDEs), which links the question to the solution of PDEs via DNNs. As the dimension of the HJB PDE is determined by the dimension of the state of the dynamics governing the optimal control problem, HJB equations naturally form a class of high-dimensional PDEs. They are thus prone to the well-known curse of dimensionality, i.e., to the fact that the numerical effort for their solution grows exponentially in the dimension. It is known that functions with certain beneficial structures, like compositional or separable functions, can be approximated by DNNs with suitable architecture avoiding the curse of dimensionality. For HJB PDEs characterising Lyapunov functions it was recently shown by the proposer of this project that small-gain conditions, i.e., particular conditions on the dynamics of the problem, establish the existence of separable subsolutions, which can be exploited for efficiently approximating them by DNNs via training algorithms with suitable loss functions.
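For orientation, the objects named above can be written out in one common textbook setting. This is only a sketch under assumptions not stated in the abstract: an undiscounted infinite-horizon problem with dynamics $f$, running cost $\ell$, control set $U$, and a value function $V$ smooth enough for the gradient $DV$ to exist. The symbols $f$, $\ell$, $U$, $N$, $z_j$ and $d_j$ are introduced here for illustration.

```latex
\begin{align*}
% Optimal value function (cost-to-go) for \dot x = f(x,u), x \in \mathbb{R}^n
V(x_0) &= \inf_{u(\cdot)} \int_0^\infty \ell\bigl(x(t),u(t)\bigr)\,dt,
  \qquad \dot x(t) = f\bigl(x(t),u(t)\bigr),\quad x(0) = x_0, \\
% HJB PDE characterising V
0 &= \inf_{u \in U} \bigl\{ DV(x)\,f(x,u) + \ell(x,u) \bigr\}, \\
% Optimal feedback law computed from V
u^*(x) &\in \operatorname*{arg\,min}_{u \in U} \bigl\{ DV(x)\,f(x,u) + \ell(x,u) \bigr\}, \\
% Separable structure: each z_j collects d_j \ll n components of x
V(x) &\approx \sum_{j=1}^{s} V_j(z_j), \qquad z_j \in \mathbb{R}^{d_j}.
\end{align*}
```

A tensor-product grid with $N$ nodes per coordinate has $N^n$ nodes in total, which is the exponential growth referred to as the curse of dimensionality. In the separable form, each summand depends only on a low-dimensional block $z_j$.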
These results pave the way for curse-of-dimensionality-free DNN-based approaches for general nonlinear HJB equations, which are the goal of this project. Besides small-gain theory, there exists a large toolbox of nonlinear feedback control design techniques that lead to compositional (sub)optimal value functions. On the one hand, these methods are mathematically sound and apply to many real-world problems, but on the other hand they come with significant computational challenges when the resulting value functions or feedback laws are to be computed. In this project, we will exploit the structural insight provided by these methods for establishing the existence of compositional optimal value functions or approximations thereof, but circumvent their computational complexity by using appropriate training algorithms for DNNs instead. Proceeding this way, we will characterise optimal feedback control problems for which curse-of-dimensionality-free (approximate) solutions via DNNs are possible and provide efficient network architectures and training schemes for computing these solutions.
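The role of separable structure can be illustrated with a small sketch. It is not the project's architecture: the partition of the state into equal blocks, the layer width, the softplus activation and all function names (`init_block`, `block_forward`, `separable_value`, `n_params`) are assumptions made for this example, and the weights are random rather than trained. It only shows that a network built as a sum of small sub-networks, each seeing one low-dimensional block of the state, has a parameter count that grows linearly with the state dimension.

```python
import numpy as np

def init_block(rng, block_dim, hidden):
    """Random weights for one small sub-network R^block_dim -> R (hypothetical layout)."""
    return {
        "W1": rng.standard_normal((hidden, block_dim)) / np.sqrt(block_dim),
        "b1": np.zeros(hidden),
        "w2": rng.standard_normal(hidden) / np.sqrt(hidden),
    }

def block_forward(p, z):
    """One hidden layer with softplus activation and a scalar output."""
    h = np.logaddexp(0.0, p["W1"] @ z + p["b1"])  # softplus(a) = log(1 + exp(a))
    return float(p["w2"] @ h)

def separable_value(params, x, block_dim):
    """W(x) = sum_j W_j(z_j), where z_j is the j-th block of x."""
    blocks = x.reshape(-1, block_dim)
    return sum(block_forward(p, z) for p, z in zip(params, blocks))

def n_params(params):
    """Total number of scalar weights over all sub-networks."""
    return sum(a.size for p in params for a in p.values())

rng = np.random.default_rng(0)
block_dim, hidden = 2, 16  # assumed sizes, chosen only for illustration
for n_blocks in (5, 50):
    params = [init_block(rng, block_dim, hidden) for _ in range(n_blocks)]
    x = rng.standard_normal(n_blocks * block_dim)
    value = separable_value(params, x, block_dim)
    print(f"state dim {n_blocks * block_dim}: {n_params(params)} parameters, W(x) = {value:.4f}")
```

With these sizes each sub-network has 64 weights, so the count is 320 for state dimension 10 and 3200 for state dimension 100. Training such a network against a loss built from the PDE is a separate step that this sketch leaves out.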
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants of Professor Dr. Lars Grüne
Specialized Adaptive Algorithms for Model Predictive Control of PDEs
- Grant number: 337928467
- Fiscal year: 2017
- Funding amount: --
- Project category: Research Grants
Model predictive PDE control for energy efficient building operation: Economic model predictive control and time varying systems
- Grant number: 274853298
- Fiscal year: 2015
- Funding amount: --
- Project category: Research Grants
Model Predictive Control for the Fokker-Planck Equation
- Grant number: 264433583
- Fiscal year: 2014
- Funding amount: --
- Project category: Research Grants
Performance Analysis for Distributed and Multiobjective Model Predictive Control — The role of Pareto fronts, multiobjective dissipativity and multiple equilibria
- Grant number: 244602989
- Fiscal year: 2013
- Funding amount: --
- Project category: Research Grants
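Analyse und Entwurf ereignisbasierter Regelungen mit quantisierten Signalräumen -Vernetzte Systeme-
(Analysis and design of event-based control with quantized signal spaces - networked systems)
- Grant number: 42799909
- Fiscal year: 2007
- Funding amount: --
- Project category: Priority Programmes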
Analysis of Random Transport in Chains using Modern Tools from Systems and Control Theory
- Grant number: 470999742
- Fiscal year:
- Funding amount: --
- Project category: Research Grants
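Similar NSFC (National Natural Science Foundation of China) grants
Research on clustering of high-dimensional sparse data
- Grant number: 70771007
- Approval year: 2007
- Funding amount: CNY 160,000
- Project category: General Program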
Similar overseas grants
Rapid Free-Breathing 3D High-Resolution MRI for Volumetric Liver Iron Quantification
- Grant number: 10742197
- Fiscal year: 2023
- Funding amount: --
- Project category:
Decoding global RNP topologies in splicing regulation
- Grant number: 10636541
- Fiscal year: 2023
- Funding amount: --
- Project category:
Elucidating the mechanisms and consequences of MDSC-regulated immunity in TB
- Grant number: 10548970
- Fiscal year: 2022
- Funding amount: --
- Project category:
Penalized mixture cure models for identifying genomic features associated with outcome in acute myeloid leukemia
- Grant number: 10340087
- Fiscal year: 2022
- Funding amount: --
- Project category:
Functional Dissection of Regulatory Myeloid Cells in Microbe-Immune Crosstalk in Skin
- Grant number: 10605160
- Fiscal year: 2022
- Funding amount: --
- Project category:
Multi-omic Predictive Markers for Ovarian Cancer Therapy Response and Outcomes
- Grant number: 10351697
- Fiscal year: 2022
- Funding amount: --
- Project category:
Multi-omic Predictive Markers for Ovarian Cancer Therapy Response and Outcomes
- Grant number: 10678828
- Fiscal year: 2022
- Funding amount: --
- Project category:
Tissue-based biomarkers of anti-PD-1-based therapy in metastatic renal cell carcinoma
- Grant number: 10645216
- Fiscal year: 2022
- Funding amount: --
- Project category:
Elucidating the mechanisms and consequences of MDSC-regulated immunity in TB
- Grant number: 10670440
- Fiscal year: 2022
- Funding amount: --
- Project category:
Penalized mixture cure models for identifying genomic features associated with outcome in acute myeloid leukemia
- Grant number: 10544523
- Fiscal year: 2022
- Funding amount: --
- Project category: