CAREER: Foundations of Reinforcement Learning under Partial Observability

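Basic Information

Award number:
2239297
Principal investigator:
Chi Jin
Amount:
$500,000
Host institution:
Princeton University
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2023
Funding country:
United States
Project period:
2023-08-01 to 2028-07-31
Project status:
Ongoing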

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2239297&HistoricalAwards=false
Keywords:
CAREER Foundations Reinforcement Learning under

Project Abstract

A wide range of modern artificial intelligence challenges can be cast as Reinforcement Learning (RL) problems under partial observability, in which agents learn to make a sequence of decisions despite lacking complete information about the moment-to-moment situation in which decisions are made. Natural applications of this kind of Partially Observable RL (PORL) include robotics, autonomous driving, imperfect information games, resource allocation under partial information, planetary exploration, and medical diagnostic systems. As such, PORL has been an important topic in operations research, control, and machine learning. While the community recently witnessed a surge of breakthroughs in reinforcement learning theory in fully observable environments, our understanding of learning to act in partially observable systems remains very limited. Partial observability brings a series of unique challenges to RL in modeling, algorithm design, and theoretical analysis. Resolving these challenges will have far-reaching impacts in academia, industry, and society, wherever modern RL can be applied.

This project aims to identify and attack these unique challenges, establish solid theoretical foundations, and design new reliable and efficient algorithms for PORL. Concretely, this proposal will study PORL in three progressive thrusts. Thrust 1 considers the basic tabular setup, under the model of Partially Observable Markov Decision Processes (POMDPs). The main objective in this thrust is to identify the key structural conditions that permit statistically or computationally efficient learning, and to address the core challenges of inferring latent states and exploration. Thrust 2 concerns modern PORL with an enormous number of states and observations, where function approximation must be deployed to approximate the models, the value functions, or the policies. We will investigate these problems under the more general model of Predictive State Representations (PSRs) and develop efficient learning results in the presence of function approximation. Thrust 3 investigates PORL in the multiagent setting, under the model of Partially Observable Markov Games (POMGs). We will design efficient algorithms for learning various equilibria in POMGs and address the unique challenges arising from multiagency and the design of decentralized algorithms.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
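
As a rough illustration of what "inferring latent states" means in the tabular POMDP setting of Thrust 1, the sketch below implements the textbook Bayes belief update. It is not taken from the award abstract: the function name `belief_update`, the arrays `T` and `O`, and all numbers are hypothetical, and the model is assumed to be known, whereas the project concerns learning when it is not.

```python
import numpy as np

def belief_update(belief, T, O, action, obs):
    """One Bayes-filter step in a tabular POMDP (illustrative sketch).

    belief: shape (S,), current distribution over latent states
    T:      shape (A, S, S), T[a, s, s2] = P(s2 | s, a)
    O:      shape (S, Z),    O[s2, z]   = P(z | s2), assumed action-independent
    """
    predicted = belief @ T[action]        # predict the next-state distribution
    unnormalized = predicted * O[:, obs]  # weight by the observation likelihood
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total           # renormalize to a distribution

# Hypothetical toy model: 2 latent states, 1 action, 2 observations.
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, T, O, action=0, obs=1)
```

In this toy model, observation 1 is more likely in state 1, so the updated belief `b1` places more mass on state 1 than the uniform prior `b0` did.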

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Chi Jin

Learning Markov Games with Adversarial Opponents: Efficient Algorithms and Fundamental Limits

DOI:
10.48550/arxiv.2203.06803
Published:
2022
Journal:
Proceedings of the forty-seventh annual ACM symposium on Theory of Computing
Impact factor:
0
Authors:
Qinghua Liu; Yuanhao Wang; Chi Jin
Corresponding author:
Chi Jin

The stability control for isolated wind‐diesel power system based on the cross coupling effect model

DOI:
10.1049/gtd2.12089
Published:
2020-12
Journal:
IET Generation, Transmission & Distribution
Impact factor:
2.5
Authors:
Yang Mi; Lang Zhongjie; Chen Xin; Yang Fu; Chi Jin; Shi Shuai; Zhao Yao; Enyu Jiang
Corresponding author:
Enyu Jiang