CAREER: Non-asymptotic, Instance-optimal Closed-loop Learning
Basic Information
- Award number: 2141511
- Principal investigator:
- Amount: $506,900
- Host institution:
- Host institution country: United States
- Category: Continuing Grant
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-06-15 to 2027-05-31
- Project status: Ongoing
- Source:
- Keywords:
Project Abstract
Machine Learning and Artificial Intelligence can recognize and exploit hidden patterns in data in order to predict future outcomes in applications ranging from content recommendation to personalized medicine. However, there are many problem areas where collecting the data is time-consuming (e.g., cells need to grow in lab environments) or expensive (e.g., special materials or expert opinions are required). Ideally, in order to reduce the amount of data needed to reach conclusions, already-collected data can be leveraged to guide the selection of future measurements in a closed-loop manner. While the behavior and benefits of some closed-loop data collection strategies are well understood in simple settings, this family of strategies is not commonly employed in real-world scientific laboratories or in medical trials due to a lack of predictability and accuracy of the outcomes. This project seeks to make foundational contributions to the understanding of closed-loop learning strategies with a view towards designing new data-collection strategies that are both effective and reliable. In practice, this may lead to requiring fewer patients in a clinical trial or to halving the time to identify a disease-curing drug. The investigator also plans to engage with high-school students and machine-learning enthusiasts alike to increase their level of awareness around data collection -- for instance, how even a simple survey, if not carefully designed, can result in privacy violations, demographic under-representation, and bias of many forms, all of which may lead to inaccurate conclusions.
For many problems of interest in closed-loop learning, prior art has focused only on minimax optimality, where the sample complexity of the worst-case problem instance is minimized. This approach leads to algorithms that are significantly inferior on "easy" or benign instances that may occur in nature but which are far from adversarial. In contrast, this project will study the fundamental limits of instance-optimal sample complexity for problems of interactive learning and reinforcement learning in the Probably Approximately Correct (PAC) setting. The insights to be gained will be applied towards the design of algorithms that automatically adapt to the intrinsic difficulty of the particular problem instance being faced, be it benign or not. The proposed approach is motivated by the observation that the instance-optimal sample complexity decomposes into an asymptotic term, which is by now well characterized, and a moderate-confidence term, which is known to dominate the asymptotic term for all practical purposes. As the properties of the latter term are still poorly understood, lower bounds for it will be constructed together with algorithms that nearly achieve them. Such results will lead to algorithms that greatly reduce the overall instance-optimal sample complexity and vastly improve upon state-of-the-art algorithms that tend to cater to worst-case scenarios. The efforts will initially focus on structured linear bandits and reinforcement learning in the tabular and linear-function-approximation settings. While these paradigms are of wide applicability in practice, they also have enough complexity to allow insights to be extrapolated to more generic closed-loop learning paradigms.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
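As a rough illustration of the decomposition referenced in the abstract -- a schematic sketch following the standard delta-PAC pure-exploration literature, not a formula quoted from the award text -- the expected number of samples a $\delta$-correct algorithm needs on a problem instance $\theta$ is often written as
\[
  % Assumed schematic form; symbols below are illustrative, not taken from the award.
  \mathbb{E}_{\theta}[\tau_{\delta}]
  \;\approx\;
  \underbrace{c^{*}(\theta)\,\log(1/\delta)}_{\text{asymptotic term}}
  \;+\;
  \underbrace{C(\theta)}_{\text{moderate-confidence term}},
\]
where $\tau_{\delta}$ is the algorithm's stopping time, $c^{*}(\theta)$ is the instance-dependent constant characterized by asymptotic lower bounds (it dominates as $\delta \to 0$), and $C(\theta)$ is a $\delta$-independent overhead that can dominate at the moderate confidence levels used in practice (e.g., $\delta = 0.05$). The project's stated aim is to construct lower bounds for this second term together with algorithms that nearly achieve them.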
Project Outcomes
Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Beyond No Regret: Instance-Dependent PAC Reinforcement Learning
- DOI:
- Publication date: 2021-08
- Journal:
- Impact factor: 0
- Authors: Andrew J. Wagenmaker; Max Simchowitz; Kevin G. Jamieson
- Corresponding authors: Andrew J. Wagenmaker; Max Simchowitz; Kevin G. Jamieson
Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design
- DOI: 10.48550/arxiv.2207.02575
- Publication date: 2022-07
- Journal:
- Impact factor: 0
- Authors: Andrew J. Wagenmaker; Kevin G. Jamieson
- Corresponding authors: Andrew J. Wagenmaker; Kevin G. Jamieson
Instance-optimal PAC Algorithms for Contextual Bandits
- DOI:
- Publication date: 2022
- Journal:
- Impact factor: 0
- Authors: Li, Zhaoqi; Ratliff, Lillian; Nassif, Houssam; Jamieson, Kevin; Jain, Lalit
- Corresponding author: Jain, Lalit
Other Publications by Kevin Jamieson
Fair Active Learning in Low-Data Regimes
- DOI: 10.48550/arxiv.2312.08559
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: Romain Camilleri; Andrew J. Wagenmaker; Jamie Morgenstern; Lalit Jain; Kevin Jamieson
- Corresponding author: Kevin Jamieson
Query-Efficient Algorithms to Find the Unique Nash Equilibrium in a Two-Player Zero-Sum Matrix Game
- DOI: 10.48550/arxiv.2310.16236
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: Arnab Maiti; Ross Boczar; Kevin Jamieson; Lillian J. Ratliff
- Corresponding author: Lillian J. Ratliff
Unbiased Identification of Broadly Appealing Content Using a Pure Exploration Infinitely-Armed Bandit Strategy
- DOI:
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: Maryam Aziz; J. Anderton; Kevin Jamieson; Alice Wang; Hugues Bouchard; J. Aslam
- Corresponding author: J. Aslam
Optimal Exploration is no harder than Thompson Sampling
- DOI:
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: Zhaoqi Li; Kevin Jamieson; Lalit Jain
- Corresponding author: Lalit Jain
Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning
- DOI:
- Publication date: 2024
- Journal:
- Impact factor: 0
- Authors: Yifang Chen; Shuohang Wang; Ziyi Yang; Hiteshi Sharma; Nikos Karampatziakis; Donghan Yu; Kevin Jamieson; Simon Shaolei Du; Yelong Shen
- Corresponding author: Yelong Shen
Other Grants by Kevin Jamieson
CIF: Small: Online Learning and Optimal Experiment Design with a Budget
- Award number: 2007036
- Fiscal year: 2020
- Amount: $506,900
- Category: Standard Grant
Similar NSFC Grants (National Natural Science Foundation of China)
Theory and Methods of Non-Along-Track Scene-Matching Curvilinear Imaging for Spaceborne SAR
- Award number: 62331007
- Award year: 2023
- Amount: CNY 2.37 million
- Category: Key Program
Nonreciprocal Heat Transfer in Linear Thermal Metamaterials
- Award number: 12302171
- Award year: 2023
- Amount: CNY 300,000
- Category: Young Scientists Fund
Function and Mechanism of the Long Non-coding RNA lnRPT in Regulating Myocardial Fibrosis via YB1/eEF1
- Award number: 82370274
- Award year: 2023
- Amount: CNY 490,000
- Category: General Program
Mechanism by Which the Circadian Clock Gene Nr1d1 Suppresses Progression of Non-alcoholic Steatohepatitis by Regulating the NLRP3 Pyroptosis Pathway
- Award number: 82300652
- Award year: 2023
- Amount: CNY 300,000
- Category: Young Scientists Fund
Flow-Boiling Heat Transfer Mechanisms and Model Prediction for Zeotropic Mixtures in Microchannels
- Award number: 52376149
- Award year: 2023
- Amount: CNY 500,000
- Category: General Program
Similar Overseas Grants
CAREER: Non-Asymptotic Random Matrix Theory and Connections
- Award number: 2237646
- Fiscal year: 2023
- Amount: $506,900
- Category: Continuing Grant
Asymptotic analysis and behavior of free boundary for nonlinear parabolic problems
- Award number: 22K03387
- Fiscal year: 2022
- Amount: $506,900
- Category: Grant-in-Aid for Scientific Research (C)
Non-asymptotic inference for high and infinite dimensional data
- Award number: RGPIN-2018-05678
- Fiscal year: 2022
- Amount: $506,900
- Category: Discovery Grants Program - Individual
Asymptotic analysis on PDEs appearing in mean field games, crystal growth and anomalous diffusion
- Award number: 22K03382
- Fiscal year: 2022
- Amount: $506,900
- Category: Grant-in-Aid for Scientific Research (C)
Asymptotic analysis for partial differential equations of nonlinear waves with dissipation and dispersion
- Award number: 22K13939
- Fiscal year: 2022
- Amount: $506,900
- Category: Grant-in-Aid for Early-Career Scientists