CAREER: Learning with Limited Feedback - Beyond Worst-case Optimality

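Basic Information

Award number:
1943607
Principal investigator:
Haipeng Luo
Amount:
approx. $499,900
Institution:
University of Southern California
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2020
Funding country:
United States
Start and end dates:
2020-03-01 to 2025-02-28
Project status:
Not yet concluded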

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1943607&HistoricalAwards=false
Keywords:
CAREER Learning Limited Feedback Beyond

Project Abstract

Machine learning has become an integral part of many technologies deployed in our daily lives. Traditional machine learning methods work by first collecting data and then training a fixed model for future predictions. However, much more challenging scenarios emerge as machine learning is deployed in more sophisticated applications, especially those that interact with humans or other agents, such as recommender systems, game-playing agents, and self-driving cars. One main challenge in these applications is that the learning agent often receives only limited feedback from its environment, so learning effectively from such limited feedback is critical. Most existing approaches are conservative and assume worst-case environments. This project focuses on understanding how to exploit specific structures exhibited in particular problem instances, with the goal of developing more adaptive and efficient learning algorithms with strong theoretical guarantees. The success of this project requires developing new algorithmic techniques and mathematical tools across a variety of disciplines. Education is integrated into this project through curriculum development, student mentoring, workshop organization, and a partnership with the Montebello Unified School District to support the goal of building Computer Science pathways. The project consists of three main directions: partial monitoring, bandit optimization, and reinforcement learning. Each direction generalizes the classic multi-armed bandit problem along a different dimension: partial monitoring generalizes the feedback model; bandit optimization generalizes the decision space and objective functions; and reinforcement learning generalizes from stateless to stateful models.
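All three directions start from the classic multi-armed bandit problem, where the learner sees only the loss of the arm it pulls. As a minimal sketch of that limited-feedback setting (a standard baseline, not an algorithm from this project), the Python below implements a loss-based variant of EXP3: exponential weights over importance-weighted loss estimates. The function name `exp3`, the fixed loss values, and the learning rate are illustrative assumptions.

```python
import math
import random


def exp3(loss_fn, n_arms, horizon, eta, seed=0):
    """Loss-based EXP3: exponential weights over importance-weighted loss estimates.

    loss_fn(t, arm) returns a loss in [0, 1]. Only the pulled arm's loss is
    observed each round, which is the bandit ("limited feedback") setting.
    Returns the sequence of pulled arms and the total loss incurred.
    """
    rng = random.Random(seed)
    est = [0.0] * n_arms  # cumulative importance-weighted loss estimate per arm
    pulls, total_loss = [], 0.0
    for t in range(horizon):
        # p_i is proportional to exp(-eta * est_i); shift by the minimum for stability.
        m = min(est)
        weights = [math.exp(-eta * (e - m)) for e in est]
        z = sum(weights)
        probs = [w / z for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)  # the only feedback the learner receives
        total_loss += loss
        # Unbiased estimate: loss / p for the pulled arm, zero for all others.
        est[arm] += loss / probs[arm]
        pulls.append(arm)
    return pulls, total_loss


# Hypothetical environment: fixed losses, arm 0 is the best arm.
K, T = 3, 5000
losses = [0.1, 0.9, 0.9]
eta = math.sqrt(2 * math.log(K) / (T * K))  # standard tuning for this variant
pulls, total = exp3(lambda t, a: losses[a], K, T, eta)
print(f"best arm pulled {pulls.count(0)} / {T} times, total loss {total:.1f}")
```

In the standard analysis, this tuning keeps the expected regret at most √(2TK ln K) against any loss sequence. The key design choice is dividing the observed loss by its sampling probability, which gives an unbiased estimate even though the losses of unpulled arms are never seen.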
Each direction contains several main objectives: (1) for partial monitoring, the focus is on understanding how to adapt to data, environments, and models; (2) for bandit optimization, the focus is on developing adaptive algorithms for learning with linear, convex, and non-convex functions, respectively; (3) for reinforcement learning, the focus is on investigating under what conditions learning becomes easier, and how to learn in non-stationary or even adversarial environments. In addition to theoretical developments, the project also aims to implement all of the developed algorithms as open-source software and to evaluate them on benchmark datasets. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
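The contrast between worst-case and adaptive guarantees is usually made precise through regret. The following is a sketch in standard textbook notation, not taken from the award text: K arms, T ≥ K rounds, adversarial losses ℓ_t(a) in [0, 1], and a_t the arm chosen at round t. The second line is the best regret any algorithm can guarantee against arbitrary loss sequences; the third is a first-order ("small-loss") guarantee, stated up to logarithmic factors.

```latex
\begin{aligned}
\mathrm{Reg}_T &= \mathbb{E}\Big[\sum_{t=1}^{T} \ell_t(a_t)\Big] - \min_{a \in \{1,\dots,K\}} \sum_{t=1}^{T} \ell_t(a), \\
\text{worst case (minimax):}\quad \mathrm{Reg}_T &= \Theta\big(\sqrt{KT}\big), \\
\text{small-loss (data-dependent):}\quad \mathrm{Reg}_T &= \tilde{O}\big(\sqrt{K L^\star} + K\big),
\quad L^\star = \min_{a} \sum_{t=1}^{T} \ell_t(a) \le T.
\end{aligned}
```

Because L* ≤ T, the small-loss rate is, up to logarithmic and lower-order terms, never worse than the worst-case rate, and it is much smaller when some arm incurs little total loss. This is one concrete sense of adapting to data.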

Project Outputs

Journal articles (8)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Taking a Hint: How to Leverage Loss Predictors in Contextual Bandits?

DOI:
Published:
2020
Venue:
Conference on Learning Theory
Impact factor:
0
Authors:
Wei, Chen-Yu; Luo, Haipeng; Agarwal, Alekh
Corresponding author:
Agarwal, Alekh

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

DOI:
Published:
2019-10
Venue:
Impact factor:
0
Authors:
Chen-Yu Wei; Mehdi Jafarnia-Jahromi; Haipeng Luo; Hiteshi Sharma; R. Jain
Corresponding author:
Chen-Yu Wei; Mehdi Jafarnia-Jahromi; Haipeng Luo; Hiteshi Sharma; R. Jain

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

DOI:
Published:
2020-06
Venue:
ArXiv
Impact factor:
0
Authors:
Chung-Wei Lee; Haipeng Luo; Chen-Yu Wei; Mengxiao Zhang
Corresponding author:
Chung-Wei Lee; Haipeng Luo; Chen-Yu Wei; Mengxiao Zhang

A Closer Look at Small-loss Bounds for Bandits with Graph Feedback

DOI:
Published:
2020-02
Venue:
ArXiv
Impact factor:
0
Authors:
Chung-Wei Lee; Haipeng Luo; Mengxiao Zhang
Corresponding author:
Chung-Wei Lee; Haipeng Luo; Mengxiao Zhang

Can machine learning cope with the erratic and uncertain nature of the real world?

DOI:
10.33424/futurum274
Published:
2022
Venue:
Futurum Careers
Impact factor:
0
Authors:
Luo, Haipeng
Corresponding author:
Luo, Haipeng

Other publications by Haipeng Luo

Towards Minimax Online Learning with Unknown Time Horizon

DOI:
Published:
2013
Venue:
Impact factor:
0
Authors:
Haipeng Luo; R. Schapire
Corresponding author:
R. Schapire

Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds

DOI:
Published:
2020
Venue:
ArXiv
Impact factor:
0
Authors:
E. Emamjomeh; Chen; Haipeng Luo; D. Kempe
Corresponding author:
D. Kempe

Clairvoyant Regret Minimization: Equivalence with Nemirovski's Conceptual Prox Method and Extension to General Convex Games

DOI:
10.48550/arxiv.2208.14891
Published:
2022
Venue:
ArXiv
Impact factor:
0
Authors:
Gabriele Farina; Christian Kroer; Chung; Haipeng Luo
Corresponding author:
Haipeng Luo

Efficient electro-optical tuning of an optical frequency microcomb on a monolithically integrated high-Q lithium niobate microdisk

DOI:
10.1364/OL.44.005953
Published:
2019
Venue:
Optics Letters
Impact factor:
Authors:
Zhiwei Fang; Haipeng Luo; Jintian Lin; Min Wang; Jianhao Zhang; Rongbo Wu; Junxia Zhou; Wei Chu; Tao Lu; Ya Cheng
Corresponding author:
Ya Cheng