Efficient Algorithms for Sequential Decision-making

Basic Information

Grant number:
RGPIN-2022-04816
Principal investigator:
Vaswani, Sharan
Amount:
$21,100
Host institution:
Simon Fraser University
Host institution country:
Canada
Program type:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Start and end dates:
2022-01-01 to 2023-12-31
Project status:
Completed

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=750581
Keywords:
Efficient Algorithms Sequential Decision making

Project Abstract

Machine learning allows computers to automatically detect patterns in data and leverage them to make predictions or decisions in the real world. In the last decade, we have witnessed an increasing number of technological and scientific fields gather large amounts of data and use machine learning techniques for making data-driven decisions. Reinforcement learning (RL) is a subfield of machine learning that focuses on problems that involve making repeated, sequential decisions to interact with the world, collect data and reason about it, all with incomplete information about the world. Applications of such problems include clinical trials in medicine, monitoring industrial plants, robotics, and computational marketing and advertising. Though RL has found great practical success recently, typical practical algorithms are often (i) brittle, meaning that their performance is sensitive to hyper-parameters and minor design decisions, (ii) inefficient, in that they require a large number of interactions to learn to make good decisions, and (iii) lacking in theoretical guarantees on their performance, and they can fail on simple, synthetic problems. To alleviate these problems, we propose to develop statistically efficient, computationally tractable algorithms that can easily scale to large sequential decision-making problems. Throughout, our aim will be to develop algorithms that (i) are either parameter-free or robust to hyper-parameter tuning, (ii) can effectively exploit the underlying problem structure and be sample-efficient, and (iii) have tight bounds on their worst-case statistical and computational performance for representative problem classes. The research program will especially focus on RL problems that need to incorporate constraints while making decisions, trade off multiple conflicting objectives, and reason in the presence of other cooperating or competing agents.
The proposed research will help bridge the gap between the theory and practice of RL, and also contribute to the adjacent fields of machine learning and numerical optimization. Via collaborations with experts in industry and academia, we aim to use the developed techniques in healthcare, recommendation systems and computational advertising. By making progress towards the research program's objectives, we hope to expand the scope of data-driven decision-making in the real world. Furthermore, we will provide key training to graduate students, equipping them with the strong mathematical foundations and programming skills necessary to solve important problems. We expect that the research program will augment Canada's existing strengths in machine learning, and help develop highly skilled professionals valuable to Canadian institutions and companies.
