
Interactive reinforcement learning for adaptive experimental design

Basic information

Grant number:
RGPIN-2020-06933
Principal investigator:
Durand, Audrey
Amount:
$17,500
Host institution:
Université Laval
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2022
Funding country:
Canada
Period:
2022-01-01 to 2023-12-31
Project status:
Completed

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=757338
Keywords:
Interactive reinforcement learning adaptive experimental

Project abstract

Unlike algorithms in supervised and unsupervised learning, which study static collections of existing data, algorithms in reinforcement learning (RL) choose actions to transform their environments and use the effects of those actions to improve their choices incrementally over time. The simplest instance of an RL setting is the bandit problem, in which an agent faces a single choice between a fixed set of actions and receives feedback from its environment within one time-step. Initially motivated by adaptive experimental designs, where one aims to compare several actions by conducting experiments in a maximally informative and efficient way, bandit algorithms have been studied intensively throughout the last decade. Beyond theoretical findings, these algorithms have also been successfully implemented in adaptive experiments, for example for optimizing cancer treatments in mice trials, for tuning microscopy imaging systems, for adjusting hyperparameters, and for designing user interfaces. However, many real-world applications are characterized by dynamics that are not captured by existing RL settings. Even worse, these dynamics sometimes contradict assumptions required by current algorithms. These cases result in algorithms with voided theoretical guarantees, at risk of producing undesirable, potentially dangerous, behaviours. Therefore, leveraging the power of interactive RL algorithms in practice requires approaches intended for these specific conditions. The proposed research program aims to bring RL to the real world of adaptive experimental design. We aim at developing algorithms that perform as expected and with theoretical guarantees that hold under these realistic environments. Through collaborations with researchers in other fields, we will deploy those strategies and measure their impact on real applications. 
The research program is articulated around the following objectives: 1) Propose flexible interactive RL settings that encompass state-of-the-art adaptive experimental design frameworks; 2) Investigate realistic system dynamics and introduce theoretically grounded algorithms for learning under these constraints; 3) Characterize the arising of undesirable behaviours in learning algorithms and develop strategies to guard against it; 4) Deploy algorithms in real-world applications to showcase their potential and impact the application domain. The proposed research direction has a high-impact potential as it will result in strategies developed for real application cases. Even though the program is focused on the bandit subfield, resulting knowledge and algorithms will constitute a basis for understanding and designing sequential RL algorithms, which still essentially remain confined to simulation environments. Finally, the applications will constitute proofs of concept and will result in deployment guidelines for safe and impactful RL, supporting further research that will bring RL closer to the field.
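
As a concrete illustration of the bandit setting described in the abstract, the following is a minimal sketch of UCB1, a standard bandit algorithm, run on simulated Bernoulli arms. It is not taken from the funded project: the function name `ucb1`, the arm success probabilities, the horizon, and the seed are all assumptions chosen for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms; return pull counts and total reward.

    arm_means holds hypothetical success probabilities, unknown to the agent.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of times each arm was pulled
    sums = [0.0] * n_arms   # cumulative reward collected per arm
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull each arm once so every empirical mean is defined.
            arm = t - 1
        else:
            # Choose the arm with the highest optimistic index:
            # empirical mean plus a bonus that shrinks as the arm is pulled.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # Feedback from the (simulated) environment arrives in the same step.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward

    return counts, total_reward


if __name__ == "__main__":
    counts, total = ucb1([0.2, 0.5, 0.8], horizon=2000)
    print("pulls per arm:", counts, "total reward:", total)
```

In an adaptive experiment, each arm would correspond to one candidate condition (for example a treatment or an imaging setting) and the reward to the measured outcome. The sketch assumes stationary, independent rewards, which is exactly the kind of assumption the abstract notes may fail in real applications.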

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Durand, Audrey

An economic evaluation: Simulation of the cost-effectiveness and cost-utility of universal prevention strategies against osteoporosis-related fractures.

DOI:
10.1002/jbmr.1758
Published:
2013-02
Journal:
JOURNAL OF BONE AND MINERAL RESEARCH
Impact factor:
6.2
Authors:
Nshimyumukiza, Leon;Durand, Audrey;Gagnon, Mathieu;Douville, Xavier;Morin, Suzanne;Lindsay, Carmen;Duplantie, Julie;Gagne, Christian;Jean, Sonia;Giguere, Yves;Dodin, Sylvie;Rousseau, Francois;Reinharz, Daniel
Corresponding author:
Reinharz, Daniel

Pre-trial cocaine biases choice toward cocaine through suppression of the nondrug option

DOI:
10.1016/j.pbb.2018.07.010
Published:
2018-10-01
Journal:
PHARMACOLOGY BIOCHEMISTRY AND BEHAVIOR
Impact factor:
3.6
Authors:
Freese, Luana;Durand, Audrey;Ahmed, Serge H.
Corresponding author:
Ahmed, Serge H.

Informing the development of an outcome set and banks of items to measure mobility among individuals with acquired brain injury using natural language processing.

DOI:
10.1186/s12883-022-02938-1
Published:
2022-12-09
Journal:
BMC NEUROLOGY
Impact factor:
2.6
Authors:
Alhasani, Rehab;Godbout, Mathieu;Durand, Audrey;Auger, Claudine;Lamontagne, Anouk;Ahmed, Sara
Corresponding author:
Ahmed, Sara

Cost-effectiveness and accuracy of prenatal Down syndrome screening strategies: should the combined test continue to be widely used?

DOI:
10.1016/j.ajog.2010.09.017
Published:
2011-02-01
Journal:
AMERICAN JOURNAL OF OBSTETRICS AND GYNECOLOGY
Impact factor:
9.8
Authors:
Gekas, Jean;Durand, Audrey;Reinharz, Daniel
Corresponding author:
Reinharz, Daniel

The Influence of Age, Sex, and Socioeconomic Status on Glycemic Control Among People With Type 1 and Type 2 Diabetes in Canada: Patient-Led Longitudinal Retrospective Cross-sectional Study With Multiple Time Points of Measurement.

DOI:
10.2196/35682
Published:
2023-04-27
Journal:
JMIR diabetes
Impact factor:
0
Authors:
Mousavi, Seyedmostafa;Tannenbaum Greenberg, Dana;Ndjaboue, Ruth;Greiver, Michelle;Drescher, Olivia;Chipenda Dansokho, Selma;Boutin, Denis;Chouinard, Jean-Marc;Dostie, Sylvie;Fenton, Robert;Greenberg, Marley;McGavock, Jonathan;Najam, Adhiyat;Rekik, Monia;Weisz, Tom;Willison, Donald J;Durand, Audrey;Witteman, Holly O
Corresponding author:
Witteman, Holly O