
Interactive reinforcement learning for adaptive experimental design

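Basic information

Grant number:
RGPIN-2020-06933
Principal investigator:
Durand, Audrey
Amount:
$17,500
Host institution:
Université Laval
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2020
Funding country:
Canada
Duration:
2020-01-01 to 2021-12-31
Project status:
Completed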

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=718031
Keywords:
Interactive reinforcement learning adaptive experimental

Project abstract

Unlike algorithms in supervised and unsupervised learning, which study static collections of existing data, algorithms in reinforcement learning (RL) choose actions to transform their environments and use the effects of those actions to improve their choices incrementally over time. The simplest instance of an RL setting is the bandit problem, in which an agent faces a single choice among a fixed set of actions and receives feedback from its environment within one time-step. Initially motivated by adaptive experimental designs, where one aims to compare several actions by conducting experiments in a maximally informative and efficient way, bandit algorithms have been studied intensively throughout the last decade. Beyond theoretical findings, these algorithms have also been successfully implemented in adaptive experiments, for example to optimize cancer treatments in mouse trials, tune microscopy imaging systems, adjust hyperparameters, and design user interfaces. However, many real-world applications are characterized by dynamics that are not captured by existing RL settings. Even worse, these dynamics sometimes contradict assumptions required by current algorithms. In these cases, the algorithms' theoretical guarantees are void, and they risk producing undesirable, potentially dangerous behaviours. Therefore, leveraging the power of interactive RL algorithms in practice requires approaches intended for these specific conditions. The proposed research program aims to bring RL to the real world of adaptive experimental design. We aim to develop algorithms that perform as expected, with theoretical guarantees that hold in these realistic environments. Through collaborations with researchers in other fields, we will deploy those strategies and measure their impact on real applications.
The research program is organized around the following objectives: 1) Propose flexible interactive RL settings that encompass state-of-the-art adaptive experimental design frameworks; 2) Investigate realistic system dynamics and introduce theoretically grounded algorithms for learning under these constraints; 3) Characterize how undesirable behaviours arise in learning algorithms and develop strategies to guard against them; 4) Deploy algorithms in real-world applications to showcase their potential and have an impact on the application domain. The proposed research direction has high-impact potential, as it will result in strategies developed for real application cases. Even though the program is focused on the bandit subfield, the resulting knowledge and algorithms will constitute a basis for understanding and designing sequential RL algorithms, which still essentially remain confined to simulation environments. Finally, the applications will constitute proofs of concept and will result in deployment guidelines for safe and impactful RL, supporting further research that will bring RL closer to the field.
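
The bandit setting described in the abstract (a fixed set of actions, with feedback received within one time-step) can be illustrated with a minimal sketch. This is not the project's method: the epsilon-greedy rule, the Bernoulli reward model, and every name and parameter below (`epsilon_greedy_bandit`, `arm_means`, `horizon`, `epsilon`, `seed`) are assumptions chosen only for illustration.

```python
import random


def epsilon_greedy_bandit(arm_means, horizon, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a simulated Bernoulli bandit.

    arm_means: hypothetical success probability of each action (arm).
    Returns (pull counts per arm, estimated mean reward per arm).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms

    for t in range(horizon):
        if t < n_arms:
            # Try every action once before relying on the estimates.
            arm = t
        elif rng.random() < epsilon:
            # Explore: pick an action uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the action with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment returns feedback within the same time-step.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0

        # Incremental update of the running mean for the chosen action.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts, estimates


if __name__ == "__main__":
    counts, estimates = epsilon_greedy_bandit([0.2, 0.5, 0.7], horizon=2000)
    print("pulls per action:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
```

The sketch assumes stationary, independent rewards. The abstract's point is that real adaptive experiments can violate such assumptions, in which case guarantees derived for rules like this one may no longer hold.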

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Durand, Audrey

An economic evaluation: Simulation of the cost-effectiveness and cost-utility of universal prevention strategies against osteoporosis-related fractures.

DOI:
10.1002/jbmr.1758
Publication date:
2013-02
Journal:
JOURNAL OF BONE AND MINERAL RESEARCH
Impact factor:
6.2
Authors:
Nshimyumukiza, Leon;Durand, Audrey;Gagnon, Mathieu;Douville, Xavier;Morin, Suzanne;Lindsay, Carmen;Duplantie, Julie;Gagne, Christian;Jean, Sonia;Giguere, Yves;Dodin, Sylvie;Rousseau, Francois;Reinharz, Daniel
Corresponding author:
Reinharz, Daniel

Pre-trial cocaine biases choice toward cocaine through suppression of the nondrug option

DOI:
10.1016/j.pbb.2018.07.010
Publication date:
2018-10-01
Journal:
PHARMACOLOGY BIOCHEMISTRY AND BEHAVIOR
Impact factor:
3.6
Authors:
Freese, Luana;Durand, Audrey;Ahmed, Serge H.
Corresponding author:
Ahmed, Serge H.

Informing the development of an outcome set and banks of items to measure mobility among individuals with acquired brain injury using natural language processing.

DOI:
10.1186/s12883-022-02938-1
Publication date:
2022-12-09
Journal:
BMC NEUROLOGY
Impact factor:
2.6
Authors:
Alhasani, Rehab;Godbout, Mathieu;Durand, Audrey;Auger, Claudine;Lamontagne, Anouk;Ahmed, Sara
Corresponding author:
Ahmed, Sara

Cost-effectiveness and accuracy of prenatal Down syndrome screening strategies: should the combined test continue to be widely used?

DOI:
10.1016/j.ajog.2010.09.017
Publication date:
2011-02-01
Journal:
AMERICAN JOURNAL OF OBSTETRICS AND GYNECOLOGY
Impact factor:
9.8
Authors:
Gekas, Jean;Durand, Audrey;Reinharz, Daniel
Corresponding author:
Reinharz, Daniel

The Influence of Age, Sex, and Socioeconomic Status on Glycemic Control Among People With Type 1 and Type 2 Diabetes in Canada: Patient-Led Longitudinal Retrospective Cross-sectional Study With Multiple Time Points of Measurement.

DOI:
10.2196/35682
Publication date:
2023-04-27
Journal:
JMIR diabetes
Impact factor:
0
Authors:
Mousavi, Seyedmostafa;Tannenbaum Greenberg, Dana;Ndjaboue, Ruth;Greiver, Michelle;Drescher, Olivia;Chipenda Dansokho, Selma;Boutin, Denis;Chouinard, Jean-Marc;Dostie, Sylvie;Fenton, Robert;Greenberg, Marley;McGavock, Jonathan;Najam, Adhiyat;Rekik, Monia;Weisz, Tom;Willison, Donald J;Durand, Audrey;Witteman, Holly O
Corresponding author:
Witteman, Holly O